Apache Flume Performance Monitoring

As a follow-up to our last blog post, which discussed monitoring Apache ZooKeeper with Collectd, this time around we’ll discuss monitoring Apache Flume.

In keeping with our data-oriented infrastructure here at Nextdoor, it’s not surprising that we’re using Flume as a data pipeline for passing large amounts of logging data into central repositories for analysis.

When you’re generating tens of thousands of data points every second (or more!), it becomes extremely important that your data pipeline runs at peak efficiency all the time. To know when and where to grow your pipeline, you need data. Thankfully, Flume provides that data in an easy-to-access JSON format.

Flume Monitoring Framework

The Flume framework provides a locally accessible web URL that dumps monitoring data as JSON. This functionality must be turned on, though; documentation for that can be found here.

Monitoring data is emitted for all of the Sinks, Channels and Sources that supply data. Here’s some example data from a FileChannel.

Each Source, Sink or Channel will provide metrics like this:

        {
            "CHANNEL.fc1": {
                "ChannelCapacity": "1000000",
                "ChannelFillPercentage": "0.0",
                "ChannelSize": "0",
                "EventPutAttemptCount": "0",
                "EventPutSuccessCount": "0",
                "EventTakeAttemptCount": "3203",
                "EventTakeSuccessCount": "0",
                "StartTime": "1367940231789",
                "StopTime": "0",
                "Type": "CHANNEL"
            }
        }
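To illustrate how string-valued metrics like these can be read, here is a minimal Python sketch. It is not the plugin's actual code. The endpoint `http://localhost:34545/metrics` and the names `METRICS_URL`, `parse_metrics` and `fetch_metrics` are assumptions made for this example; use whatever host, port and path your agent is configured with.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint: host, port and path depend on how JSON
# reporting was enabled on your Flume agent.
METRICS_URL = "http://localhost:34545/metrics"


def parse_metrics(raw_json):
    """Convert Flume's string-valued metrics into floats, keyed by component."""
    parsed = {}
    for component, metrics in json.loads(raw_json).items():
        numeric = {}
        for name, value in metrics.items():
            try:
                numeric[name] = float(value)
            except (TypeError, ValueError):
                # Non-numeric fields such as "Type": "CHANNEL" are skipped.
                continue
        parsed[component] = numeric
    return parsed


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Fetch the JSON document from the agent and parse it."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running agent should return a dictionary such as `{"CHANNEL.fc1": {"ChannelSize": 0.0, ...}}`. Parsing is kept separate from fetching so the conversion can be tried on a saved JSON sample without a live agent.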

Collectd Monitoring Plugin

Due to some unfortunate JSON-parsing behavior in the Collectd cURL-JSON plugin, we can’t use it as a simple way to gather and parse these metrics. It turns out that the plugin looks only for numerical values and cannot parse strings into numbers (for example, “0” is not 0, so it ignores the data). For that reason, we’ve written our own plugin in Python that pulls these stats and writes them out in the standard Collectd text protocol. The stats output looks like this:

        PUTVAL "localhost/flume-CHANNEL-fc1/gauge-ChannelSize" 1367956296:0.0
        PUTVAL "localhost/flume-CHANNEL-fc1/counter-EventPutAttemptCount" interval=60 1367956296:0
        PUTVAL "localhost/flume-CHANNEL-fc1/gauge-ChannelFillPercentage" 1367956296:0.0
        PUTVAL "localhost/flume-CHANNEL-fc1/counter-EventTakeSuccessCount" interval=60 1367956296:0
        PUTVAL "localhost/flume-CHANNEL-fc1/counter-EventTakeAttemptCount" interval=60 1367956296:3203
        PUTVAL "localhost/flume-CHANNEL-fc1/gauge-ChannelCapacity" 1367956296:1000000.0
        PUTVAL "localhost/flume-CHANNEL-fc1/counter-EventPutSuccessCount" interval=60 1367956296:0
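As a rough illustration of how lines in this shape might be produced, here is a short Python sketch. It is not the plugin's real implementation. The helper names `metric_type` and `format_putval` are hypothetical, and treating metrics whose names end in "Count" as counters is an assumption inferred from the sample output above. The sketch also always writes `interval=`, which the gauge lines in the sample omit.

```python
import time


def metric_type(name):
    """Guess the Collectd type. Assumed rule: names ending in 'Count' are counters."""
    return "counter" if name.endswith("Count") else "gauge"


def format_putval(host, component, name, value, timestamp=None, interval=60):
    """Build one PUTVAL line for a single numeric metric.

    component is a Flume key such as 'CHANNEL.fc1'; value must be a number.
    """
    if timestamp is None:
        timestamp = int(time.time())
    # 'CHANNEL.fc1' becomes 'CHANNEL-fc1' to match the identifiers shown above.
    plugin_instance = component.replace(".", "-")
    kind = metric_type(name)
    # Counters are written as whole numbers, gauges as floats.
    rendered = str(int(value)) if kind == "counter" else str(float(value))
    identifier = f"{host}/flume-{plugin_instance}/{kind}-{name}"
    return f'PUTVAL "{identifier}" interval={interval} {timestamp}:{rendered}'
```

Printing the result of `format_putval(...)` to standard output once per metric on each polling interval is one way to feed a script's values to Collectd. Which fields to skip (for example `StartTime` and `StopTime`, which do not appear in the sample output) is left out of this sketch.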

Example Graphs

[Four example graph images]

Code

The code for the plugin and detailed installation instructions are available in our GitHub repository.

Matt Wise
Sr. Systems Architect