Flume ships with HTTP and Ganglia monitoring support, but we currently use Zabbix for monitoring, so we added a Zabbix monitoring module to Flume that integrates seamlessly with the SA team's monitoring service.
We also pruned Flume's metrics, sending only the ones we need to Zabbix so as not to put pressure on the Zabbix server. What we care about most right now is whether Flume writes the logs sent by applications to HDFS in time; the metrics we watch for that are:
- Source: number of events received and number of events accepted
- Channel: number of events backlogged in the channel
- Sink: number of events already drained
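These three concerns map onto concrete counters in Flume's JSON metrics output: EventReceivedCount/EventAcceptedCount on the source, ChannelSize on the channel, and EventDrainSuccessCount on the sink. A minimal sketch for pulling them, assuming the agent exposes HTTP metrics on port 34545 as configured later in this post:

```shell
#!/bin/bash
# Sketch: extract the counters we care about from Flume's flat JSON metrics.
# Assumes the agent was started with -Dflume.monitoring.type=http
# -Dflume.monitoring.port=34545.
json=$(curl -s http://localhost:34545/metrics)
for key in EventReceivedCount EventAcceptedCount ChannelSize EventDrainSuccessCount; do
  # Flume emits counter values as strings, e.g. "EventAcceptedCount":"7252225"
  val=$(echo "$json" | grep -o "\"$key\":\"[0-9]*\"" | head -1 | cut -d'"' -f4)
  echo "$key=$val"
done
```

A grep-based extraction like this is crude but adequate for Flume's flat, single-level JSON; anything nested would need a real JSON parser.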
Installing Zabbix
http://my.oschina.net/yunnet/blog/173161
Monitoring Flume with Zabbix
JVM performance monitoring

```shell
# Young GC count
sudo /usr/local/jdk1.7.0_21/bin/jstat -gcutil $(pgrep java)|tail -1|awk '{print $6}'
# Full GC count
sudo /usr/local/jdk1.7.0_21/bin/jstat -gcutil $(pgrep java)|tail -1|awk '{print $8}'
# JVM total memory usage
sudo /usr/local/jdk1.7.0_21/bin/jmap -histo $(pgrep java)|grep Total|awk '{print $3}'
# JVM total instance count
sudo /usr/local/jdk1.7.0_21/bin/jmap -histo $(pgrep java)|grep Total|awk '{print $2}'
```

Flume application metrics

Start the agent with the JSON reporting options, after which the metrics can be read at http://localhost:34545/metrics:

```shell
bin/flume-ng agent -n consumer -c conf -f bin/conf.properties -Dflume.monitoring.type=http -Dflume.monitoring.port=34545 &
```

Generate some test data:

```shell
for i in {1..100};do echo "exec test$i" >> /usr/logs/log.10;echo $i;done
```

Reformat the JSON output with a small shell pipeline:

```shell
curl http://localhost:34545/metrics 2>/dev/null|sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'
```

```
SOURCE.kafka:
OpenConnectionCount:0
AppendBatchAcceptedCount:0
AppendBatchReceivedCount:0
Type:SOURCE
EventAcceptedCount:7252225
AppendReceivedCount:0
StopTime:0
EventReceivedCount:0
StartTime:1407731371546
AppendAcceptedCount:0
SINK.es:
BatchCompleteCount:10697
ConnectionFailedCount:0
EventDrainAttemptCount:7253061
ConnectionCreatedCount:1
BatchEmptyCount:226
Type:SINK
ConnectionClosedCount:0
EventDrainSuccessCount:7253061
StopTime:0
StartTime:1407731371546
BatchUnderflowCount:14857
SINK.hdp:
BatchCompleteCount:1290
ConnectionFailedCount:0
EventDrainAttemptCount:8057502
ConnectionCreatedCount:35787
BatchEmptyCount:54894
Type:SINK
ConnectionClosedCount:35609
EventDrainSuccessCount:8057502
StopTime:0
StartTime:1407731371545
BatchUnderflowCount:45433
```

Any of these counter names (from the source, channel, or sink sections) can be passed as the $1 variable of the monitoring script, e.g. EventDrainSuccessCount.

The script that monitors Flume:

```shell
cat /opt/monitor_flume.sh
curl http://localhost:34545/metrics 2>/dev/null|sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'|grep $1|awk -F: '{print $2}'
```

The same pipeline also works with a fixed key:

```shell
curl http://localhost:34545/metrics 2>/dev/null|sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'|grep Total|awk -F: '{print $2}'
curl http://localhost:34545/metrics 2>/dev/null|sed -e 's/\([,]\)\s*/\1\n/g' -e 's/[{}]/\n/g' -e 's/[",]//g'|grep StartTime|awk -F: '{print $2}'
```

Deploy the corresponding UserParameter entries in the zabbix agent configuration:

```shell
cat zabbix_flume_jdk.conf
UserParameter=ygc.counts,sudo /usr/local/jdk1.7.0_21/bin/jstat -gcutil $(pgrep java|head -1)|tail -1|awk '{print $6}'
UserParameter=fgc.counts,sudo /usr/local/jdk1.7.0_21/bin/jstat -gcutil $(pgrep java|head -1)|tail -1|awk '{print $8}'
UserParameter=jvm.memory.usage,sudo /usr/local/jdk1.7.0_21/bin/jmap -histo $(pgrep java|head -1)|grep Total|awk '{print $3}'
UserParameter=jvm.instances.usage,sudo /usr/local/jdk1.7.0_21/bin/jmap -histo $(pgrep java|head -1)|grep Total|awk '{print $2}'
UserParameter=flume.monitor[*],/bin/bash /opt/monitor_flume.sh $1
```
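One deployment detail that is easy to miss: the UserParameter entries invoke sudo, and the zabbix agent usually runs as an unprivileged zabbix user, so jstat and jmap will fail unless passwordless sudo is granted. A sketch of the sudoers entries this would need (agent user name and JDK path assumed from the config above):

```
# /etc/sudoers.d/zabbix -- hypothetical; adjust the agent user and JDK path
zabbix ALL=(ALL) NOPASSWD: /usr/local/jdk1.7.0_21/bin/jstat, /usr/local/jdk1.7.0_21/bin/jmap
Defaults:zabbix !requiretty
```

After restarting the agent, each key can be checked from the Zabbix server side with zabbix_get, e.g. `zabbix_get -s <agent-host> -k "flume.monitor[EventDrainSuccessCount]"`.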