References:
http://www.cnblogs.com/Richardzhu/p/3481996.html
http://blog.csdn.net/pirate_g/article/details/8463395
Environment:
Server    Role               IP
Client    monitoring node    10.0.2.59
nn01      monitored          10.0.2.220
jt01      monitored          10.0.2.219
dn01      monitored          10.0.2.216
Hadoop cluster: nn01, jt01, dn01; monitoring node: Client
Hadoop versions 1.2.1 and 2.5.2
Deploy the monitoring node:
Download the packages:
EPEL repo:
http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
ganglia 3.6:
http://ftp.jaist.ac.jp/pub/sourceforge/g/ga/ganglia/ganglia%20monitoring%20core/3.6.1/ganglia-3.6.1.tar.gz
ganglia-web 3.6:
http://ftp.jaist.ac.jp/pub/sourceforge/g/ga/ganglia/ganglia-web/3.6.2/ganglia-web-3.6.2.tar.gz
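Fetched together, assuming /usr/local/src as the working directory (matching the prompt used below) and with the spaces in the SourceForge path URL-encoded:
cd /usr/local/src
wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
wget "http://ftp.jaist.ac.jp/pub/sourceforge/g/ga/ganglia/ganglia%20monitoring%20core/3.6.1/ganglia-3.6.1.tar.gz"
wget http://ftp.jaist.ac.jp/pub/sourceforge/g/ga/ganglia/ganglia-web/3.6.2/ganglia-web-3.6.2.tar.gz
rpm -ivh epel-release-6-8.noarch.rpm   # enable the EPEL repo before the yum step below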
Install the dependencies:
[root@Client src]# yum -y install httpd-devel automake autoconf libtool ncurses-devel libxslt groff pcre pcre-devel pkgconfig rrdtool* apr-devel apr-util check-devel cairo-devel pango-devel libxml2-devel rpm-build glib2-devel dbus-devel freetype-devel fontconfig-devel gcc-c++ expat-devel python-devel libXrender-devel libconfuse*
Build from source:
Install libconfuse:
wget http://savannah.nongnu.org/download/confuse/confuse-2.7.tar.gz
tar -zxvf confuse-2.7.tar.gz
cd confuse-2.7
./configure CFLAGS=-fPIC --disable-nls
make && make install
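To confirm the library landed where ganglia's configure can find it (paths assume confuse's default /usr/local prefix):
ls /usr/local/lib/libconfuse.* /usr/local/include/confuse.h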
Install ganglia:
tar zxf ganglia-3.6.1.tar.gz
cd ganglia-3.6.1
./configure --prefix=/usr/local/ganglia --with-gmetad --with-librrd --sysconfdir=/etc/ganglia
make
make install
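A quick sanity check of the build (paths follow the --prefix given above):
/usr/local/ganglia/sbin/gmond --version    # expect "gmond 3.6.1"
/usr/local/ganglia/sbin/gmetad --version   # expect "gmetad 3.6.1"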
Register gmond and gmetad as system services:
cp gmond/gmond.init /etc/rc.d/init.d/gmond
cp gmetad/gmetad.init /etc/rc.d/init.d/gmetad
chkconfig --add gmond && chkconfig gmond on
chkconfig --add gmetad && chkconfig gmetad on
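You can confirm both services are registered:
chkconfig --list | grep -E 'gmond|gmetad'   # both should be on for runlevels 2-5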
Configure gmetad and the Ganglia web frontend:
mkdir -p /var/lib/ganglia/rrds
tar zxf ganglia-web-3.6.2.tar.gz
cd ganglia-web-3.6.2
make install
Symlink the commands into the PATH and fix permissions:
ln -s /usr/local/ganglia/bin/* /usr/bin/
ln -s /usr/local/ganglia/sbin/* /usr/sbin/
chown -R apache:apache /var/lib/ganglia
Basic Ganglia configuration
Generate the default gmond configuration file:
gmond -t |tee /etc/ganglia/gmond.conf
Edit the Ganglia configuration files:
vim gmetad.conf
# "Hadoop cluster" is the data source name; it must match the cluster name in gmond.conf
data_source "Hadoop cluster" 10.0.2.59
vim gmond.conf
cluster {
# must match the data_source name in gmetad.conf
name = "Hadoop cluster"
}
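A mismatch between these two names is the most common reason hosts never show up, so it is worth cross-checking:
grep -n '^data_source' /etc/ganglia/gmetad.conf   # "Hadoop cluster" ...
grep -n 'name = "' /etc/ganglia/gmond.conf        # must print the same cluster name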
Start Ganglia and visit its web page:
/etc/init.d/gmond restart
/etc/init.d/gmetad restart
/etc/init.d/httpd restart
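If everything came up, gmond serves its XML state on TCP 8649, gmetad on 8651, and the web UI should answer under /ganglia, ganglia-web's default install path; a quick check:
netstat -tlnp | grep -E ':(8649|8651)'        # gmond and gmetad listeners
curl -sI http://localhost/ganglia/ | head -1  # expect HTTP/1.1 200 OK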
Copy /etc/ganglia/, /usr/local/ganglia/, and /etc/init.d/gmond from the monitoring node to the corresponding locations on the monitored nodes (a full loop is sketched below):
scp -r /etc/ganglia/ nn01:/etc/
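The single scp above covers only one directory on one host; a sketch that pushes all three items to every monitored node and starts gmond there:
for h in nn01 jt01 dn01; do
    scp -r /etc/ganglia/ $h:/etc/
    scp -r /usr/local/ganglia/ $h:/usr/local/
    scp /etc/init.d/gmond $h:/etc/init.d/
    ssh $h 'chkconfig --add gmond && chkconfig gmond on && service gmond start'
done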
Configure the Hadoop metrics on the monitored nodes:
vim /usr/local/hadoop/conf/hadoop-metrics2.properties
# for Ganglia 3.1 support
# GangliaSink31 targets the Ganglia 3.1+ wire format, which our 3.6.1 install also speaks
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
# default for supportsparse is false
*.sink.ganglia.supportsparse=true
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
namenode.sink.ganglia.servers=239.2.11.71:8649
datanode.sink.ganglia.servers=239.2.11.71:8649
jobtracker.sink.ganglia.servers=239.2.11.71:8649
tasktracker.sink.ganglia.servers=239.2.11.71:8649
maptask.sink.ganglia.servers=239.2.11.71:8649
reducetask.sink.ganglia.servers=239.2.11.71:8649
Note: 239.2.11.71 is the default multicast address Ganglia uses; it does not need to be changed to the gmetad server's address.
To monitor HBase as well, find the same file under the HBase directory and edit it the same way. Afterwards, distribute the configuration file to ${HADOOP_HOME}/conf on every datanode and restart the Hadoop cluster; a sketch of that step follows below.
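A sketch of that distribution step, assuming Hadoop lives under /usr/local/hadoop as above and the stock Hadoop 1.x control scripts are on the PATH:
for h in nn01 jt01 dn01; do    # all cluster nodes from the environment table
    scp /usr/local/hadoop/conf/hadoop-metrics2.properties $h:/usr/local/hadoop/conf/
done
stop-all.sh && start-all.sh    # restart the Hadoop 1.x daemons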
Monitoring multiple clusters:
Edit the file:
[root@Client ganglia]# vim gmetad.conf
data_source "Hadoop cluster01" 10.0.2.59
data_source "Hadoop cluster02" 10.0.2.54
gmond.conf on the Hadoop cluster01 node servers:
vim gmond.conf
cluster {
name = "Hadoop cluster01"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
udp_send_channel {
# disable multicast and send data to the specified host instead (unicast)
# effect: the host appears under the cluster group defined above; if left unset it may show up in every group
host = 10.0.2.59
port = 8649
ttl = 1
}
udp_recv_channel {
port = 8649
}
gmond.conf on the Hadoop cluster02 node servers:
vim gmond.conf
cluster {
name = "Hadoop cluster02"
owner = "unspecified"
latlong = "unspecified"
url = "unspecified"
}
udp_send_channel {
host = 10.0.2.54
port = 8649
ttl = 1
}
udp_recv_channel {
port = 8649
}
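After these edits, restart gmond on the nodes of both clusters and gmetad on the Client; both data sources should then appear in gmetad's XML dump (a check assuming nc is installed):
service gmond restart                      # on every monitored node
service gmetad restart                     # on the Client
nc localhost 8651 | grep 'CLUSTER NAME'    # expect one line per data_source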
Hadoop 2.5.2:
Both the hadoop-metrics.properties and hadoop-metrics2.properties configuration files need to be edited.
[root@hnn01 hadoop]# grep -v "^#" hadoop-metrics.properties | grep -v "^$"
dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
dfs.period=10
dfs.servers=hnn01:8649
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
mapred.period=10
mapred.servers=hnn01:8649
jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
jvm.period=10
jvm.servers=hnn01:8649
rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
rpc.period=10
rpc.servers=hnn01:8649
ugi.class=org.apache.hadoop.metrics.ganglia.GangliaContext31
ugi.period=10
ugi.servers=hnn01:8649
[root@hnn01 hadoop]# grep -v "^#" hadoop-metrics2.properties | grep -v "^$"
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
*.period=10
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
*.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both
*.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40
namenode.sink.ganglia.servers=hnn01:8649
resourcemanager.sink.ganglia.servers=hnn01:8649
datanode.sink.ganglia.servers=hnn01:8649
nodemanager.sink.ganglia.servers=hnn01:8649
maptask.sink.ganglia.servers=hnn01:8649
reducetask.sink.ganglia.servers=hnn01:8649
Restart the Hadoop cluster for the configuration to take effect!
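As a final check from the monitoring side (a sketch, assuming nc is installed), Hadoop metric names should appear in gmond's XML state within a couple of reporting periods:
nc hnn01 8649 | grep -c 'NAME="jvm.'    # non-zero once the Ganglia sinks are reporting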