The previous post showed how Flume collects data into HDFS. This post demonstrates how to chain multiple agents together.
Architecture diagram:
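The schematic boils down to two agents in series; a text sketch of the pipeline configured below:

```
centos-aaron-h1 (sender)                          centos-aaron-h2 (collector)
exec source (tail -F test.log)                    avro source (0.0.0.0:4141)
   -> memory channel                                 -> memory channel
   -> avro sink  === avro over TCP, port 4141 ===>   -> logger sink
```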
Steps:
1. Copy the Flume installation from centos-aaron-h1 to centos-aaron-h2:
sudo scp -r /home/hadoop/apps/apache-flume-1.6.0-bin [email protected]:/home/hadoop/apps/
2. On centos-aaron-h1, enter the Flume configuration directory:
cd ~/apps/apache-flume-1.6.0-bin/conf
3. Create a new configuration file:
vi tail-avro-avro-logger.conf
4. Add the following content to the file:
# Read data from a tail command and send it to an avro port
# Another node can configure an avro source to relay the data to external storage
##################
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/log/test.log
# Describe the sink
# The avro sink does not bind to this machine; it points at the remote collector's
# service address. The sink side acts as an avro client, sending events to centos-aaron-h2
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = centos-aaron-h2
a1.sinks.k1.port = 4141
a1.sinks.k1.batch-size = 2
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
5. Save the configuration:
Type ZZ (Shift+Z twice) in vi's normal mode to save and exit.
6. Create the directory for the file Flume will monitor:
mkdir /home/hadoop/log
7. Create the file Flume monitors and write data to it in a loop:
while true
do
echo 111111 >> /home/hadoop/log/test.log
sleep 0.5
done
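The loop above runs forever; it can also be packaged as a small, bounded helper script. This is only a sketch, and LOGFILE/COUNT are assumptions: LOGFILE defaults to a temporary location here, while in this tutorial it would point at /home/hadoop/log/test.log.

```shell
#!/bin/bash
# Sketch: append timestamped test lines to the file that Flume tails.
# LOGFILE and COUNT are assumptions; in this tutorial LOGFILE would be
# /home/hadoop/log/test.log.
LOGFILE=${LOGFILE:-/tmp/flume-test/test.log}
COUNT=${COUNT:-3}

mkdir -p "$(dirname "$LOGFILE")"
i=0
while [ "$i" -lt "$COUNT" ]; do
  i=$((i + 1))
  echo "line-$i $(date '+%Y-%m-%d %H:%M:%S')" >> "$LOGFILE"
  sleep 0.5
done
echo "wrote $COUNT lines to $LOGFILE"
```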
8. Open another SSH session and run the command below to watch the log file grow (uppercase -F follows the file by name, so it survives rotation; lowercase -f follows the original inode):
tail -F /home/hadoop/log/test.log
9. On centos-aaron-h2, enter the Flume configuration directory:
cd ~/apps/apache-flume-1.6.0-bin/conf
10. Create a new configuration file:
vi avro-hdfs.conf
11. Add the following content to the file:
# Receive data from an avro port and sink it to the logger
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
# The avro source component is a receiving server; bind it to this machine
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
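The sink here is a logger, which makes the relayed events easy to see on the console, even though the file is named avro-hdfs.conf. To actually land the events in HDFS as in the previous post, the sink section could be replaced with an hdfs sink along these lines (the path and options are assumptions; the property names are standard Flume 1.6 hdfs-sink settings):

```properties
# Hypothetical hdfs sink replacing the logger sink above
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.fileType = DataStream
# exec/avro events carry no timestamp header, so use the local clock for %y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```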
12. Start the Flume avro server (collector) on centos-aaron-h2:
bin/flume-ng agent -c conf -f conf/avro-hdfs.conf -n a1 -Dflume.root.logger=INFO,console
13. Start the Flume avro client (sender) on centos-aaron-h1:
bin/flume-ng agent -c conf -f conf/tail-avro-avro-logger.conf -n a1
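Before starting the sender, it can help to confirm that the collector's avro port is reachable. A bash-only sketch using the /dev/tcp pseudo-device; HOST and PORT are assumptions taken from the configuration above:

```shell
#!/bin/bash
# Sketch: test whether the collector's avro port accepts TCP connections.
# HOST and PORT are assumptions matching the configs in this post.
HOST=${HOST:-centos-aaron-h2}
PORT=${PORT:-4141}

if (exec 3<>"/dev/tcp/$HOST/$PORT") 2>/dev/null; then
  echo "OK: $HOST:$PORT is accepting connections"
else
  echo "FAIL: cannot reach $HOST:$PORT (is the collector agent running?)"
fi
```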
Result (events printed on the collector's console):
Note: if a Flume agent fails, the agent process must be restarted. Because the source here runs `tail -F`, a restarted agent picks up near the current end of the file rather than re-reading from the beginning; the exec source does not record an exact offset, however, so lines written while the agent was down may be missed. You can write a monitoring script to restart the agent automatically. Finally, for a production-grade design, see "Meituan's Flume-based log collection system" (《基于Flume的美团日志收集系统》).
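The monitoring script mentioned above could look like the following sketch, intended to run from cron (e.g. every minute). All names and paths here (FLUME_HOME, CONF_FILE, the log locations) are assumptions for this tutorial setup:

```shell
#!/bin/bash
# Sketch of a cron-driven watchdog: if the sender agent is not running,
# try to restart it. Paths below are assumptions; adjust to your install.
FLUME_HOME=${FLUME_HOME:-/home/hadoop/apps/apache-flume-1.6.0-bin}
CONF_FILE=${CONF_FILE:-conf/tail-avro-avro-logger.conf}
AGENT_NAME=${AGENT_NAME:-a1}
WATCH_LOG=${WATCH_LOG:-/tmp/flume-watchdog.log}

agent_running() {
  # The [f] bracket keeps the pattern from matching this script itself.
  pgrep -f "[f]lume.*$CONF_FILE" > /dev/null 2>&1
}

start_agent() {
  if [ ! -x "$FLUME_HOME/bin/flume-ng" ]; then
    echo "$(date) flume-ng not found under $FLUME_HOME" >> "$WATCH_LOG"
    return 1
  fi
  (cd "$FLUME_HOME" && nohup bin/flume-ng agent -c conf -f "$CONF_FILE" \
      -n "$AGENT_NAME" >> /tmp/flume-agent.out 2>&1 &)
}

if ! agent_running; then
  echo "$(date) agent not running, attempting restart" >> "$WATCH_LOG"
  start_agent
fi
```

Registered in crontab (for example `* * * * * /home/hadoop/flume-watchdog.sh`, a hypothetical path), this restarts the agent within a minute of a crash; as noted above, restarting alone does not recover lines written while the agent was down.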