Building a Real-Time Log Analysis System with Flume + Kafka + Storm

This article only covers the integration of Flume with Kafka; for integrating Kafka with Storm, please refer to other blog posts.

1. Installing and using Flume

Download the Flume package:

http://www.apache.org/dyn/closer.cgi/flume/1.5.2/apache-flume-1.5.2-bin.tar.gz

Extract it:

$ tar -xzvf apache-flume-1.5.2-bin.tar.gz -C /opt/flume

The Flume configuration files live in the conf directory, and the executables in the bin directory.

1) Configure Flume

Go into the conf directory, copy flume-conf.properties.template, and give the copy whatever name you need:

$ cp flume-conf.properties.template flume.conf

Edit flume.conf. We use a file_roll sink to receive the data from the channel, a memory channel, and an exec source. The configuration file looks like this:

agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink

# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = tail -F /data/mongodata/mongo.log
#agent.sources.seqGenSrc.bind = 172.168.49.130

# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel

# Each sink's type must be defined
agent.sinks.loggerSink.type = file_roll
agent.sinks.loggerSink.sink.directory = /data/flume

# Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined.
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel (sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 1000
agent.channels.memoryChannel.transactionCapacity = 100

2) Run the Flume agent

Switch to the bin directory and run the following command:

$ ./flume-ng agent --conf ../conf -f ../conf/flume.conf -n agent -Dflume.root.logger=INFO,console

You should now see the generated log files in the /data/flume directory.

2. Integrating with Kafka

Flume 1.5.2 does not ship with a Kafka sink, so you have to develop one yourself. The Kafka sink in Flume 1.6 is a good reference, but pay attention to the Kafka version you are running, because some Kafka APIs are not compatible between versions.

Only the core code is shown here, the body of process():

Sink.Status status = Status.READY;
Channel ch = getChannel();
Transaction transaction = null;
Event event = null;
String eventTopic = null;
String eventKey = null;

try {
    transaction = ch.getTransaction();
    transaction.begin();
    messageList.clear();

    if (type.equals("sync")) {
        // Synchronous mode: take a single event and send it immediately.
        event = ch.take();
        if (event != null) {
            byte[] tempBody = event.getBody();
            String eventBody = new String(tempBody, "UTF-8");
            Map<String, String> headers = event.getHeaders();
            if ((eventTopic = headers.get(TOPIC_HDR)) == null) {
                eventTopic = topic;
            }
            eventKey = headers.get(KEY_HDR);
            if (logger.isDebugEnabled()) {
                logger.debug("{Event} " + eventTopic + " : " + eventKey + " : " + eventBody);
            }
            ProducerData<String, Message> data =
                    new ProducerData<String, Message>(eventTopic, new Message(tempBody));
            long startTime = System.nanoTime();
            logger.debug(eventTopic + "++++" + eventBody);
            producer.send(data);
            long endTime = System.nanoTime();
        }
    } else {
        // Batch mode: drain up to batchSize events from the channel into a buffer.
        long processedEvents = 0;
        for (; processedEvents < batchSize; processedEvents += 1) {
            event = ch.take();
            if (event == null) {
                break;
            }
            byte[] tempBody = event.getBody();
            String eventBody = new String(tempBody, "UTF-8");
            Map<String, String> headers = event.getHeaders();
            if ((eventTopic = headers.get(TOPIC_HDR)) == null) {
                eventTopic = topic;
            }
            eventKey = headers.get(KEY_HDR);
            if (logger.isDebugEnabled()) {
                logger.debug("{Event} " + eventTopic + " : " + eventKey + " : " + eventBody);
                logger.debug("event #{}", processedEvents);
            }
            // create a message and add to buffer
            ProducerData<String, String> data =
                    new ProducerData<String, String>(eventTopic, eventBody);
            messageList.add(data);
        }
        // publish batch and commit.
        if (processedEvents > 0) {
            long startTime = System.nanoTime();
            producer.send(messageList);  // send the whole buffered batch in one call
            long endTime = System.nanoTime();
        }
    }

    transaction.commit();
} catch (Exception ex) {
    String errorMsg = "Failed to publish events";
    logger.error("Failed to publish events", ex);
    status = Status.BACKOFF;
    if (transaction != null) {
        try {
            transaction.rollback();
        } catch (Exception e) {
            logger.error("Transaction rollback failed", e);
            throw Throwables.propagate(e);
        }
    }
    throw new EventDeliveryException(errorMsg, ex);
} finally {
    if (transaction != null) {
        transaction.close();
    }
}
return status;
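The process() body above refers to fields such as producer, messageList, topic, type, batchSize, TOPIC_HDR and KEY_HDR that the article does not show. As a rough orientation only, here is a minimal sketch of what the surrounding sink class might look like, modeled loosely on the Flume 1.6 KafkaSink; the property names that do not appear in the article (zkConnect, the default values, the "sync" switch) are assumptions, and the producer settings follow the 0.7-era API that ProducerData and Message come from. Note also that the article's sync branch wraps the payload in kafka.message.Message while its batch branch sends a String, so you will need to settle on one payload type when wiring this up.

// Hypothetical scaffolding around the process() body shown above; adapt the
// property keys and producer settings to your own flume.conf and Kafka version.
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import org.apache.flume.Context;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.producer.ProducerConfig;

public class KafkaSink extends AbstractSink implements Configurable {

    private static final Logger logger = LoggerFactory.getLogger(KafkaSink.class);

    // Header names used in process() to pick a per-event topic/key.
    public static final String TOPIC_HDR = "topic";
    public static final String KEY_HDR = "key";

    private Producer<String, String> producer;                 // Kafka 0.7-era producer
    private List<ProducerData<String, String>> messageList;    // batch buffer used in process()
    private String topic;       // default topic when the event carries no topic header
    private String type;        // "sync" for per-event sends, anything else for batching
    private int batchSize;
    private Properties kafkaProps;

    @Override
    public void configure(Context context) {
        // These keys mirror the sink section of flume.conf shown below
        // (producer.sinks.r.topic, producer.sinks.r.batchSize, ...).
        topic = context.getString("topic", "testFlume1");
        type = context.getString("type", "sync");
        batchSize = context.getInteger("batchSize", 100);
        messageList = new ArrayList<ProducerData<String, String>>(batchSize);

        kafkaProps = new Properties();
        // 0.7-style producer properties; with a 0.8 client you would set
        // metadata.broker.list / request.required.acks instead.
        kafkaProps.put("zk.connect", context.getString("zkConnect", "localhost:2181"));
        kafkaProps.put("serializer.class", "kafka.serializer.StringEncoder");
    }

    @Override
    public synchronized void start() {
        producer = new Producer<String, String>(new ProducerConfig(kafkaProps));
        super.start();
    }

    @Override
    public synchronized void stop() {
        producer.close();
        super.stop();
    }

    @Override
    public Status process() throws EventDeliveryException {
        // The body shown above goes here.
        return Status.READY;
    }
}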
Next, modify the Flume configuration file and change the sink section to the Kafka sink, for example (this snippet uses an agent named producer with a sink named r; adapt the names to your own agent and sink):

producer.sinks.r.type = org.apache.flume.sink.kafka.KafkaSink
producer.sinks.r.brokerList = bigdata-node00:9092
producer.sinks.r.requiredAcks = 1
producer.sinks.r.batchSize = 100
#producer.sinks.r.kafka.producer.type=async
#producer.sinks.r.kafka.customer.encoding=UTF-8
producer.sinks.r.topic = testFlume1

The type property points to the fully qualified class name of your Kafka sink. The remaining properties are Kafka producer parameters; the most important ones are brokerList and topic.

Now restart Flume, and you should be able to see the logs under the corresponding topic in Kafka.
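To confirm that events are actually arriving, you can watch the topic with the kafka-console-consumer.sh script that ships with Kafka, or with a small consumer program. The sketch below is only an illustration and is not part of the original article: it uses the Kafka 0.8.x high-level consumer API (0.7-era clients expose a similar createJavaConsumerConnector call but with different stream types), the ZooKeeper address and group id are placeholders, and only the topic name testFlume1 comes from the configuration above.

// Minimal sketch of a consumer for checking that events reach the topic.
// Assumes a Kafka 0.8.x client on the classpath; adjust the placeholders below.
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class FlumeTopicChecker {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("zookeeper.connect", "bigdata-node00:2181"); // placeholder ZooKeeper address
        props.put("group.id", "flume-topic-checker");          // placeholder consumer group
        props.put("auto.offset.reset", "smallest");            // read the topic from the beginning

        ConsumerConnector consumer =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream for the topic configured in the sink (testFlume1 above).
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("testFlume1", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                consumer.createMessageStreams(topicCountMap);

        // Print each message body as it arrives.
        ConsumerIterator<byte[], byte[]> it = streams.get("testFlume1").get(0).iterator();
        while (it.hasNext()) {
            System.out.println(new String(it.next().message(), "UTF-8"));
        }
    }
}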