Problem description
I'm looking for a simple integration path between Elasticsearch and Apache Storm. Support for it is included in the elasticsearch-hadoop library, but that pulls in a pile of Hadoop-stack dependencies, from Hive to Cascading, that I simply don't need. Has anyone managed this integration without bringing in elasticsearch-hadoop? Thanks.
Recommended answer
In my project we use the RabbitMQ river to index the Storm output. It's a very efficient and convenient way to write to Elasticsearch: you basically put the messages on the queue and the river does the rest. If something gets stuck, the data is simply buffered on the queue.
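To make "put the messages on the queue" a bit more concrete: as far as I know, the RabbitMQ river expects each message body to be in the Elasticsearch bulk API format, that is, an action/metadata line followed by the document source, each terminated by a newline. Here is a rough sketch of building such a payload using only the JDK. The class name `RiverBulkMessage`, the index `storm-output`, and the type `event` are made up for illustration, and the actual publishing from a bolt is assumed rather than shown. Also keep in mind that rivers were removed in Elasticsearch 2.0, so this only applies to older clusters.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of any library): builds a bulk-format
// payload that a bolt could publish to the queue the river consumes.
final class RiverBulkMessage {

    private RiverBulkMessage() {}

    // Escapes quotes and backslashes only; index, type and id are
    // assumed to contain no control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // One "index" action: a metadata line, then the document source,
    // each terminated by a newline as the bulk format requires.
    static String indexAction(String index, String type, String id, String jsonSource) {
        if (jsonSource.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("document source must be single-line JSON");
        }
        return "{\"index\":{\"_index\":\"" + escape(index)
                + "\",\"_type\":\"" + escape(type)
                + "\",\"_id\":\"" + escape(id) + "\"}}\n"
                + jsonSource + "\n";
    }

    // Message bodies are raw bytes; UTF-8 is assumed here.
    static byte[] toBody(String payload) {
        return payload.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String payload = indexAction("storm-output", "event", "1",
                "{\"word\":\"hello\",\"count\":3}");
        System.out.print(payload);
    }
}
```

In a bolt's `execute` method you would then hand `toBody(payload)` to the RabbitMQ Java client, for example via `channel.basicPublish(exchange, routingKey, null, body)`, with the exchange and routing key matching whatever the river is configured to consume. Several actions can be concatenated into one message if you want to batch.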
So I would say: use the river approach for writing and the Elasticsearch Java API for reading, as Kit Menke suggests. Alternatively, use the Jest client; we've found it cool, and it offers an async API based on Apache HttpAsyncClient, though we read from Elasticsearch in other services rather than in the Storm topology.