Problem Description
I'm new to Druid. I've already read "Druid vs Elasticsearch", but I still don't know what Druid is good at.
Here is my situation:
I have a Solr cluster with 70 nodes.
I have a very big table in Solr with 1 billion rows, and each row has 100 fields.
Users run range queries over different combinations of fields (at least 20 combinations in a single query) to count the distinct number of customer IDs, but Solr's distinct count is very slow and uses a lot of memory: if a query matches more than 200 thousand results, the Solr query node crashes.
Does Druid have better distinct count performance than Solr?
Recommended Answer
Druid is vastly different from search-specific databases such as ES/Solr. It is a database designed for analytics, where you can do rollups, column filtering, probabilistic computations, etc.
Druid does count distinct through its use of HyperLogLog, a probabilistic data structure. So if you don't need 100% accuracy, you can definitely try Druid; I have seen drastic improvements in response times in one of my projects. But if you care about exact accuracy, Druid might not be the best solution (exact counts are possible in Druid as well, though with performance hits and extra space) - see more here: https://groups.google.com/forum/#!topic/druid-development/AMSOVGx5PhQ
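For illustration, here is a minimal sketch of what an approximate distinct count could look like through Druid's SQL endpoint on the broker. The datasource name (customer_events), the customer_id dimension, the filter fields, and the broker address are all assumptions, not details from the question; Druid SQL's APPROX_COUNT_DISTINCT is backed by the HyperLogLog-style sketches mentioned above.

```python
# Minimal sketch: approximate distinct count via Druid's SQL endpoint.
# Assumptions (not from the original question): a datasource named
# "customer_events", a "customer_id" dimension, filterable numeric fields
# "field_a"/"field_b", and a broker listening on localhost:8082.
import json
import urllib.request

DRUID_SQL_URL = "http://localhost:8082/druid/v2/sql/"

# APPROX_COUNT_DISTINCT uses a HyperLogLog-style sketch, so the result is
# an estimate rather than an exact count.
query = """
SELECT APPROX_COUNT_DISTINCT(customer_id) AS distinct_customers
FROM customer_events
WHERE field_a BETWEEN 10 AND 100
  AND field_b BETWEEN 5 AND 50
"""

request = urllib.request.Request(
    DRUID_SQL_URL,
    data=json.dumps({"query": query}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    # The broker returns a JSON array of result rows.
    print(json.loads(response.read().decode("utf-8")))
```

If exact counts were required instead, you would typically fall back to a plain COUNT(DISTINCT ...) with approximate count distinct disabled in the query context, which is where the performance and memory trade-offs mentioned above come in.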