Problem description
I have a three-node Cassandra cluster, and I have created a table that has more than 2,000,000 rows.
When I execute select count(*) from userdetails in cqlsh, I get this error:
OperationTimedOut: errors={}, last_host=192.168.1.2
When I run the count on fewer rows, or with a limit of 50,000, it works fine.
Recommended answer
count(*) actually pages through all the data, so a select count(*) from userdetails without a limit can be expected to time out with that many rows. Some details here: http://planetcassandra.org/blog/counting-key-in-cassandra/
You may want to consider maintaining the count yourself or using Spark. If you just want a ballpark number, you can grab it from JMX.
Grabbing it from JMX can be a little tricky, depending on your data model. To get the number of partitions, grab the org.apache.cassandra.metrics:type=ColumnFamily,keyspace={{Keyspace}},scope={{Table}},name=EstimatedColumnCountHistogram mbean and sum up all 90 values (this is what nodetool cfstats outputs). It only gives you the number that exist in sstables, so to make it more accurate you can do a flush first, or try to estimate the number in memtables from the MemtableColumnsCount mbean.
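As a rough illustration of the summing step, here is a minimal Python sketch. It assumes the histogram's bucket values have already been read from the mbean (for example with a JMX client) into a plain list. The function name and the sample values are hypothetical, and the result covers only the one node that was queried.

```python
def partitions_from_histogram(bucket_counts):
    """Sum the histogram buckets to get a per-node partition estimate.

    bucket_counts: the values read from the EstimatedColumnCountHistogram
    mbean (assumed to be already fetched; this sketch does no JMX calls).
    """
    if any(count < 0 for count in bucket_counts):
        raise ValueError("bucket counts must be non-negative")
    return sum(bucket_counts)


# Made-up sample: 90 buckets, only two of them populated.
sample_buckets = [0] * 90
sample_buckets[10] = 1200
sample_buckets[11] = 800

estimate = partitions_from_histogram(sample_buckets)
print(estimate)
```

Because this only reflects data in sstables, flushing before reading the mbean should bring the figure closer to the real count.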
For a very basic ballpark number, you can grab the estimated partition counts from system.size_estimates across all the ranges listed (note that this is only the number for one node). Multiply that by the number of nodes, then divide by RF (the replication factor).
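The arithmetic can be sketched in Python as follows. This assumes you have already pulled the partitions_count column for your table from system.size_estimates on one node (one value per token range) and that data is spread roughly evenly across nodes. The function name and the sample numbers are hypothetical.

```python
def estimate_cluster_partitions(range_partition_counts, node_count, replication_factor):
    """Ballpark partition count for the whole cluster.

    range_partition_counts: partitions_count values for one table, as
    listed in system.size_estimates on a single node (assumed input).
    """
    if node_count <= 0 or replication_factor <= 0:
        raise ValueError("node_count and replication_factor must be positive")
    # Total for the one node that was queried.
    per_node = sum(range_partition_counts)
    # Scale up to all nodes, then divide out the replicated copies.
    return per_node * node_count // replication_factor


# Made-up values for a three-node cluster with RF = 3.
ranges_on_one_node = [250_000, 240_000, 260_000]
print(estimate_cluster_partitions(ranges_on_one_node, node_count=3, replication_factor=3))
```

This is only an estimate: size_estimates is refreshed periodically, and the result counts partitions rather than rows, so it matches a row count only when each partition holds a single row.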