Question

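I am using the Cassandra Java driver.

I receive 150k requests per second, which I insert into 8 tables that have different partition keys.

My question is which approach is better:

  • batch inserting into these tables
  • inserting one by one

I ask because, given my request volume (150k per second), batching sounds like the better option, but since all the tables have different partition keys, a batch looks expensive.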

Answer

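Please see my answer to the following question:

Cassandra batch query performance on tables having different partition keys

Batches are not for improving performance. They are used to ensure atomicity and isolation.

Batching works well for single-partition write operations. However, batches are often mistakenly used in an attempt to optimize performance. Depending on the batch operation, performance may actually get worse.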

https://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html
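
To make the single-partition point concrete, here is a minimal sketch that groups pending writes by their target partition, so that only writes to the same table and partition would ever share a batch. The `Write` class, the table names, and `groupByPartition` are hypothetical names used for illustration; they are not part of the Cassandra driver, and no driver calls are made.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PartitionGrouping {
    // Hypothetical value type: one pending write aimed at one table and one partition.
    static final class Write {
        final String table;
        final String partitionKey;
        final String payload;

        Write(String table, String partitionKey, String payload) {
            this.table = table;
            this.partitionKey = partitionKey;
            this.payload = payload;
        }
    }

    // Groups writes by "table/partitionKey", preserving first-seen order.
    // A group with several writes is a single-partition batch candidate;
    // a group with one write is better sent as an individual insert.
    static Map<String, List<Write>> groupByPartition(List<Write> writes) {
        Map<String, List<Write>> groups = new LinkedHashMap<>();
        for (Write w : writes) {
            String target = w.table + "/" + w.partitionKey;
            groups.computeIfAbsent(target, k -> new ArrayList<>()).add(w);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Write> writes = Arrays.asList(
            new Write("events_by_user", "user-1", "login"),
            new Write("events_by_user", "user-1", "click"),
            new Write("events_by_day", "2024-01-01", "login"));
        groupByPartition(writes).forEach((target, group) ->
            System.out.println(target + " has " + group.size() + " write(s)"));
    }
}
```

In the scenario from the question, each request goes to 8 tables with different partition keys, so every group would hold a single write and nothing would qualify as a single-partition batch.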

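If you do not need data consistency across these tables, use single inserts. Individual requests are distributed across the nodes (depending on the load balancing policy). If your concern is request handling and you use a batch, the batch puts a lot of extra work on the coordinator node, which I would guess is not efficient :)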



08-11 22:36