Problem description
I have a Java client that pushes (INSERT) records in batches to a Cassandra cluster. The elements in a batch all have the same row key, so they will all be placed on the same node. I also don't need the transaction to be atomic, so I've been using unlogged batches.
The number of INSERT commands in each batch depends on different factors, but can be anything between 5 and 50000. At first I just put as many commands as I had into one batch and submitted it. This threw com.datastax.driver.core.exceptions.InvalidQueryException: Batch too large. Then I used a cap of 1000 INSERTs per batch, and then went down to 300. I noticed I'm just guessing randomly without knowing exactly where this limit comes from, which can cause trouble down the road.
My question is, what is this limit? Can I modify it? How can I know how many elements can be placed in a batch? When is my batch "full"?
Recommended answer
I would recommend not increasing the cap, and just splitting into multiple requests. Putting everything in one giant request will significantly hurt the coordinator. Having everything in one partition can improve throughput for some batch sizes by reducing latency, but batches are never meant to be used to improve performance. So finding the batch size that maximizes throughput will depend largely on use case/schema/nodes and will require specific testing, since there is generally a cliff in size past which performance starts to degrade.
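As a rough illustration of splitting into multiple requests, the sketch below cuts a list of pending statements into chunks of at most a fixed count. The class name `BatchSplitter`, the method `split`, and the cap of 300 are hypothetical choices for this example, not part of any driver API, and plain strings stand in for real statements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the DataStax driver.
public class BatchSplitter {

    /** Splits items into consecutive chunks of at most maxPerChunk elements. */
    public static <T> List<List<T>> split(List<T> items, int maxPerChunk) {
        if (maxPerChunk <= 0) {
            throw new IllegalArgumentException("maxPerChunk must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerChunk) {
            int end = Math.min(start + maxPerChunk, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> inserts = new ArrayList<>();
        for (int i = 0; i < 1050; i++) {
            inserts.add("INSERT #" + i); // placeholder for a real statement
        }
        // The cap of 300 is an assumed value; the right one needs testing.
        List<List<String>> chunks = split(inserts, 300);
        // Prints "4 chunks, last has 150 statements"
        System.out.println(chunks.size() + " chunks, last has "
                + chunks.get(chunks.size() - 1).size() + " statements");
    }
}
```

With the 2.x/3.x Java driver, each chunk could then be added to its own `new BatchStatement(BatchStatement.Type.UNLOGGED)` and executed as a separate request. A count-based cap is only a proxy, so the value that works will depend on how large each row is.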
There is a
# Fail any batch exceeding this value. 50kb (10x warn threshold) by default.
batch_size_fail_threshold_in_kb: 50
option in your cassandra.yaml. Note that it limits the size of the batch in kilobytes, not the number of statements. You can increase it, but be sure to test to make sure you're actually helping and not hurting your throughput.
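Because the threshold is expressed in kilobytes, a client could also close a chunk once an estimated byte budget is reached instead of counting statements. The sketch below assumes the caller can supply a rough per-item size estimate; `SizeBoundedBatcher`, `splitBySize`, and the estimator are hypothetical names, and a client-side estimate will only approximate the size the server actually measures, so it makes sense to leave headroom below the configured threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper for illustration; not part of the DataStax driver.
public class SizeBoundedBatcher {

    /**
     * Groups items into consecutive chunks whose estimated total size stays
     * at or below maxBytes. An item that is larger than maxBytes on its own
     * still gets a chunk to itself rather than being dropped.
     */
    public static <T> List<List<T>> splitBySize(
            List<T> items, ToIntFunction<T> sizeOf, int maxBytes) {
        List<List<T>> chunks = new ArrayList<>();
        List<T> current = new ArrayList<>();
        int currentBytes = 0;
        for (T item : items) {
            int size = sizeOf.applyAsInt(item);
            // Start a new chunk if adding this item would exceed the budget.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(item);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

For example, with a budget of 40 * 1024 bytes against the default 50 KB fail threshold, rows estimated at about 1 KB each would be grouped roughly 40 per chunk. Both the budget and the estimator are assumptions that would need to be checked against the actual cluster.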