Question
With the at-least-once guarantee, I understand that duplicates are possible in case of failures. However,
1) How frequently does the Kafka Streams library perform commits?
2) Do users ever need to consider committing manually in addition to the above?
3) Is there a best practice for how frequently commits should be performed?
Answer
Kafka Streams commits at regular intervals, configured via the parameter commit.interval.ms
(the default is 30 seconds; if exactly-once processing is enabled, the default is 100 ms).
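Below is a minimal Java sketch of setting this parameter. It assumes Kafka Streams 3.0 or later, where the exactly-once value is exactly_once_v2 (older releases used exactly_once). It uses the raw config key strings so it compiles without the Kafka jars. In a real application you would normally use the StreamsConfig constants and pass the resulting Properties to new KafkaStreams(topology, props). The application id and broker address are hypothetical. The effectiveCommitIntervalMs helper is also hypothetical: it only mirrors the defaults quoted above and is not a library API.

```java
import java.util.Properties;

// Sketch only: raw Kafka Streams config keys, so this compiles without the Kafka jars.
final class CommitConfigSketch {

    // Key strings behind StreamsConfig.APPLICATION_ID_CONFIG, BOOTSTRAP_SERVERS_CONFIG,
    // PROCESSING_GUARANTEE_CONFIG and COMMIT_INTERVAL_MS_CONFIG.
    static final String APPLICATION_ID = "application.id";
    static final String BOOTSTRAP_SERVERS = "bootstrap.servers";
    static final String PROCESSING_GUARANTEE = "processing.guarantee";
    static final String COMMIT_INTERVAL_MS = "commit.interval.ms";

    // Defaults quoted in the answer: 100 ms with exactly-once, 30 s otherwise.
    static long defaultCommitIntervalMs(boolean exactlyOnce) {
        return exactlyOnce ? 100L : 30_000L;
    }

    // Builds the properties; a null commitIntervalMs leaves the default in place.
    static Properties buildProps(boolean exactlyOnce, Long commitIntervalMs) {
        Properties props = new Properties();
        props.setProperty(APPLICATION_ID, "commit-demo-app");   // hypothetical application id
        props.setProperty(BOOTSTRAP_SERVERS, "localhost:9092"); // hypothetical broker address
        props.setProperty(PROCESSING_GUARANTEE, exactlyOnce ? "exactly_once_v2" : "at_least_once");
        if (commitIntervalMs != null) {
            props.setProperty(COMMIT_INTERVAL_MS, Long.toString(commitIntervalMs));
        }
        return props;
    }

    // Hypothetical helper: an explicit setting wins, otherwise the guarantee-dependent default.
    static long effectiveCommitIntervalMs(Properties props) {
        String guarantee = props.getProperty(PROCESSING_GUARANTEE, "at_least_once");
        boolean exactlyOnce = !guarantee.equals("at_least_once");
        String configured = props.getProperty(COMMIT_INTERVAL_MS);
        return configured != null ? Long.parseLong(configured) : defaultCommitIntervalMs(exactlyOnce);
    }

    public static void main(String[] args) {
        System.out.println(effectiveCommitIntervalMs(buildProps(false, null)));   // 30000
        System.out.println(effectiveCommitIntervalMs(buildProps(true, null)));    // 100
        System.out.println(effectiveCommitIntervalMs(buildProps(false, 5_000L))); // 5000
    }
}
```

Leaving commit.interval.ms unset keeps the guarantee-dependent default. Setting it explicitly overrides the default for either guarantee.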
Usually, users do not need to commit manually. Note, though, that users don't have complete control over committing; they can only request commits: cf. How to commit manually with Kafka Stream?
Commits are synchronization points. If you commit too frequently (in the extreme, after every processed record), your throughput can drop significantly. The right frequency is also highly application-dependent, because the commit interval determines how many potential duplicates the application processes after a failure (this also depends on the input data rate). Thus, you need to consider how many duplicates you are willing to tolerate in case of failure. It also depends on how long it takes the application to reprocess the data, since the application might not be fully available during that time. Overall, it's hard to give a general recommendation; you need to weigh these trade-offs for each application individually.
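A rough Java estimate can make this trade-off concrete. The model is an assumption, not how Kafka Streams computes anything. It assumes a steady input rate and a crash just before the next commit, so at most (input rate × commit interval) records are replayed. It ignores buffered and in-flight records, state-store restoration, rebalancing, and data that keeps arriving during recovery. The rates in main are made-up placeholders, not measurements.

```java
import java.util.Locale;

// Back-of-envelope model of the commit-interval trade-off (assumptions in comments).
final class CommitTradeoffEstimate {

    // Assumes a steady input rate and a crash just before the next commit:
    // everything processed since the last commit is replayed (potential duplicates).
    static long worstCaseReprocessedRecords(double inputRecordsPerSec, long commitIntervalMs) {
        return (long) Math.ceil(inputRecordsPerSec * commitIntervalMs / 1000.0);
    }

    // Time to replay those records at a given processing rate; ignores new input
    // arriving meanwhile, state restoration and rebalancing.
    static double reprocessingSeconds(long records, double processingRecordsPerSec) {
        return records / processingRecordsPerSec;
    }

    public static void main(String[] args) {
        double inputRate = 1_000.0;      // hypothetical input rate, records/s
        double processingRate = 5_000.0; // hypothetical replay throughput, records/s
        for (long intervalMs : new long[] {100L, 30_000L}) {
            long replayed = worstCaseReprocessedRecords(inputRate, intervalMs);
            System.out.printf(Locale.ROOT,
                    "commit.interval.ms=%d -> up to %d duplicates, ~%.2f s to reprocess%n",
                    intervalMs, replayed, reprocessingSeconds(replayed, processingRate));
        }
    }
}
```

With these placeholder rates, a 100 ms interval bounds the replay at about 100 records (0.02 s), while the 30 s default allows up to 30,000 records (6 s). Committing more often shrinks that window but adds more synchronization points.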