Problem Description
I am trying to come up with a design for consuming from Kafka. I am using Kafka 0.8.1.1. I am thinking of designing a system where a consumer is created every few seconds, consumes data from Kafka, processes it, and then quits after committing the offsets back to Kafka. At any point in time I expect 250-300 consumers to be active (running as thread pools on different machines).
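To sketch the intended lifecycle concretely, here is a toy version in Python; `fetch_batch`, `process`, and `commit` are placeholders for whatever client is used, not the actual Kafka 0.8 API:

```python
def run_short_lived_consumer(fetch_batch, process, commit):
    """One pass of the lifecycle above: start, drain one batch, process it,
    commit offsets, exit. fetch_batch/process/commit are placeholders for a
    real Kafka 0.8 client, not its actual API."""
    last_offset = None
    for offset, message in fetch_batch():
        process(message)
        last_offset = offset
    if last_offset is not None:
        commit(last_offset + 1)  # commit the offset of the *next* message to read
    return last_offset

# In-memory stand-ins for Kafka, just to show the flow:
committed = []
run_short_lived_consumer(lambda: [(0, "M1"), (1, "M2")],
                         lambda msg: None,
                         committed.append)
print(committed)  # [2] -- the next run would start reading at offset 2
```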
How and when does a rebalance of partitions happen?
How costly is rebalancing partitions among the consumers? I expect a new consumer to finish up or join the same consumer group every few seconds, so I just want to know the overhead and latency of a rebalance operation.
Say consumer C1 has partitions P1, P2, and P3 assigned to it and is processing a message M1 from partition P1. Now consumer C2 joins the group. How are the partitions divided between C1 and C2? Is there a possibility that C1's commit for M1 (which might take some time to reach Kafka) will be rejected, and M1 will be treated as a fresh message and delivered to someone else? (I know Kafka is an at-least-once delivery model, but I wanted to confirm whether a repartition can cause redelivery of the same message.)
Recommended Answer
I'd rethink the design if I were you. Perhaps you need a consumer pool?
Rebalancing happens every time a consumer joins or leaves the group.
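To make the "how are partitions divided" part of the question concrete: the 0.8 high-level consumer's default strategy is range assignment, which sorts both partitions and consumers and hands out contiguous blocks. A rough reimplementation of that idea (my own sketch, not Kafka's code):

```python
def range_assign(partitions, consumers):
    """Sketch of Kafka 0.8's default "range" assignment: sort both lists, give
    each consumer a contiguous block of partitions, and let the first
    (n_partitions mod n_consumers) consumers take one extra partition each."""
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

# Before C2 joins, C1 owns everything:
print(range_assign(["P1", "P2", "P3"], ["C1"]))
# {'C1': ['P1', 'P2', 'P3']}
# After C2 joins, the rebalance splits the range:
print(range_assign(["P1", "P2", "P3"], ["C1", "C2"]))
# {'C1': ['P1', 'P2'], 'C2': ['P3']}
```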
Kafka and the current consumer were definitely designed for long-running consumers. The new consumer design (planned for 0.9) will handle short-lived consumers better. In my experience a rebalance takes 100-500 ms, depending a lot on how ZooKeeper is doing.
Yes, duplicates happen often during rebalancing. That's why we try to avoid them. You can try to work around it by committing offsets more frequently, but with 300 consumers committing frequently, and many consumers joining and leaving, your ZooKeeper may become a bottleneck.
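A toy model of why those duplicates appear: after a rebalance, the partition's new owner resumes from the last *committed* offset, not from what the previous owner had merely *processed*, so everything in the gap is delivered a second time. This is a simplification for illustration, not client code:

```python
def resume_after_rebalance(log, committed_offset, processed_up_to):
    """Toy model of at-least-once delivery across a rebalance: the new owner
    of a partition restarts from the last committed offset, so messages that
    were processed but not yet committed are redelivered."""
    redelivered = log[committed_offset:processed_up_to]
    fresh = log[processed_up_to:]
    return redelivered, fresh

log = ["M1", "M2", "M3"]
# C1 processed M1 but had only committed offset 0 when the rebalance hit:
redelivered, fresh = resume_after_rebalance(log, committed_offset=0, processed_up_to=1)
print(redelivered)  # ['M1'] -- M1 is handed to the new owner again
print(fresh)        # ['M2', 'M3']
```

Committing after every message shrinks the redelivery window to nothing in this model, which is exactly the trade-off mentioned above: fewer duplicates at the cost of far more commit traffic.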