Question
Why do consumers connect to ZooKeeper to retrieve partition locations, while Kafka producers have to connect to one of the brokers to retrieve metadata?
My point is: what exactly is ZooKeeper for, when every broker already has all the metadata needed to tell producers where to send their messages? Couldn't the brokers send this same information to the consumers?
I can understand why brokers keep the metadata themselves: it saves them from connecting to ZooKeeper every time a new message is sent to them. Is there a function of ZooKeeper that I'm missing? I'm finding it hard to think of a reason why ZooKeeper is really needed in a Kafka cluster.
Recommended answer
First of all, ZooKeeper is needed only by the high-level consumer. SimpleConsumer does not require ZooKeeper to work.
The high-level consumer needs ZooKeeper mainly for two things: tracking consumed offsets and handling load balancing.
Now in more detail.
Regarding offset tracking, imagine the following scenario: you start a consumer, consume 100 messages and shut the consumer down. The next time you start your consumer, you'll probably want to resume where you left off (offset 100), which means you have to store the maximum consumed offset somewhere. This is where ZooKeeper kicks in: it stores offsets for every group/topic/partition. So the next time you start your consumer, it can ask "Hey ZooKeeper, what's the offset I should start consuming from?". Kafka is actually moving towards being able to store offsets not only in ZooKeeper but in other storage as well (for now only ZooKeeper and Kafka offset storage are available, and I'm not sure Kafka storage is fully implemented).
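To make the offset-tracking idea concrete, here is a minimal in-memory sketch, assuming a plain Python dict stands in for ZooKeeper. The key layout imitates the /consumers/<group>/offsets/<topic>/<partition> paths used by the old ZooKeeper-based consumer, but OffsetStore and consume are hypothetical names for illustration, not the ZooKeeper or Kafka client API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OffsetStore:
    """Hypothetical stand-in for ZooKeeper's per-group/topic/partition offsets."""

    nodes: dict[str, int] = field(default_factory=dict)

    @staticmethod
    def path(group: str, topic: str, partition: int) -> str:
        # Illustrative key layout modeled on the old consumer's ZooKeeper paths.
        return f"/consumers/{group}/offsets/{topic}/{partition}"

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        self.nodes[self.path(group, topic, partition)] = offset

    def fetch(self, group: str, topic: str, partition: int, default: int = 0) -> int:
        # A group that never committed starts from `default`.
        return self.nodes.get(self.path(group, topic, partition), default)


def consume(store: OffsetStore, group: str, topic: str, partition: int,
            log: list, max_messages: int) -> list:
    """Read up to max_messages starting at the committed offset, then commit the next offset."""
    start = store.fetch(group, topic, partition)
    batch = log[start:start + max_messages]
    store.commit(group, topic, partition, start + len(batch))
    return batch
```

Calling consume twice with max_messages=100 against a 250-message log returns messages 0-99 and then 100-199, because the second "restart" asks the store where to begin. A different group name gets its own independent offset.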
Regarding load balancing, the volume of produced messages can be too large for one machine to handle, and you'll probably want to add computing power at some point. Let's say you have a topic with 100 partitions, and to handle this volume of messages you have 10 machines. Several questions actually arise here:
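- How should these 10 machines divide the partitions among themselves?
- What happens if one of the machines dies?
- What happens if you want to add another machine?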
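For the first question, a minimal sketch of one possible split, assuming a range-style strategy similar in spirit to what the old high-level consumer used: sort the partitions and the consumer IDs, then give each consumer a contiguous, near-equal slice, with the first few consumers taking one extra partition when the division is uneven. range_assign and the machine names are hypothetical, not Kafka's own code.

```python
def range_assign(partitions, consumers):
    """Split partitions into contiguous, near-equal ranges, one range per consumer."""
    parts = sorted(partitions)
    members = sorted(consumers)  # every consumer must sort the same way to agree
    if not members:
        return {}
    per_consumer, extra = divmod(len(parts), len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        # The first `extra` consumers absorb the remainder, one partition each.
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = parts[start:start + count]
        start += count
    return assignment


machines = [f"machine-{i:02d}" for i in range(10)]
plan = range_assign(range(100), machines)
```

With 10 machines each one gets 10 partitions. If one machine dies and the remaining nine rerun the same function, machine-00 gets 12 partitions and the other eight get 11 each; adding an eleventh machine works the same way. The sorting matters because each consumer computes the plan independently and must arrive at the same answer.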
And again, this is where ZooKeeper kicks in: it tracks all consumers in a group, and each high-level consumer subscribes to changes in that group. The point is that when a consumer appears or disappears, ZooKeeper notifies all consumers and triggers a rebalance so that they split the partitions near-equally (i.e. to balance the load). This way it guarantees that if one of the consumers dies, the others will continue processing the partitions that consumer owned.
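The membership-and-notification loop can be sketched in a few lines, assuming an in-memory registry in place of ZooKeeper: members play the role of ephemeral nodes that disappear when a consumer's session ends, and watcher callbacks play the role of ZooKeeper watches. GroupRegistry and rebalance are hypothetical names; the split is round-robin for brevity rather than the range split the old consumer used, and unlike real ZooKeeper watches, which fire once and must be re-registered, these callbacks stay registered.

```python
from __future__ import annotations

from typing import Callable


class GroupRegistry:
    """Hypothetical stand-in for a consumer group's membership in ZooKeeper."""

    def __init__(self) -> None:
        self.members: set[str] = set()
        self.watchers: list[Callable[[set[str]], None]] = []

    def watch(self, callback: Callable[[set[str]], None]) -> None:
        self.watchers.append(callback)

    def _notify(self) -> None:
        snapshot = set(self.members)
        for callback in self.watchers:
            callback(snapshot)

    def join(self, consumer_id: str) -> None:
        self.members.add(consumer_id)
        self._notify()

    def leave(self, consumer_id: str) -> None:
        # Also models a session expiring when a consumer dies.
        self.members.discard(consumer_id)
        self._notify()


def rebalance(partitions, members) -> dict[str, list[int]]:
    """Deal partitions round-robin over the sorted member list."""
    ordered = sorted(members)
    if not ordered:
        return {}
    assignment = {m: [] for m in ordered}
    for i, p in enumerate(sorted(partitions)):
        assignment[ordered[i % len(ordered)]].append(p)
    return assignment


partitions = list(range(6))
state: dict[str, list[int]] = {}


def on_change(members: set[str]) -> None:
    # Every membership change triggers a fresh, deterministic split.
    state.clear()
    state.update(rebalance(partitions, members))


registry = GroupRegistry()
registry.watch(on_change)
for consumer in ("c1", "c2", "c3"):
    registry.join(consumer)
# state == {"c1": [0, 3], "c2": [1, 4], "c3": [2, 5]}
registry.leave("c2")
# state == {"c1": [0, 2, 4], "c3": [1, 3, 5]}
```

When c2 leaves, its partitions 1 and 4 are picked up by the survivors without any manual intervention, which is the guarantee the paragraph above describes.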