Question
Kafka has the concept of an in-sync replica (ISR) set, which is the set of nodes that aren't too far behind the leader.
What happens if the network cleanly partitions so that a minority containing the leader is on one side, and a majority containing the other in-sync nodes is on the other side?
The minority/leader-side presumably thinks that it lost a bunch of nodes, reduces the ISR size accordingly, and happily carries on.
The other side probably thinks that it lost the leader, so it elects a new one and happily carries on.
Now we have two leaders in the same cluster, accepting writes independently. In a system that requires a majority of nodes to proceed after a partition, the old leader would step down and stop accepting writes.
What happens in this situation in Kafka? Does it require majority vote to change the ISR set? If so, is there a brief data loss until the leader side detects the outages?
Answer
I haven't tested this, but I think the accepted answer is wrong, and Lars Francke is correct about the possibility of split-brain.
Zookeeper quorum requires a majority, so if the ZK ensemble partitions, at most one side will have a quorum.
Being a controller requires an active session with ZK (an ephemeral znode registration). If the current controller is partitioned away from the ZK quorum, it should voluntarily stop considering itself a controller. This should take at most zookeeper.session.timeout.ms = 6000. Brokers still connected to the ZK quorum should elect a new controller among themselves. (Based on this: https://stackoverflow.com/a/52426734)
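To illustrate the mechanism (this is not Kafka's actual controller code), here is a minimal sketch of ephemeral-znode election using the plain ZooKeeper Java client. The /controller path matches Kafka's convention; the connect string, broker id, and payload format are placeholder assumptions:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ControllerElectionSketch {
    public static void main(String[] args) throws Exception {
        // Session timeout mirrors zookeeper.session.timeout.ms from the answer.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 6000, event -> {});
        try {
            // Brokers race to create the ephemeral /controller znode. The znode
            // disappears automatically when the creator's session expires, e.g.
            // when the controller is partitioned away from the ZK quorum.
            zk.create("/controller", "{\"brokerid\":1}".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("Won the election: this broker is the controller");
        } catch (KeeperException.NodeExistsException e) {
            // Someone else holds the znode; a real broker would watch it
            // and re-run the election when it vanishes.
            System.out.println("Another broker is already the controller");
        } finally {
            zk.close();
        }
    }
}
```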
Being a topic-partition leader also requires an active session with ZK. A leader that has lost its connection to the ZK quorum should voluntarily stop being one. The elected controller will detect that some ex-leaders are missing and will assign new leaders from the replicas that are still in the ISR and connected to the ZK quorum.
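To observe the result from the outside, one can list each partition's current leader and ISR with the Java AdminClient. A small sketch, assuming placeholder topic and bootstrap addresses:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class ShowLeadersAndIsr {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(List.of("my-topic"))
                                         .all().get().get("my-topic");
            // After re-election, partitions whose old leader was cut off from
            // the ZK quorum should show a new leader chosen from the ISR.
            desc.partitions().forEach(p ->
                System.out.printf("partition %d leader=%s isr=%s%n",
                                  p.partition(), p.leader(), p.isr()));
        }
    }
}
```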
Now, what happens to producer requests received by the partitioned ex-leader during the ZK timeout window? There are some possibilities.
If the producer's acks = all and the topic's min.insync.replicas = replication.factor, then all ISR members should have exactly the same data. The ex-leader will eventually reject the in-progress writes and producers will retry them. The newly elected leader will not have lost any data. On the other hand, it won't be able to serve any write requests until the partition heals. It will be up to producers to decide whether to reject client requests or keep retrying in the background for a while.
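A producer-side sketch of this setup, assuming placeholder addresses and topic name. With acks = all, the broker rejects writes once the ISR falls below min.insync.replicas, so a minority-side ex-leader fails the send (typically with NotEnoughReplicasException) rather than acknowledging data that would later be truncated:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SafeProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait for all in-sync replicas to acknowledge each write.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Retry in the background; idempotence makes retries safe.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"), (meta, err) -> {
                // On a minority-side ex-leader the send eventually fails instead
                // of being acked; the application decides whether to keep retrying.
                if (err != null) System.err.println("send failed: " + err);
            });
        }
    }
}
```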
Otherwise, it is very probable that the new leader will be missing up to zookeeper.session.timeout.ms + replica.lag.time.max.ms = 16000 ms (6000 + 10000 with the defaults) worth of records, and those records will be truncated from the ex-leader after the partition heals.
Let's say you expect network partitions that last longer than you are comfortable being read-only for.
Something like this can work:
- you have 3 availability zones and expect that at most 1 zone will be partitioned from the other 2
- in each zone you have a Zookeeper node (or a few), so that 2 zones combined can always form a majority
- in each zone you have a bunch of Kafka brokers
- each topic has replication.factor = 3, one replica in each availability zone, and min.insync.replicas = 2 (see the sketch after this list)
- producers use acks = all
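A minimal sketch of creating such a topic with the Java AdminClient. Topic name, partition count, and addresses are placeholders, and placing one replica per zone is assumed to be handled by rack-aware assignment (brokers configured with broker.rack), which is not shown here:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreatePartitionTolerantTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // replication.factor = 3: one replica can live in each availability zone.
            NewTopic topic = new NewTopic("my-topic", 6, (short) 3)
                    // min.insync.replicas = 2: acks=all writes need two ISR members,
                    // so a lone partitioned zone cannot acknowledge new data.
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```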
This way there should be two Kafka ISR members on the ZK-quorum side of the network partition, at least one of them fully up to date with the ex-leader. So there is no data loss on the brokers, and writes stay available to any producers that can still connect to the winning side.