Question
Can someone please tell me what the rebalancing algorithm is for Kafka consumers? I would like to understand how partition count and consumer threads affect this.
Thanks.
Recommended answer
OK, so there are currently two rebalancing algorithms: Range and RoundRobin. They are also called partition assignment strategies.
For simplicity, assume we have a topic T1 with 10 partitions and two consumers with different configurations (to make the example clearer): C1 with num.streams set to 1 and C2 with num.streams set to 2.
Here's how that would work with the Range strategy:
Range lays out the available partitions in numeric order and the consumer threads in lexicographic order. So in our case the order of partitions will be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and the order of consumer threads will be C1-0, C2-0, C2-1. Then the number of partitions is divided by the number of consumer threads to determine how many partitions each consumer thread should own. In our case 10 doesn't divide evenly by 3 (3 each, with 1 left over), so the first thread, C1-0, will get one extra partition. The final partition assignment would look like this:
C1-0 gets partitions 0, 1, 2, 3
C2-0 gets partitions 4, 5, 6
C2-1 gets partitions 7, 8, 9
If there were 11 partitions, the remainder would be 2, so the first two threads would each get an extra partition and the assignment for these consumers would change a bit:
C1-0 would get partitions 0, 1, 2, 3
C2-0 would get partitions 4, 5, 6, 7
C2-1 would get partitions 8, 9, 10
That's it.
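The Range steps described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the answer, not Kafka's actual implementation; the function name `range_assign` and the dictionary it returns are made up for this example.

```python
def range_assign(num_partitions, threads):
    """Assign partitions 0..num_partitions-1 to threads in contiguous ranges.

    Hypothetical helper for illustration; not part of any Kafka API.
    """
    ordered = sorted(threads)  # consumer threads in lexicographic order
    per_thread, extra = divmod(num_partitions, len(ordered))
    assignment = {}
    start = 0
    for i, thread in enumerate(ordered):
        # The first `extra` threads each take one additional partition.
        count = per_thread + (1 if i < extra else 0)
        assignment[thread] = list(range(start, start + count))
        start += count
    return assignment


threads = ["C1-0", "C2-0", "C2-1"]
ten = range_assign(10, threads)
eleven = range_assign(11, threads)
print(ten)
print(eleven)
```

With 10 partitions only C1-0 gets a fourth partition; with 11 partitions both C1-0 and C2-0 do, which matches the two listings above.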
The same configuration wouldn't work for the RoundRobin strategy, as it requires equal num.streams across all consumers subscribed to this topic, so let's assume both consumers now have num.streams set to 2. One major difference compared to the Range strategy is that you cannot predict what the assignment will be prior to a rebalance. Here's how that would work with the RoundRobin strategy:
First, there are 2 conditions that MUST be satisfied before actual assignment:
a) Every topic has the same number of streams within a consumer instance (that's why I mentioned above that different number of threads per consumer will not work)
b) The set of subscribed topics is identical for every consumer instance within the group (we have one topic here so that's not a problem now).
Once these 2 conditions are verified, the topic-partition pairs are sorted by hashcode to reduce the chance that all partitions of one topic are assigned to a single consumer (if there is more than one topic to be consumed).
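To see why sorting by a hash helps, here is a small sketch. It assumes a second, hypothetical topic T2, and it uses `zlib.crc32` purely as a deterministic stand-in for whatever hashcode Kafka computes; the resulting order will not match Kafka's.

```python
import zlib


def hash_order(pairs):
    """Sort (topic, partition) pairs by a deterministic stand-in hash."""
    def key(pair):
        topic, partition = pair
        # crc32 is an assumption for illustration, not Kafka's real hashcode.
        return zlib.crc32(f"{topic}-{partition}".encode("utf-8"))
    return sorted(pairs, key=key)


# T2 is hypothetical; the answer's example only has topic T1.
pairs = [(topic, p) for topic in ("T1", "T2") for p in range(3)]
ordered = hash_order(pairs)
print(ordered)
```

Because the sort key ignores topic boundaries, partitions of different topics tend to end up interleaved rather than grouped topic by topic, so a later round-robin pass spreads each topic across threads.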
And finally, all topic-partition pairs are assigned in a round-robin fashion to the available consumer threads. For example, if our topic-partitions end up sorted like this: T1-5, T1-3, T1-0, T1-8, T1-2, T1-1, T1-4, T1-7, T1-6, T1-9 and the consumer threads are C1-0, C1-1, C2-0, C2-1, then the assignment will be like this:
T1-5 goes to C1-0
T1-3 goes to C1-1
T1-0 goes to C2-0
T1-8 goes to C2-1
At this point no more consumer threads are left, but there are still more topic-partitions, so iteration over the consumer threads starts over:
T1-2 goes to C1-0
T1-1 goes to C1-1
T1-4 goes to C2-0
T1-7 goes to C2-1
And again:
T1-6 goes to C1-0
T1-9 goes to C1-1
At this point all topic-partitions are assigned and each consumer thread has a near-equal number of partitions.
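The round-robin pass itself is short enough to sketch. The snippet below takes the already-sorted topic-partition order from the example as given (it does not reproduce the hashcode sort), and `round_robin_assign` is a hypothetical name, not a Kafka function.

```python
from itertools import cycle


def round_robin_assign(ordered_pairs, threads):
    """Deal topic-partitions out to threads one at a time, wrapping around."""
    assignment = {thread: [] for thread in threads}
    # cycle() restarts from the first thread once the last one is reached.
    for pair, thread in zip(ordered_pairs, cycle(threads)):
        assignment[thread].append(pair)
    return assignment


ordered_pairs = ["T1-5", "T1-3", "T1-0", "T1-8", "T1-2",
                 "T1-1", "T1-4", "T1-7", "T1-6", "T1-9"]
threads = ["C1-0", "C1-1", "C2-0", "C2-1"]
result = round_robin_assign(ordered_pairs, threads)
for thread, owned in result.items():
    print(thread, owned)
```

With 10 partitions and 4 threads, two threads end up with 3 partitions and two with 2, which is the near-equal split described above.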
Hope this helps.