Question

Can someone please tell me what the rebalancing algorithm is for Kafka consumers? I would like to understand how the partition count and the number of consumer threads affect it.

Thanks

Recommended answer

OK, so there are 2 rebalancing algorithms at the moment: Range and RoundRobin. They are also called partition assignment strategies.

For simplicity, assume we have a topic T1 with 10 partitions and 2 consumers with different configurations (to make the example clearer): C1 with num.streams set to 1 and C2 with num.streams set to 2.

Here's how that would work with the Range strategy:

Range lays out the available partitions in numeric order and the consumer threads in lexicographic order. So in our case the order of partitions will be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and the order of consumer threads will be C1-0, C2-0, C2-1. Then the number of partitions is divided by the number of consumer threads to determine how many partitions each consumer thread should own. In our case 10 does not divide evenly by 3: each thread gets 3 partitions and 1 is left over, so the first thread, C1-0, gets one extra partition. The final partition assignment would look like this:

C1-0 gets partitions 0, 1, 2, 3
C2-0 gets partitions 4, 5, 6
C2-1 gets partitions 7, 8, 9

If there were 11 partitions, the assignment for these consumers would change a bit, because 2 partitions would be left over and the first two threads would each get one extra:

C1-0 would get partitions 0, 1, 2, 3
C2-0 would get partitions 4, 5, 6, 7
C2-1 would get partitions 8, 9, 10
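
The arithmetic behind both examples can be sketched in a few lines of Python. This is only an illustration of the steps described above, not Kafka's own code; the function name range_assign and its arguments are made up for this example, and it assumes partitions are numbered from 0.

```python
def range_assign(num_partitions, threads):
    """Assign partitions 0..num_partitions-1 to consumer threads, Range-style."""
    ordered = sorted(threads)  # consumer threads in lexicographic order
    # per_thread: partitions every thread gets; extra: leftover partitions
    per_thread, extra = divmod(num_partitions, len(ordered))
    assignment = {}
    for i, thread in enumerate(ordered):
        # The first `extra` threads each take one additional partition.
        start = per_thread * i + min(i, extra)
        count = per_thread + (1 if i < extra else 0)
        assignment[thread] = list(range(start, start + count))
    return assignment


threads = ["C1-0", "C2-0", "C2-1"]
ten = range_assign(10, threads)
eleven = range_assign(11, threads)
print(ten)
print(eleven)
```

With 10 partitions only C1-0 gets a fourth partition; with 11 partitions both C1-0 and C2-0 do, which matches the two listings above.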

That's it.

The same configuration wouldn't work for the RoundRobin strategy, as it requires an equal num.streams across all consumers subscribed to this topic, so let's assume both consumers now have num.streams set to 2. One major difference compared to the Range strategy is that you cannot predict what the assignment will be prior to the rebalance. Here's how that would work with the RoundRobin strategy:

First, there are 2 conditions that MUST be satisfied before the actual assignment:

a) Every topic has the same number of streams within a consumer instance (that's why I mentioned above that a different number of threads per consumer will not work)
b) The set of subscribed topics is identical for every consumer instance within the group (we have one topic here, so that's not a problem now).
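
As a rough illustration, the two conditions could be checked like this. This is a sketch of one reading of the conditions (the same stream count everywhere and identical topic sets), not Kafka's actual validation code; the function round_robin_allowed and the shape of the subscriptions dictionary are assumptions made for this example.

```python
def round_robin_allowed(subscriptions):
    """subscriptions maps a consumer id to {topic: number of streams}."""
    # (b) every consumer must subscribe to exactly the same set of topics
    topic_sets = {frozenset(topics) for topics in subscriptions.values()}
    # (a) as read here: one stream count for every topic on every consumer
    stream_counts = {n for topics in subscriptions.values() for n in topics.values()}
    return len(topic_sets) == 1 and len(stream_counts) == 1


mixed = {"C1": {"T1": 1}, "C2": {"T1": 2}}   # the Range example's configuration
equal = {"C1": {"T1": 2}, "C2": {"T1": 2}}   # both consumers use 2 streams
print(round_robin_allowed(mixed), round_robin_allowed(equal))
```

The first configuration is the one used for the Range example and is rejected; the second is the adjusted configuration used from here on.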

Once these 2 conditions are verified, the topic-partition pairs are sorted by hashcode to reduce the possibility of all partitions of one topic being assigned to one consumer (if there is more than one topic to be consumed).
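
To see why sorting by a hash helps, here is a small sketch. It uses zlib.crc32 purely as a stand-in for the hashcode mentioned above (the real ordering will differ), and the second topic T2 is hypothetical, added only to show partitions of different topics being interleaved.

```python
import zlib


def sort_by_hash(topic_partitions):
    """Order topic-partition names by a deterministic hash of the name."""
    # crc32 is an assumption for illustration; it is not the hash Kafka uses.
    return sorted(topic_partitions, key=lambda tp: zlib.crc32(tp.encode("utf-8")))


# T2 is a made-up second topic for this example.
pairs = [f"{topic}-{p}" for topic in ("T1", "T2") for p in range(3)]
shuffled = sort_by_hash(pairs)
print(shuffled)
```

The resulting order depends only on the hash values, not on topic name or partition number, so neighbouring entries are less likely to belong to the same topic than in a plain alphabetical sort.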

And finally, all topic-partition pairs are assigned in a round-robin fashion to the available consumer threads. For example, if our topic-partitions end up sorted like this: T1-5, T1-3, T1-0, T1-8, T1-2, T1-1, T1-4, T1-7, T1-6, T1-9 and the consumer threads are C1-0, C1-1, C2-0, C2-1, then the assignment will be like this:

T1-5 goes to C1-0
T1-3 goes to C1-1
T1-0 goes to C2-0
T1-8 goes to C2-1
At this point no more consumer threads are left, but there are still more topic-partitions, so the iteration over consumer threads starts over:
T1-2 goes to C1-0
T1-1 goes to C1-1
T1-4 goes to C2-0
T1-7 goes to C2-1
And again:
T1-6 goes to C1-0
T1-9 goes to C1-1

At this point all topic-partitions are assigned, and each consumer thread has a near-equal number of partitions: C1-0 and C1-1 have 3 each, while C2-0 and C2-1 have 2 each.
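
The walk-through above can be reproduced with a short sketch. It takes the already-sorted topic-partition order from the example as given and simply cycles over the consumer threads; the function name round_robin_assign is made up for this illustration and this is not Kafka's own implementation.

```python
from itertools import cycle


def round_robin_assign(topic_partitions, threads):
    """Hand out topic-partitions one by one, cycling over the consumer threads."""
    ordered = sorted(threads)
    assignment = {thread: [] for thread in ordered}
    # cycle() restarts from the first thread once the last one has been used.
    for tp, thread in zip(topic_partitions, cycle(ordered)):
        assignment[thread].append(tp)
    return assignment


# Order taken from the example above (assumed to be the post-hash order).
order = ["T1-5", "T1-3", "T1-0", "T1-8", "T1-2",
         "T1-1", "T1-4", "T1-7", "T1-6", "T1-9"]
rr = round_robin_assign(order, ["C1-0", "C1-1", "C2-0", "C2-1"])
for thread, parts in rr.items():
    print(thread, parts)
```

Because 10 partitions are spread over 4 threads, the first two threads end up with 3 partitions and the last two with 2.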

Hope this helps.

