This article looks at a problem where killing the node that hosts __consumer_offsets leaves consumers unable to consume any messages, and at how to deal with it.

Problem Description

I have a 3-node (node0, node1, node2) Kafka cluster (broker0, broker1, broker2) with replication factor 2, and ZooKeeper (the one packaged with the Kafka tar) running on a separate node (node 4).

I started broker 0 after starting ZooKeeper, and then the remaining brokers. The broker 0 logs show it loading __consumer_offsets partitions, so they appear to be stored on broker 0. Below are sample logs:

Kafka version: kafka_2.10-0.10.2.0

    [2017-06-30 10:50:47,381] INFO [GroupCoordinator 0]: Loading group metadata for console-consumer-85124 with generation 2 (kafka.coordinator.GroupCoordinator)
    [2017-06-30 10:50:47,382] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from __consumer_offsets-41 in 23 milliseconds. (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 10:50:47,382] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from __consumer_offsets-44 (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 10:50:47,387] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from __consumer_offsets-44 in 5 milliseconds. (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 10:50:47,387] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from __consumer_offsets-47 (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 10:50:47,398] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from __consumer_offsets-47 in 11 milliseconds. (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 10:50:47,398] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from __consumer_offsets-1 (kafka.coordinator.GroupMetadataManager)

I can also see GroupCoordinator messages in the same broker 0 logs.

    [2017-06-30 14:35:22,874] INFO [GroupCoordinator 0]: Preparing to restabilize group console-consumer-34472 with old generation 1 (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:35:22,877] INFO [GroupCoordinator 0]: Group console-consumer-34472 with generation 2 is now empty (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:35:25,946] INFO [GroupCoordinator 0]: Preparing to restabilize group console-consumer-6612 with old generation 1 (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:35:25,946] INFO [GroupCoordinator 0]: Group console-consumer-6612 with generation 2 is now empty (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:35:38,326] INFO [GroupCoordinator 0]: Preparing to restabilize group console-consumer-30165 with old generation 1 (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:35:38,326] INFO [GroupCoordinator 0]: Group console-consumer-30165 with generation 2 is now empty (kafka.coordinator.GroupCoordinator)
    [2017-06-30 14:43:15,656] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 3 milliseconds. (kafka.coordinator.GroupMetadataManager)
    [2017-06-30 14:53:15,653] INFO [Group Metadata Manager on Broker 0]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)

While testing fault tolerance of the cluster with kafka-console-consumer.sh and kafka-console-producer.sh, I observed that when broker 1 or broker 2 is killed, the consumer still receives new messages coming from the producer, and the rebalance happens correctly.
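
For reference, this kind of fault-tolerance test can be run with the stock console tools roughly as follows; the hostnames are placeholders (standing in for the XXX entries in the properties below) and the exact flags may differ slightly between Kafka versions:

    # Produce test messages to the topic (broker list is a placeholder)
    kafka-console-producer.sh --broker-list node0:9092,node1:9092,node2:9092 --topic test-topic

    # Consume with the new consumer; the group id is auto-generated as
    # console-consumer-NNNNN, as seen in the logs above
    kafka-console-consumer.sh --bootstrap-server node0:9092,node1:9092,node2:9092 --topic test-topic --from-beginning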

However, killing broker 0 stops consumption of both new and old messages, no matter how many consumers are running. Below is the state of the topic before and after broker 0 is killed.

Before

Topic:test-topic    PartitionCount:3    ReplicationFactor:2 Configs:
    Topic: test-topic   Partition: 0    Leader: 2   Replicas: 2,0   Isr: 0,2
    Topic: test-topic   Partition: 1    Leader: 0   Replicas: 0,1   Isr: 0,1
    Topic: test-topic   Partition: 2    Leader: 1   Replicas: 1,2   Isr: 1,2

After

Topic:test-topic    PartitionCount:3    ReplicationFactor:2 Configs:
    Topic: test-topic   Partition: 0    Leader: 2   Replicas: 2,0   Isr: 2
    Topic: test-topic   Partition: 1    Leader: 1   Replicas: 0,1   Isr: 1
    Topic: test-topic   Partition: 2    Leader: 1   Replicas: 1,2   Isr: 1,2

Following are the WARN messages seen in the consumer logs after broker 0 is killed:

[2017-06-30 14:19:17,155] WARN Auto-commit of offsets {test-topic-2=OffsetAndMetadata{offset=4, metadata=''}, test-topic-0=OffsetAndMetadata{offset=5, metadata=''}, test-topic-1=OffsetAndMetadata{offset=4, metadata=''}} failed for group console-consumer-34472: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-06-30 14:19:10,542] WARN Auto-commit of offsets {test-topic-2=OffsetAndMetadata{offset=4, metadata=''}, test-topic-0=OffsetAndMetadata{offset=5, metadata=''}, test-topic-1=OffsetAndMetadata{offset=4, metadata=''}} failed for group console-consumer-30165: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

Broker properties. The remaining properties are left at their defaults.

broker.id=0
delete.topic.enable=true

auto.create.topics.enable=false
listeners=PLAINTEXT://XXX:9092
advertised.listeners=PLAINTEXT://XXX:9092
log.dirs=/tmp/kafka-logs-test1
num.partitions=3
zookeeper.connect=XXX:2181

Producer properties. The remaining properties are left at their defaults.

bootstrap.servers=XXX,XXX,XXX
compression.type=snappy

Consumer properties. The remaining properties are left at their defaults.

zookeeper.connect=XXX:2181
zookeeper.connection.timeout.ms=6000
group.id=test-consumer-group

As far as I understand, if the node holding/acting as the GroupCoordinator and __consumer_offsets dies, then the consumers are unable to resume normal operation even though new leaders are elected for the partitions.

I have seen something similar discussed in another post, which suggests restarting the dead broker node. However, in a production environment there would still be a delay in message consumption until broker 0 is restarted, despite having more nodes available.

Q1: How can the above situation be mitigated?

Q2: Is there a way to move the GroupCoordinator, i.e. the __consumer_offsets topic, to another node?

Any suggestions/help is appreciated.

Recommended Answer

Check the replication factor on the __consumer_offsets topic. If it is not 3, that is your problem.

Run kafka-topics --zookeeper localhost:2181 --describe --topic __consumer_offsets and check whether the first line of the output says "ReplicationFactor:1" or "ReplicationFactor:3".

This is a common problem when doing trials: the cluster starts out as a single node, and the __consumer_offsets topic gets created with a replication factor of 1. Later, when you expand to 3 nodes, you forget to change the topic-level setting on this existing topic, so even though the topics you produce to and consume from are fault tolerant, the offsets topic is still stuck on broker 0 only.
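
If that is the case here, one way to repair the existing cluster (rather than just restarting broker 0) is to raise the replication factor of __consumer_offsets with the partition reassignment tool. The sketch below is only illustrative: it assumes the default 50 offsets partitions, broker ids 0, 1 and 2, a placeholder ZooKeeper address, and a made-up plan file name.

    # Build a reassignment plan that puts every __consumer_offsets partition
    # on brokers 0, 1 and 2 (assumes the default of 50 partitions).
    {
      echo '{"version":1,"partitions":['
      for p in $(seq 0 49); do
        sep=','; [ "$p" -eq 49 ] && sep=''
        echo "  {\"topic\":\"__consumer_offsets\",\"partition\":$p,\"replicas\":[0,1,2]}$sep"
      done
      echo ']}'
    } > increase-offsets-rf.json

    # Apply the plan, then verify it has completed.
    kafka-reassign-partitions.sh --zookeeper localhost:2181 \
        --reassignment-json-file increase-offsets-rf.json --execute
    kafka-reassign-partitions.sh --zookeeper localhost:2181 \
        --reassignment-json-file increase-offsets-rf.json --verify

For a new cluster, making sure all three brokers are up (and offsets.topic.replication.factor in server.properties is left at 3) before the first consumer group connects prevents the offsets topic from being created with replication factor 1 in the first place.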
