Version不等于Zookeeper经纪人无法恢复的zkVer

Version不等于Zookeeper经纪人无法恢复的zkVer

本文介绍了Kafka缓存的zkVersion不等于Zookeeper经纪人无法恢复的zkVersion的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个由3个经纪人组成的kafka集群.我最近开始遇到问题,经纪人退出集群,生产者/消费者抛出领导者无法获得的错误.

I have a kafka cluster with 3 brokers. I have started facing issues lately with brokers going out of the cluster and producrs/consumers throwing leader not available errors.

在检查日志时,我看到以下事件序列:

On examining the logs I see following sequence of events:

//很多启动/停止副本获取程序线程

[2017-10-09 14:48:50,600] INFO [ReplicaFetcherManager on broker 6] Removed fetcher for partitions

[2017-10-09 14:48:50,608] INFO [ReplicaFetcherThread-0-7], Shutting down (kafka.server.ReplicaFetcherThread)
[2017-10-09 14:48:50,918] INFO [ReplicaFetcherThread-0-7], Stopped  (kafka.server.ReplicaFetcherThread)
[2017-10-09 14:48:50,918] INFO [ReplicaFetcherThread-0-7], Shutdown completed (kafka.server.ReplicaFetcherThread)

//连续扩展/收缩ISR

[2017-10-09 14:48:51,037] INFO Partition [__consumer_offsets,8] on broker 6: Expanding ISR for partition __consumer_offsets-8 from 6,8 to 6,8,7 (kafka.cluster.Partition)
[2017-10-09 14:48:51,038] INFO Partition [__consumer_offsets,35] on broker 6: Expanding ISR for partition __consumer_offsets-35 from 6,8 to 6,8,7 (kafka.cluster.Partition)

[2017-10-09 14:49:01,702] INFO Partition [t1,1] on broker 6: Shrinking ISR for partition [t1,1] from 6,7 to 6 (kafka.cluster.Partition)
[2017-10-09 14:49:01,702] INFO Partition [__consumer_offsets,41] on broker 6: Shrinking ISR for partition [__consumer_offsets,41] from 6,8,7 to 6,8 (kafka.cluster.Partition)

//重新注册经纪人和重新选举领导人

[2017-10-09 14:51:54,380] INFO re-registering broker info in ZK for broker 6

[2017-10-09 14:51:54,405] INFO New leader is 7 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)

//ControllerMovedException错误

[2017-10-09 14:56:39,746] ERROR [KafkaApi-6] Error when handling request.. org.apache.kafka.common.errors.ControllerMovedException: Broker 6 received update metadata request with correlation id 59 from an old controlle
r 7 with epoch 301. Latest known controller epoch is 302

[2017-10-09 14:57:59,210] INFO re-registering broker info in ZK for broker 6 (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-10-09 14:57:59,210] INFO Creating /brokers/ids/6 (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2017-10-09 14:57:59,213] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2017-10-09 14:57:59,213] INFO Registered broker 6 at path /brokers/ids/6 with addresses: EndPoint(kafka03,9092,ListenerName(PLAIN
TEXT),PLAINTEXT) (kafka.utils.ZkUtils)
[2017-10-09 14:57:59,213] INFO done re-registering broker (kafka.server.KafkaHealthcheck$SessionExpireListener)
[2017-10-09 14:57:59,213] INFO Subscribing to /brokers/topics path to watch for new topics (kafka.server.KafkaHealthcheck$SessionExpireListener
)
[2017-10-09 14:57:59,224] INFO New leader is 7 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
[2017-10-09 14:58:11,697] INFO Partition [testing1,2] on broker 6: Shrinking ISR for partition [testing1,2] from 6,8 to 6 (kafka.cluster.Partit
ion)
[2017-10-09 14:58:11,700] INFO Partition [testing1,2] on broker 6: Cached zkVersion [199] not equal to that in zookeeper, skip updating ISR (ka
fka.cluster.Partition)

然后这些错误循环出现,群集无法恢复

Then these errors occur in a loop, and cluster cannot recover

[2017-10-09 16:17:26,769] INFO Partition [__consumer_offsets,14] on broker 6: Shrinking ISR for partition [__consumer_offsets,14] from 7,6,8 to 7,6 (kafka.cluster.Partition)
[2017-10-09 16:17:26,771] INFO Partition [__consumer_offsets,14] on broker 6: Cached zkVersion [306] not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)

在客户端上,我收到领导者无法使用"错误.

On the clients, I receive Leader not available error.

不清楚集群为什么进入这种无效状态...有什么想法吗?

Its not clear why the cluster is entering this invalid state.. any ideas?

推荐答案

此问题在 KAFKA-2729 ,但至今尚未解决.据我所知,由于最大流量或在短时间内出现一些短暂的网络中断而造成的延迟较大的网络上会发生这种情况.唯一的解决方案(afaik)是重新启动所有代理.

This issue is known and tracked in KAFKA-2729 but not solved till now.This happens as far as I know on networks with big delays due to max traffic or some short network outages in a small timeframe.The only solution (afaik) is to restart all brokers.

这篇关于Kafka缓存的zkVersion不等于Zookeeper经纪人无法恢复的zkVersion的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

09-01 20:12