Problem description
We are facing an issue with Solr/ZooKeeper where the ZooKeeper connection times out after 10000 ms. The error is below:
SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper <server1>:9181,<server2>:9182,<server2>:9183 within 10000 ms.
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:184)
at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:121)
We are not getting any errors in the ZooKeeper logs, except for the following:
2018-12-19 04:35:22,305 [myid:2] - INFO [SessionTracker:ZooKeeperServer@354] - Expiring session 0x200830234de3127, timeout of 10000ms exceeded
2018-12-19 05:35:38,304 [myid:2] - INFO [SessionTracker:ZooKeeperServer@354] - Expiring session 0x200b4f912730086, timeout of 10000ms exceeded
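For a sense of scale, here is a rough sketch of the timeout arithmetic. It assumes ZooKeeper 3.4 defaults and tickTime=2000 (a common setting, not something stated above): the server clamps the session timeout a client requests to between 2 and 20 ticks, and, as I read the Java client source, the client treats roughly two thirds of the negotiated timeout without hearing from the server as a lost connection and sends a ping after about half of that idle time. ZkTimeoutMath and its methods are hypothetical names for illustration, not ZooKeeper APIs.

```java
/** Illustrative session-timeout arithmetic (hypothetical helper, not a ZooKeeper API). */
public final class ZkTimeoutMath {
    private ZkTimeoutMath() {}

    /** Server side: clamp the requested timeout to [2 * tickTime, 20 * tickTime] (assumes default min/max). */
    static int negotiatedTimeout(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    /** Client side: silence from the server longer than this is treated as a lost connection. */
    static int readTimeout(int negotiatedMs) {
        return negotiatedMs * 2 / 3;
    }

    /** Client side: an otherwise idle client sends a ping after roughly this long. */
    static int pingAfter(int negotiatedMs) {
        return readTimeout(negotiatedMs) / 2;
    }

    public static void main(String[] args) {
        int negotiated = negotiatedTimeout(10_000, 2_000);
        System.out.println("negotiated session timeout: " + negotiated + " ms");
        System.out.println("client read timeout:        " + readTimeout(negotiated) + " ms");
        System.out.println("ping sent after idle:       " + pingAfter(negotiated) + " ms");
    }
}
```

With a 10000 ms session an idle client pings roughly every 3.3 s, so only a client-side stall well beyond that (past 10 s in total) leaves the server with nothing to refresh the session, which is consistent with the "timeout of 10000ms exceeded" lines above.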
During the issue, the thread count goes high, and we noticed the following in the WebLogic server:
Name: Connection evictor
State: TIMED_WAITING
Total blocked: 0 Total waited: 1
Stack trace:
java.lang.Thread.sleep(Native Method)
org.apache.http.impl.client.IdleConnectionEvictor$1.run(IdleConnectionEvictor.java:66)
java.lang.Thread.run(Thread.java:748)
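The thread in that dump comes from Apache HttpClient's IdleConnectionEvictor, as its stack trace shows. To check whether threads like this keep accumulating while the problem builds up, one option is to count live threads by name from inside the application at intervals. The sketch below uses only the JDK's Thread.getAllStackTraces(); the class name ThreadCensus is hypothetical, and how you schedule or log it is up to you.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical diagnostic: count live JVM threads grouped by thread name. */
public final class ThreadCensus {
    private ThreadCensus() {}

    /** Returns thread name -> number of live threads with that name, sorted by name. */
    static Map<String, Integer> countByName() {
        Map<String, Integer> counts = new TreeMap<>();
        // getAllStackTraces() returns a snapshot keyed by every live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getName(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One line per thread name: count, then name.
        countByName().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Taking a few snapshots while the thread count climbs would show whether one name, such as Connection evictor, accounts for most of the growth.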
What could be going wrong here?
Recommended answer
In my experience, ZK timeouts have almost always been due to something on the Solr node, rather than a problem in ZK.
You don't provide all the timestamps, but the theory is that:
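- Solr fails to send a heartbeat for some reason
- ZK assumes the client has gone away and closes the connection
- Solr tries to use the connection that ZK has already closed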
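The sequence above can be illustrated with a deliberately simplified, deterministic model of how a server expires silent sessions. This is not ZooKeeper's actual SessionTracker (which groups expirations into time buckets); ToySessionTracker, its methods, and the explicit millisecond clock are hypothetical, chosen so the timeline is easy to follow. The final step is roughly where a real client would see a session-expired error.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model: a session expires if no heartbeat arrives within timeoutMs. */
public final class ToySessionTracker {
    private final long timeoutMs;
    private final Map<Long, Long> lastHeartbeat = new HashMap<>();

    ToySessionTracker(long timeoutMs) { this.timeoutMs = timeoutMs; }

    void open(long sessionId, long nowMs) { lastHeartbeat.put(sessionId, nowMs); }

    /** A heartbeat refreshes a live session; on an expired session it fails. */
    void heartbeat(long sessionId, long nowMs) {
        if (!lastHeartbeat.containsKey(sessionId)) {
            throw new IllegalStateException("session 0x" + Long.toHexString(sessionId) + " expired");
        }
        lastHeartbeat.put(sessionId, nowMs);
    }

    /** Drops every session that has been silent for longer than the timeout. */
    void expireIdle(long nowMs) {
        lastHeartbeat.entrySet().removeIf(e -> nowMs - e.getValue() > timeoutMs);
    }

    boolean isAlive(long sessionId) { return lastHeartbeat.containsKey(sessionId); }

    public static void main(String[] args) {
        ToySessionTracker zk = new ToySessionTracker(10_000);
        long session = 0x200830234de3127L;
        zk.open(session, 0);
        zk.heartbeat(session, 3_000);      // client is healthy and pings
        // (1) The client stalls (overload, GC pause) and sends nothing after t = 3000 ms.
        zk.expireIdle(14_000);             // (2) 11000 ms of silence > 10000 ms: session dropped
        System.out.println("alive after stall: " + zk.isAlive(session));
        try {
            zk.heartbeat(session, 15_000); // (3) the client tries to reuse the dead session
        } catch (IllegalStateException e) {
            System.out.println("client sees: " + e.getMessage());
        }
    }
}
```

The 10000 ms timeout and the session id are taken from the logs in the question; the timeline itself is invented for illustration.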
So why might the Solr node fail to send the heartbeat? It could be that the Solr node was simply overloaded (is the thread spike a cause, or a symptom?), or a very long GC pause could have the same effect.
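One hedged way to test the GC-pause theory from inside the affected JVM is a pause detector in the spirit of tools like jHiccup: a thread sleeps for a short fixed interval and records how much later than requested it woke up, alongside the cumulative collection time reported by the JDK's GarbageCollectorMXBeans. A stall approaching the 10000 ms session timeout would support the theory. The class name PauseDetector, the 100 ms interval, and the 1000 ms reporting threshold are arbitrary assumptions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.TimeUnit;

/** Hypothetical pause detector: reports when a sleeping thread wakes up much later than asked. */
public final class PauseDetector implements Runnable {
    private final long intervalMs;
    private final long reportThresholdMs;

    PauseDetector(long intervalMs, long reportThresholdMs) {
        this.intervalMs = intervalMs;
        this.reportThresholdMs = reportThresholdMs;
    }

    /** Time beyond the requested sleep, in milliseconds (never negative). */
    static long excessMillis(long intervalMs, long startNanos, long endNanos) {
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(endNanos - startNanos);
        return Math.max(0, elapsedMs - intervalMs);
    }

    /** Sum of collection time over all collectors; collectors reporting -1 (undefined) are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long gcBefore = totalGcMillis();
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long excess = excessMillis(intervalMs, start, System.nanoTime());
            if (excess >= reportThresholdMs) {
                // A large excess with a matching GC delta points at GC; without one, at CPU starvation or similar.
                System.err.printf("JVM stall of ~%d ms (GC time in that window: %d ms)%n",
                        excess, totalGcMillis() - gcBefore);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // In an application, start this once at startup; here we just watch for 60 seconds.
        Thread t = new Thread(new PauseDetector(100, 1_000), "pause-detector");
        t.setDaemon(true);
        t.start();
        Thread.sleep(60_000);
    }
}
```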