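I'm trying to run Wildfly in HA Full mode on Kubernetes using the KUBE_PING JGroups protocol. Everything works fine: I can scale the cluster up, and the nodes recognize each other without any problems.
The problem occurs when I try to scale the cluster down. ActiveMQ Artemis keeps reporting that it cannot connect to the disconnected node, even though JGroups acknowledges that the old node has left the cluster.
I'm wondering what I might be doing wrong in the JGroups configuration. I've attached some log messages, as well as the JGroups configuration for KUBE_PING.
To provide as much information as possible: I'm running the latest official Wildfly Docker image, 15.0.1.Final, which runs on JDK 11.
Thanks in advance for your help!
Edit: fixed typos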
JGroups acknowledgment of the node disconnecting
wildfly-kube 12:48:36,514 INFO [org.apache.activemq.artemis.core.server] (Thread-22 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@10f88645)) AMQ221027: Bridge ClusterConnectionBridge@379d51e3 [name=$.artemis.internal.sf.my-cluster.7ee91868-337b-11e9-9849-ce422226aad5, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.7ee91868-337b-11e9-9849-ce422226aad5, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=314721ae-337b-11e9-9cfa-0e8a9828b1cb], temp=false]@195607a8 targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@379d51e3 [name=$.artemis.internal.sf.my-cluster.7ee91868-337b-11e9-9849-ce422226aad5, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.7ee91868-337b-11e9-9849-ce422226aad5, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=314721ae-337b-11e9-9cfa-0e8a9828b1cb], temp=false]@195607a8 targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=100-116-0-4], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1699294977[nodeUUID=314721ae-337b-11e9-9cfa-0e8a9828b1cb, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=100-122-0-6, address=jms, server=ActiveMQServerImpl::serverUUID=314721ae-337b-11e9-9cfa-0e8a9828b1cb])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=100-116-0-4], discoveryGroupConfiguration=null]] is connected
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 12:48:38,905 WARN [org.apache.activemq.artemis.core.server] (Thread-5 (ActiveMQ-client-global-threads)) AMQ222095: Connection failed with failedOver=false
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 12:48:43,758 ERROR [org.jgroups.protocols.TCP] (TQ-Bundler-7,ejb,wildfly-kube-b6f69fb9-b2hd5) JGRP000034: wildfly-kube-b6f69fb9-b2hd5: failure sending message to wildfly-kube-b6f69fb9-nshvn: java.net.SocketTimeoutException: connect timed out
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 12:48:44,759 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-13,ejb,wildfly-kube-b6f69fb9-b2hd5) ISPN000094: Received new cluster view for channel ejb: [wildfly-kube-b6f69fb9-b2hd5|2] (1) [wildfly-kube-b6f69fb9-b2hd5]
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 12:48:44,772 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-13,ejb,wildfly-kube-b6f69fb9-b2hd5) ISPN100001: Node wildfly-kube-b6f69fb9-nshvn left the cluster
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 12:48:44,777 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-13,ejb,wildfly-kube-b6f69fb9-b2hd5) ISPN000094: Received new cluster view for channel ejb: [wildfly-kube-b6f69fb9-b2hd5|2] (1) [wildfly-kube-b6f69fb9-b2hd5]
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 12:48:44,779 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-13,ejb,wildfly-kube-b6f69fb9-b2hd5) ISPN100001: Node wildfly-kube-b6f69fb9-nshvn left the cluster
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 12:48:44,787 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-13,ejb,wildfly-kube-b6f69fb9-b2hd5) ISPN000094: Received new cluster view for channel ejb: [wildfly-kube-b6f69fb9-b2hd5|2] (1) [wildfly-kube-b6f69fb9-b2hd5]
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 12:48:44,788 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-13,ejb,wildfly-kube-b6f69fb9-b2hd5) ISPN100001: Node wildfly-kube-b6f69fb9-nshvn left the cluster
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 12:48:44,791 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-13,ejb,wildfly-kube-b6f69fb9-b2hd5) ISPN000094: Received new cluster view for channel ejb: [wildfly-kube-b6f69fb9-b2hd5|2] (1) [wildfly-kube-b6f69fb9-b2hd5]
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 12:48:44,792 INFO [org.infinispan.CLUSTER] (VERIFY_SUSPECT.TimerThread-13,ejb,wildfly-kube-b6f69fb9-b2hd5) ISPN100001: Node wildfly-kube-b6f69fb9-nshvn left the cluster
Repeated ActiveMQ Artemis warnings (every 3 seconds)
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 13:02:11,825 WARN [org.apache.activemq.artemis.core.server] (Thread-55 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@866e807)) AMQ224091: Bridge ClusterConnectionBridge@39836857 [name=$.artemis.internal.sf.my-cluster.314721ae-337b-11e9-9cfa-0e8a9828b1cb, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.314721ae-337b-11e9-9cfa-0e8a9828b1cb, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=7ee91868-337b-11e9-9849-ce422226aad5], temp=false]@39425add targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@39836857 [name=$.artemis.internal.sf.my-cluster.314721ae-337b-11e9-9cfa-0e8a9828b1cb, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.314721ae-337b-11e9-9cfa-0e8a9828b1cb, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=7ee91868-337b-11e9-9849-ce422226aad5], temp=false]@39425add targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=100-122-0-6], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1432944139[nodeUUID=7ee91868-337b-11e9-9849-ce422226aad5, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=100-116-0-4, address=jms, server=ActiveMQServerImpl::serverUUID=7ee91868-337b-11e9-9849-ce422226aad5])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=100-122-0-6], 
discoveryGroupConfiguration=null]] is unable to connect to destination. Retrying
wildfly-kube-b6f69fb9-b2hd5 wildfly-kube 13:02:14,897 WARN [org.apache.activemq.artemis.core.server] (Thread-68 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@866e807)) AMQ224091: Bridge ClusterConnectionBridge@39836857 [name=$.artemis.internal.sf.my-cluster.314721ae-337b-11e9-9cfa-0e8a9828b1cb, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.314721ae-337b-11e9-9cfa-0e8a9828b1cb, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=7ee91868-337b-11e9-9849-ce422226aad5], temp=false]@39425add targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@39836857 [name=$.artemis.internal.sf.my-cluster.314721ae-337b-11e9-9cfa-0e8a9828b1cb, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.314721ae-337b-11e9-9cfa-0e8a9828b1cb, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=7ee91868-337b-11e9-9849-ce422226aad5], temp=false]@39425add targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=100-122-0-6], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@1432944139[nodeUUID=7ee91868-337b-11e9-9849-ce422226aad5, connector=TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=100-116-0-4, address=jms, server=ActiveMQServerImpl::serverUUID=7ee91868-337b-11e9-9849-ce422226aad5])) [initialConnectors=[TransportConfiguration(name=http-connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?httpUpgradeEndpoint=http-acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=100-122-0-6], 
discoveryGroupConfiguration=null]] is unable to connect to destination. Retrying
JGroups configuration
<subsystem xmlns="urn:jboss:domain:jgroups:6.0">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp">
                <property name="logical_addr_cache_expiration">360000</property>
            </transport>
            <protocol type="kubernetes.KUBE_PING">
                <property name="namespace">${KUBERNETES_CLUSTER_NAMESPACE:default}</property>
                <property name="labels">${KUBERNETES_CLUSTER_LABEL:cluster=nyc}</property>
                <property name="port_range">0</property>
            </protocol>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK" socket-binding="jgroups-tcp-fd"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2">
                <property name="use_mcast_xmit">false</property>
            </protocol>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS">
                <property name="join_timeout">30000</property>
                <property name="print_local_addr">true</property>
                <property name="print_physical_addrs">true</property>
            </protocol>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
ActiveMQ Artemis configuration
<subsystem xmlns="urn:jboss:domain:messaging-activemq:5.0">
    <server name="default">
        <cluster user="my_admin" password="my_password"/>
        <security-setting name="#">
            <role name="guest" send="true" consume="true" create-non-durable-queue="true" delete-non-durable-queue="true"/>
        </security-setting>
        <address-setting name="#" dead-letter-address="jms.queue.DLQ" expiry-address="jms.queue.ExpiryQueue" max-size-bytes="10485760" page-size-bytes="2097152" message-counter-history-day-limit="10" redistribution-delay="1000"/>
        <http-connector name="http-connector" socket-binding="http" endpoint="http-acceptor"/>
        <http-connector name="http-connector-throughput" socket-binding="http" endpoint="http-acceptor-throughput">
            <param name="batch-delay" value="50"/>
        </http-connector>
        <in-vm-connector name="in-vm" server-id="0">
            <param name="buffer-pooling" value="false"/>
        </in-vm-connector>
        <http-acceptor name="http-acceptor" http-listener="default"/>
        <http-acceptor name="http-acceptor-throughput" http-listener="default">
            <param name="batch-delay" value="50"/>
            <param name="direct-deliver" value="false"/>
        </http-acceptor>
        <in-vm-acceptor name="in-vm" server-id="0">
            <param name="buffer-pooling" value="false"/>
        </in-vm-acceptor>
        <broadcast-group name="bg-group1" jgroups-cluster="activemq-cluster" connectors="http-connector"/>
        <discovery-group name="dg-group1" jgroups-cluster="activemq-cluster"/>
        <cluster-connection name="my-cluster" address="jms" connector-name="http-connector" discovery-group="dg-group1"/>
        <jms-queue name="ExpiryQueue" entries="java:/jms/queue/ExpiryQueue"/>
        <jms-queue name="DLQ" entries="java:/jms/queue/DLQ"/>
        <connection-factory name="InVmConnectionFactory" entries="java:/ConnectionFactory" connectors="in-vm"/>
        <connection-factory name="RemoteConnectionFactory" entries="java:jboss/exported/jms/RemoteConnectionFactory" connectors="http-connector" ha="true" block-on-acknowledge="true" reconnect-attempts="-1"/>
        <pooled-connection-factory name="activemq-ra" entries="java:/JmsXA java:jboss/DefaultJMSConnectionFactory" connectors="in-vm" transaction="xa"/>
    </server>
</subsystem>
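Update:
One thing I'd like to add: if the container shuts down gracefully, Artemis appears to handle the disconnect correctly. Adding a preStop command to the container definition in the Kubernetes deployment, so that Wildfly is shut down before the container is terminated, helps the node leave the cluster cleanly.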
Best answer
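ActiveMQ Artemis only uses JGroups (or any other discovery mechanism) to discover other brokers so that it can cluster them together. Once another broker has been discovered, the brokers establish TCP connections between themselves, and after that JGroups plays no role at all. This means it is irrelevant that JGroups sees the broker leaving the cluster.
The fact that the cluster bridge fails is enough to tell the ActiveMQ Artemis broker that something has happened to the other broker. The question at that point is how the broker should respond to the dead node. By default it will try to reconnect indefinitely, because it expects the node to come back at some point. That is a reasonable expectation in traditional use cases, but much less so in the cloud. This behavior is controlled by the reconnect-attempts property on the cluster-connection. Set reconnect-attempts to a value you consider reasonable (e.g. 10) and you'll see the bridge give up reconnecting and stop logging.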
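As a rough sketch of that change, here is the cluster-connection element from the question's configuration with reconnect-attempts added. The value 10 is only the example figure from the answer, not a recommendation, and every other attribute is copied unchanged from the question.

```xml
<!-- Sketch only: give up on a dead cluster bridge after a finite number of attempts.
     reconnect-attempts="10" is an illustrative value; tune it for your environment. -->
<cluster-connection name="my-cluster" address="jms" connector-name="http-connector" discovery-group="dg-group1" reconnect-attempts="10"/>
```

With a finite value, the bridge should stop retrying once the attempts are exhausted, so the repeated AMQ224091 warnings for the removed pod should stop as well.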