

I have a cluster of 3 ElasticSearch nodes running on AWS EC2. These nodes are set up using OpsWorks/Chef. My intent is to design this cluster to be very resilient and elastic (nodes can come and go as needed).


From everything I've read about ElasticSearch, it seems like no one recommends putting a load balancer in front of the cluster; instead, it seems like the recommendation is to do one of two things:

  1. Point your client at the URL/IP of one node, let ES do the load balancing for you and hope that node never goes down.


  2. Hard-code the URLs/IPs of ALL your nodes into your client app and have the app handle the failover logic.
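For illustration, the second option could look roughly like the sketch below. It is only a minimal example using the Python standard library: the node addresses are made up, and `request_with_failover` and its `fetch` parameter are hypothetical names introduced here, not part of any ES client.

```python
import urllib.request

# Hypothetical node addresses; substitute the real URLs/IPs of your nodes.
NODES = [
    "http://10.0.0.1:9200",
    "http://10.0.0.2:9200",
    "http://10.0.0.3:9200",
]


def request_with_failover(path, nodes=NODES, fetch=None, timeout=2.0):
    """Try each node in order and return the first successful response body."""
    if fetch is None:
        # Default transport: a plain HTTP GET via urllib.
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()

    last_error = None
    for base in nodes:
        try:
            return fetch(base + path)
        except OSError as exc:
            # URLError and socket timeouts are OSError subclasses,
            # so an unreachable node lands here and we try the next one.
            last_error = exc
    raise ConnectionError("all nodes failed") from last_error
```

A call such as `request_with_failover("/_cluster/health")` would then succeed as long as at least one of the listed nodes answers.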


My background is mostly in web farms where it's just common sense to create a huge pool of autonomous web servers, throw an ELB in front of them and let the load balancer decide what nodes are alive or dead. Why does ES not seem to support this same architecture?


You don't need a load balancer; ES already provides that functionality. You'd just be adding another component, which could misbehave and which would add an unnecessary network hop.


ES will shard your data (by default into 5 shards), which it will try to distribute evenly among your instances. In your case two instances should hold 2 shards each and the third just 1, so you might want to change the number of shards to 6 for an equal distribution.
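To make the arithmetic concrete, here is a small sketch. `primaries_per_node` and `index_settings` are hypothetical helper names; the first only computes an even spread and does not reproduce ES's actual allocation decisions, while the second builds the settings body you would send when creating an index with 6 shards.

```python
import json


def primaries_per_node(num_shards, num_nodes):
    """Return how many primary shards each node holds under an even spread."""
    base, extra = divmod(num_shards, num_nodes)
    # The first `extra` nodes take one additional shard each.
    return [base + 1 if i < extra else base for i in range(num_nodes)]


def index_settings(num_shards, num_replicas=1):
    """Build the JSON body for creating an index with the given shard counts."""
    return json.dumps(
        {
            "settings": {
                "number_of_shards": num_shards,
                "number_of_replicas": num_replicas,
            }
        }
    )


print(primaries_per_node(5, 3))  # [2, 2, 1]
print(primaries_per_node(6, 3))  # [2, 2, 2]
print(index_settings(6))
```

Keep in mind that the number of primary shards is fixed when the index is created, so this has to be chosen up front.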

By default replication is set to "number_of_replicas":1, so one replica of each shard. Assuming you are using 6 shards, it could look something like this (R is a replicated shard):

  • node0: 1, 4, R3, R6
  • node1: 2, 6, R1, R5
  • node2: 3, 5, R2, R4


Assuming node1 dies, the cluster would change to the following setup:

  • node0: 1, 4, 6, R3 + new replicas R5, R2
  • node2: 3, 5, 2, R4 + new replicas R1, R6
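The recovery above can be mimicked with a toy model. This is not ES's real allocator, just a sketch under simple assumptions: replicas of lost primaries are promoted on the surviving nodes, and any shard left without a replica gets a new one on a node that does not hold its primary. The `fail_node` function and the layout structure are hypothetical.

```python
def fail_node(layout, dead):
    """Return the layout after `dead` leaves, with replicas promoted and rebuilt.

    layout maps node name -> {"primary": set of shards, "replica": set of shards}.
    """
    survivors = {
        node: {"primary": set(c["primary"]), "replica": set(c["replica"])}
        for node, c in layout.items()
        if node != dead
    }

    # Step 1: promote a surviving replica for every primary that was lost.
    for shard in layout[dead]["primary"]:
        for copies in survivors.values():
            if shard in copies["replica"]:
                copies["replica"].remove(shard)
                copies["primary"].add(shard)
                break

    # Step 2: give every shard that has no replica left a new one,
    # placed on a node that does not already hold its primary.
    all_shards = set().union(*(c["primary"] for c in survivors.values()))
    for shard in sorted(all_shards):
        if any(shard in c["replica"] for c in survivors.values()):
            continue
        candidates = [n for n, c in survivors.items() if shard not in c["primary"]]
        if candidates:
            target = min(candidates, key=lambda n: len(survivors[n]["replica"]))
            survivors[target]["replica"].add(shard)
    return survivors


cluster = {
    "node0": {"primary": {1, 4}, "replica": {3, 6}},
    "node1": {"primary": {2, 6}, "replica": {1, 5}},
    "node2": {"primary": {3, 5}, "replica": {2, 4}},
}
after = fail_node(cluster, "node1")
# node0 ends up with primaries 1, 4, 6 and replicas 2, 3, 5;
# node2 ends up with primaries 2, 3, 5 and replicas 1, 4, 6.
```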


Depending on your connection setting, you can either connect to one instance (transport client) or you could join the cluster (node client). With the node client you'll avoid double hops, since you'll always connect to the correct shard / index. With the transport client, your requests will be routed to the correct instance.


So there's nothing for you to load balance yourself; you'd just add overhead. The auto-clustering is probably ES's greatest strength.


07-29 11:00