This article walks through how Cassandra handles high availability. It should be a useful reference for anyone facing a similar problem, so let's work through it together.

Problem Description

1) I have a 5 node cluster (172.30.56.60, 172.30.56.61, 172.30.56.62, 172.30.56.63, 172.30.56.129).



2) I created a keyspace with a replication factor of 3 and write consistency of 3, and inserted a row into a table under partition '1', like below:

INSERT INTO user (user_id, user_name, user_phone) VALUES (1, 'ram', 9003934069);
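(For context, a keyspace and table matching this description could look like the sketch below. Only the replication factor of 3 is taken from the post; SimpleStrategy, the IF NOT EXISTS guards and the column types are assumptions, and the keyspace name keyspacetest is borrowed from the nodetool output in step 3.)

CREATE KEYSPACE IF NOT EXISTS keyspacetest
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

CREATE TABLE IF NOT EXISTS keyspacetest.user (
    user_id    int PRIMARY KEY,   -- partition key, '1' in the example above
    user_name  text,
    user_phone bigint             -- bigint, because 9003934069 does not fit in a 32-bit int
);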



3) I verified the location of the data using the nodetool getendpoints utility and observed that the data was copied to three nodes: 60, 129 and 62.

./nodetool getendpoints keyspacetest user 1
172.30.56.60
172.30.36.129
172.30.56.62

4) Now, if I bring down node 60, does Cassandra need to transfer the existing data '1, 'ram', 9003934069' to one of the remaining nodes (61 or 63) in order to maintain the RF of 3?



But Cassandra does not do that. Does that mean that if nodes 60, 129 and 62 are all down, I will not be able to read or write any data under partition '1' in the table 'user'?



Question 1: So even though I have a 5 node cluster, if the nodes where a given partition resides go down, the cluster is useless for that data?



Question 2: If two nodes are down (for example, 60 and 129), 61, 62 and 63 are still up and running, yet I am not able to write any data to partition '1' with write consistency = 3. Why is that, when I am able to write the same data with write consistency = 1? Does this again mean that the data for a partition is only available on predefined nodes in the cluster, with no possibility of repartitioning?



If any part of my question is unclear, please let me know and I will clarify it.

Solution

That is not how Cassandra works. The replication factor only declares how many copies of your data Cassandra stores on disk, on different nodes. Cassandra mathematically forms a ring out of your nodes; each node is responsible for a range of so-called tokens (basically a hash of your partition key components). A replication factor of 3 means the data is stored on the node that owns your data's token and on the next two nodes in the ring.
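The token a partition key hashes to can be inspected directly with the built-in token() function; a minimal sketch against the table from the question (keyspace and column names reused from the post):

SELECT token(user_id), user_id
FROM keyspacetest.user
WHERE user_id = 1;

The returned token determines which node in the ring owns the row and which nodes hold the additional replicas.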



(a quickly googled image: https://image.slidesharecdn.com/cassandratraining-161104131405/95/cassandra-training-19-638.jpg?cb=1478265472)



Changing the ring topology is quite complex and is not done automatically at all.



1) I have a 5 node cluster (172.30.56.60, 172.30.56.61, 172.30.56.62, 172.30.56.63, 172.30.56.129).

2) I created a keyspace with a replication factor of 3 and write consistency of 3, and inserted a row into a table under partition '1', like below:

INSERT INTO user (user_id, user_name, user_phone) VALUES (1, 'ram', 9003934069);

3) I verified the location of the data using the nodetool getendpoints utility and observed that the data was copied to three nodes: 60, 129 and 62.

./nodetool getendpoints keyspacetest user 1
172.30.56.60
172.30.36.129
172.30.56.62

4) Now, if I bring down node 60, does Cassandra need to transfer the existing data '1, 'ram', 9003934069' to one of the remaining nodes (61 or 63) in order to maintain the RF of 3?

But Cassandra does not do that. Does that mean that if nodes 60, 129 and 62 are all down, I will not be able to read or write any data under partition '1' in the table 'user'?

No. This is where the consistency level comes in: it defines how many nodes must acknowledge your write or read request before it is considered successful. Above you chose CL = 3 together with RF = 3, which means all nodes holding replicas have to respond and therefore have to be online. If a single one is down, those requests will fail every time (if your cluster were bigger, say 6 nodes, the three nodes that are online might well be the 'right' ones for some writes).



But Cassandra has tunable consistency (see the docs at http://docs.datastax.com/en/archived/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html).



You could pick QUORUM, for example. Then (replication factor / 2) + 1 nodes are needed for a query; in your case (3 / 2) + 1 = 1 + 1 = 2 nodes. QUORUM is ideal if you really need consistent data, because at least one node participating in your request will overlap between the write and the read and therefore have the latest data. With QUORUM, one node can be down and everything still works.
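In cqlsh the consistency level is a per-session setting, so the difference is easy to try out. A minimal sketch reusing the row from the question (with RF = 3, this write succeeds as long as 2 of the 3 replicas for partition '1' are reachable):

CONSISTENCY QUORUM;
INSERT INTO keyspacetest.user (user_id, user_name, user_phone) VALUES (1, 'ram', 9003934069);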



BUT:

Look above: that is the explanation. A write with CL = 1 succeeds because one replica node is still online and you only ask one node to acknowledge the write.



Of course the replication factor is not useless. Writes are replicated to all available replica nodes even if a lower consistency level is chosen; you just do not have to wait for all of them on the client side. If a node is down for a short period of time (3 hours by default), the coordinator stores the missed writes as hints and replays them once the node comes back up, so your data becomes fully replicated again.
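The length of that hint window is controlled by a cassandra.yaml setting; the line below shows the stock default corresponding to the 3 hours mentioned above (check the config of your own Cassandra version before relying on it):

max_hint_window_in_ms: 10800000    # 3 hours; after this, hints for the dead node are no longer stored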



If a node is down for a longer period of time, you have to run nodetool repair and let the cluster rebuild a consistent state. That should be done on a regular schedule anyway as a maintenance task to keep your cluster healthy: writes can be missed because of network or load issues, and tombstones from deletes can be a pain.
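A routine repair of the keyspace from the question could be run like this (keyspace name taken from the nodetool output above; how often you schedule it is up to you):

./nodetool repair keyspacetest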



You can remove nodes from or add nodes to your cluster (if you do, add only one at a time), and Cassandra will repartition the ring for you.



When removing a node: an online node can stream the data it holds to the other nodes before leaving; an offline node can also be removed, but the data it held will then not have enough replicas, so nodetool repair must be run afterwards.
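The two removal cases map to different nodetool commands; a short sketch (the host ID placeholder is hypothetical and would come from nodetool status):

./nodetool decommission             # run on a live node: streams its data to the others and leaves the ring
./nodetool removenode <host-id>     # run from another node when the node to remove is already dead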



Adding a node assigns new token ranges to the new node and automatically streams data to it. But the existing data is not deleted from the source nodes for you (which keeps you safe), so after adding nodes, nodetool cleanup is your friend.
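After the new node has finished bootstrapping, the clean-up is run on each of the pre-existing nodes, for example:

./nodetool cleanup keyspacetest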



Cassandra chooses to be A(vailable) and P(artition tolerant) in terms of the CAP theorem (see https://en.wikipedia.org/wiki/CAP_theorem). So you cannot have consistency at all times, but QUORUM will often be more than enough.



Keep your nodes monitored and do not be too afraid of node failure; disks die and networks drop all the time. Design your applications for it.



Update: it is up to you to choose how much your cluster can tolerate before you start losing data or queries. If needed you can go with a higher replication factor (RF = 7 with CL QUORUM tolerates the loss of 3 nodes), and/or even with multiple datacenters in different locations in case an entire datacenter is lost (which does happen in real life, think of network outages).
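A multi-datacenter setup is expressed in the keyspace definition through NetworkTopologyStrategy; a hypothetical sketch (the keyspace and the datacenter names 'DC1'/'DC2' are placeholders, not taken from the original post, and would have to match your snitch configuration):

CREATE KEYSPACE multi_dc_example
    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};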






For the comment below regarding https://www.ecyrd.com/cassandracalculator/:



Cluster size 3
Replication factor 2
Write level 2
Read level 1



Your reads are consistent: sure, since you require writes to be acknowledged by all replicas.



You can survive the loss of no nodes without impacting the application: see above, RF = 2 and WC = 2 mean that at any time all replica nodes must respond to writes. So for writes your application WILL be impacted; for reads, one node can be down.

You can survive the loss of 1 node without data loss: since data is written to 2 replicas and you only read from one, if one node is down you can still read from the other one.



You are really reading from 1 node every time: RC = 1 means your read is served by one replica, so the first one that acknowledges the read will do. If one node is down it does not matter, because the other one can acknowledge your read.



You are really writing to 2 nodes every time: WC = 2 means every write must be acknowledged by two replicas, which is also the total number of replicas in your example. So all replica nodes need to be online when writing data.



Each node holds 67% of your data: just some math ;) With 3 nodes and RF = 2, each row is stored on 2 of the 3 nodes, so each node ends up holding 2/3 ≈ 67% of the data.



With those settings you cannot survive a node loss without impact while writing to the cluster. But your data is written to disk on two replicas, so if you lose one node you still have the data on the other one and can recover from the dead node.



That concludes this article on high availability in Cassandra. We hope the answer above is helpful, and thank you for your continued support!
