Problem description
Cassandra does not guarantee atomic behavior across replicas, so there is a slight chance that a write fails on one replica while the other replicas persist the change.
Is there any information on how to defend against this, and on how to heal it if it happens? Does Cassandra heal itself in that regard?
[Update]
I am specifically interested in the case where you send a write request to, say, all replicas and only one replica fails with a write error. The node that failed the write is still alive and operational. According to the Cassandra documentation, the write request will return a failure even if the other two replicas (with a replication factor of 3) succeeded.
According to the documentation, in this case two replicas have changed and one still holds the original value. It states that this is an inconsistent state, since the other two replicas cannot roll back the change they have already written.
So the question is: how can one defend against that?
Recommended answer
In Cassandra, a timeout such as this is not considered a failure. See this blog post describing how Cassandra handles different conditions when it comes to writes:
How can we say that since we don’t know what happened before the replica failed? The coordinator can force the results towards either the pre-update or post-update state. This is what Cassandra does with hinted handoff.
...the coordinator stores the update locally, and will re-send it to the failed replica when it recovers, thus forcing it to the post-update state that the client wanted originally.
So to answer your question: yes, Cassandra will heal itself using hinted handoff, and when that process fails (i.e. max_hint_window_in_ms is exceeded before the replica comes back online), a repair should bring things back to a consistent state. This is one reason why it is recommended to run repairs regularly.
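To make the interplay between hinted handoff, the hint window, and repair more concrete, here is a toy simulation in plain Python. It is only a sketch and not how Cassandra is implemented: the Replica and Coordinator classes, their methods, and the explicit now_ms clock are all made up for illustration, and the repair step is a naive "copy the newest value everywhere" loop, whereas a real repair compares replicas far more efficiently. The only name taken from Cassandra is the idea behind max_hint_window_in_ms.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica: a key/value store that can be marked unavailable."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (value, timestamp)

    def apply(self, key, value, timestamp):
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        # Last-write-wins: only keep the value with the newest timestamp.
        current = self.data.get(key)
        if current is None or timestamp > current[1]:
            self.data[key] = (value, timestamp)


@dataclass
class Coordinator:
    """Hypothetical coordinator that keeps hints for replicas it could not reach."""
    replicas: list
    max_hint_window_ms: int = 3 * 60 * 60 * 1000  # assumed value for this sketch
    hints: list = field(default_factory=list)
    down_since: dict = field(default_factory=dict)

    def write(self, key, value, timestamp, now_ms):
        """Send the write to every replica and return the number of acks."""
        acks = 0
        for replica in self.replicas:
            try:
                replica.apply(key, value, timestamp)
                acks += 1
            except ConnectionError:
                # Remember when the replica was first seen failing, and only
                # store a hint while it is still inside the hint window.
                first_down = self.down_since.setdefault(replica.name, now_ms)
                if now_ms - first_down <= self.max_hint_window_ms:
                    self.hints.append((replica, key, value, timestamp))
        return acks

    def replay_hints(self):
        """Re-send stored hints; keep the ones that still cannot be delivered."""
        pending = []
        for replica, key, value, timestamp in self.hints:
            try:
                replica.apply(key, value, timestamp)
                self.down_since.pop(replica.name, None)
            except ConnectionError:
                pending.append((replica, key, value, timestamp))
        self.hints = pending

    def repair(self):
        """Naive stand-in for a repair: push the newest value to every live replica."""
        newest = {}
        for replica in self.replicas:
            for key, (value, ts) in replica.data.items():
                if key not in newest or ts > newest[key][1]:
                    newest[key] = (value, ts)
        for replica in self.replicas:
            if replica.up:
                for key, (value, ts) in newest.items():
                    replica.apply(key, value, ts)


a, b, c = Replica("a"), Replica("b"), Replica("c")
coordinator = Coordinator([a, b, c], max_hint_window_ms=1000)

c.up = False
coordinator.write("k1", "v1", timestamp=1, now_ms=0)     # inside the window: hinted
coordinator.write("k2", "v2", timestamp=2, now_ms=5000)  # window exceeded: no hint

c.up = True
coordinator.replay_hints()  # c receives k1 from the hint but still lacks k2
coordinator.repair()        # the repair pass brings k2 to c as well
```

In this model the write of k1 is healed by the hint alone, while k2 arrives after the hint window has passed and is only healed by the repair pass, which is the situation that regular repairs are meant to cover.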
This article explains hinted handoff in more detail.