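If a mongo node is offline for too long and the oplog wraps before it comes back, it can get stuck in a stale state and require manual intervention. How can I identify that state from the replica set status document? Does it stay in state 3, which is also used by nodes in maintenance mode and presumably by nodes that are still able to catch up? If so, how can I tell them apart?
From http://docs.mongodb.org/manual/reference/replica-status/ :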
Number State
0 Starting up, phase 1 (parsing configuration)
1 Primary
2 Secondary
3 Recovering (initial syncing, post-rollback, stale members)
4 Fatal error
5 Starting up, phase 2 (forking threads)
6 Unknown state (the set has never connected to the member)
7 Arbiter
8 Down
9 Rollback
10 Removed
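Best answer
It will be in state 3, RECOVERING. To identify the stale state specifically, you need to look for the errmsg field. When a member is stale, the secondary in question will have an errmsg like this: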
"errmsg" : "error RS102 too stale to catch up"
In terms of the full output, it looks like this:
rs.status()
{
    "set" : "testReplSet",
    "date" : ISODate("2013-01-29T01:39:38Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "name" : "hostname:31000",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 507,
            "optime" : Timestamp(1359423456000, 893),
            "optimeDate" : ISODate("2013-01-29T01:37:36Z"),
            "self" : true
        },
        {
            "_id" : 1,
            "name" : "hostname:31001",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 483,
            "optime" : Timestamp(1359423456000, 893),
            "optimeDate" : ISODate("2013-01-29T01:37:36Z"),
            "lastHeartbeat" : ISODate("2013-01-29T01:39:37Z"),
            "pingMs" : 0
        },
        {
            "_id" : 2,
            "name" : "hostname:31002",
            "health" : 1,
            "state" : 3,
            "stateStr" : "RECOVERING",
            "uptime" : 4,
            "optime" : Timestamp(1359423087000, 1),
            "optimeDate" : ISODate("2013-01-29T01:31:27Z"),
            "lastHeartbeat" : ISODate("2013-01-29T01:39:38Z"),
            "pingMs" : 0,
            "errmsg" : "error RS102 too stale to catch up"
        }
    ],
    "ok" : 1
}
Finally, a snippet that just prints the error (if one exists) from the shell:
rs.status().members.forEach(function printError(rsmember){if (rsmember.errmsg){print(rsmember.errmsg)}})
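To answer the "how do I tell them apart" part more directly, the same idea can be narrowed to members that are in state 3 and also carry the stale error. The following is only a sketch: findStaleMembers and sampleStatus are hypothetical names, the sample is a trimmed-down copy of the output above, and matching on the text "too stale to catch up" assumes the message looks like the RS102 one shown here, which may not hold for every MongoDB version.

```javascript
// Sketch: return the names of members that are RECOVERING (state 3)
// AND report a "too stale" errmsg. A member in state 3 without that
// errmsg (maintenance mode, initial sync, still catching up) is skipped.
// Assumption: the errmsg text matches the RS102 message shown above.
function findStaleMembers(status) {
    return status.members.filter(function (member) {
        return member.state === 3 &&
            typeof member.errmsg === "string" &&
            member.errmsg.indexOf("too stale to catch up") !== -1;
    }).map(function (member) {
        return member.name;
    });
}

// Hypothetical sample, reduced to the fields the function reads.
var sampleStatus = {
    members: [
        { name: "hostname:31000", state: 1 },
        { name: "hostname:31001", state: 2 },
        { name: "hostname:31002", state: 3,
          errmsg: "error RS102 too stale to catch up" }
    ]
};

var staleNames = findStaleMembers(sampleStatus); // ["hostname:31002"]
```

In the shell the function would be called as findStaleMembers(rs.status()). An empty result alongside a member in state 3 suggests that member is recovering for some other reason.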
Regarding mongodb - identifying a stale member of a mongo cluster, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14573763/