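We are experiencing clock drift problems with a MongoDB replica set we run on AWS. This only seems to have started after we added more data to the collections; before that, we hadn't really noticed the issue except when the system was under heavy load. The following error is occasionally logged in the mongod.log file, even though the system is not under load.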

To test this, we isolated a set of machines with the same dataset that our web application does not use, and the error still occurs.



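The timestamps above show that one of the MongoDB replica set members is more than a minute behind. The worst case we have seen is 12 minutes out of sync.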
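To put a number on "more than a minute behind", one option is to subtract the timestamps of corresponding log lines from two members. The sketch below assumes the ISO 8601 timestamp prefix used by mongod 2.x logs (for example `2014-12-16T09:10:18.091+0000`); the format string and the sample timestamps are hypothetical, not taken from the original log.

```python
from datetime import datetime

# Assumed mongod log timestamp format (ISO 8601 with a numeric UTC offset).
# Adjust this if your log lines use a different timeStampFormat.
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def skew_seconds(ts_a: str, ts_b: str) -> float:
    """Return ts_a minus ts_b in seconds (positive if member A is ahead)."""
    a = datetime.strptime(ts_a, LOG_TS_FORMAT)
    b = datetime.strptime(ts_b, LOG_TS_FORMAT)
    return (a - b).total_seconds()


# Hypothetical timestamps for the same event as logged by two members.
print(skew_seconds("2014-12-16T09:11:30.000+0000", "2014-12-16T09:10:18.091+0000"))
```

With these made-up values the result is about 72 seconds, the same order as the "more than a minute" described above.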

The error in turn causes replication lag. Although the lag corrects itself, we still receive notifications about it from the Mongo Monitoring Service.
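Replication lag is commonly measured by comparing each secondary's `optimeDate` with the primary's in the output of `rs.status()` (the `replSetGetStatus` command, which PyMongo exposes as `client.admin.command("replSetGetStatus")`). The sketch below works on a plain, heavily trimmed dictionary so that it runs without a server; the member names and times are hypothetical.

```python
from datetime import datetime, timezone


def replication_lag_seconds(status: dict) -> dict:
    """Map each SECONDARY's name to how many seconds its optimeDate trails the PRIMARY's."""
    members = status["members"]
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical, trimmed replSetGetStatus document (real output has many more fields).
status = {
    "members": [
        {"name": "mongodb1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2014, 12, 16, 9, 19, 35, tzinfo=timezone.utc)},
        {"name": "mongodb2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2014, 12, 16, 9, 19, 35, tzinfo=timezone.utc)},
        {"name": "mongodb3:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2014, 12, 16, 9, 18, 20, tzinfo=timezone.utc)},
    ]
}
print(replication_lag_seconds(status))
```

Because the optimes are stamped by the primary when it writes the operation, this measures how far a secondary is behind in the oplog rather than how far its system clock has drifted.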

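The setup is 3 x r3.xlarge AWS Linux instances, one in each availability zone in the EU-West-1A region. The machines were set up with the settings and RAID arrays recommended by MongoDB, using the CloudFormation script that MongoDB provides. The data size is about 4GB.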

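We thought the problem was related to NTP synchronization. By default, on the AWS Linux Amazon Machine Image, the ntpd service is configured to use the pool of AWS NTP servers hosted at www.pool.ntp.org.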

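To rule this out, we set up our own NTP server on AWS for the MongoDB servers to sync to. The problem persisted, so we changed the maxpoll and minpoll settings of the ntpd service on the Mongo machines so that they sync time from the NTP server every 16 seconds, but the error still occurs.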
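ntpd's `minpoll` and `maxpoll` options are not given in seconds but as a power of two: an exponent of 4 means 2^4 = 16 seconds, and the ntp 4.2.x documentation lists 4 as the lower limit and 17 as the upper limit. The default `maxpoll` is 10 (1024 s), which is the `poll` value in the `ntpq` output further down. The helper below is a hypothetical sketch that converts an interval in seconds to the exponent and builds the corresponding `server` line.

```python
# ntpd's poll interval is 2 ** exponent seconds; the documented exponent
# range for minpoll/maxpoll in ntp 4.2.x is 4 (16 s) to 17 (about 36 h).
MIN_EXP, MAX_EXP = 4, 17


def poll_exponent(interval_seconds: int) -> int:
    """Convert a poll interval in seconds to the exponent ntpd expects."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    exp = interval_seconds.bit_length() - 1
    if 2 ** exp != interval_seconds:
        raise ValueError(f"{interval_seconds} s is not a power of two")
    if not MIN_EXP <= exp <= MAX_EXP:
        raise ValueError(f"exponent {exp} is outside ntpd's range {MIN_EXP}..{MAX_EXP}")
    return exp


def server_line(host: str, interval_seconds: int) -> str:
    """Build an ntp.conf server directive that pins polling to one interval."""
    exp = poll_exponent(interval_seconds)
    return f"server {host} iburst minpoll {exp} maxpoll {exp}"


print(server_line("time-server.domain.com", 16))
```

For a 16-second interval this prints `server time-server.domain.com iburst minpoll 4 maxpoll 4`.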

We also increased the size of the MongoDB oplog to see whether that would make any difference, but it did not.

Has anyone else run into this type of problem? Are we missing something?

Cheers,

Colin

ps -ef | grep ntp;

mongodb1
ntp       5163     1  0 Dec11 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 15865 15839  0 09:31 pts/2    00:00:00 grep ntp

mongodb2
ntp       4834     1  0 Dec11 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 19056 19029  0 09:31 pts/0    00:00:00 grep ntp

mongodb3
ntp       5795     1  0 Dec11 ?        00:00:00 ntpd -u ntp:ntp -p /var/run/ntpd.pid -g
ec2-user 26199 26173  0 09:31 pts/0    00:00:00 grep ntp

cat /etc/ntp.conf;
# For more information about this file, see the man pages
# ntp.conf(5), ntp_acc(5), ntp_auth(5), ntp_clock(5), ntp_misc(5), ntp_mon(5).

driftfile /var/lib/ntp/drift

# Permit time synchronization with our time source, but do not
# permit the source to query or modify the service on this system.
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery

# Permit all access over the loopback interface.  This could
# be tightened as well, but to do so would effect some of
# the administrative functions.
restrict 127.0.0.1
restrict -6 ::1

# Hosts on local network are less restricted.
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
#server 2.amazon.pool.ntp.org iburst dynamic
#server 3.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst

#broadcast 192.168.1.255 autokey        # broadcast server
#broadcastclient                        # broadcast client
#broadcast 224.0.1.1 autokey            # multicast server
#multicastclient 224.0.1.1              # multicast client
#manycastserver 239.255.254.254         # manycast server
#manycastclient 239.255.254.254 autokey # manycast client

# Enable public key cryptography.
#crypto

includefile /etc/ntp/crypto/pw

# Key file containing the keys and key identifiers used when operating
# with symmetric key cryptography.
keys /etc/ntp/keys

# Specify the key identifiers which are trusted.
#trustedkey 4 8 42

# Specify the key identifier to use with the ntpdc utility.
#requestkey 8

# Specify the key identifier to use with the ntpq utility.
#controlkey 8

# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats

# Enable additional logging.
logconfig =clockall =peerall =sysall =syncall

# Listen only on the primary network interface.
interface listen eth0
interface ignore ipv6
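A quick way to confirm which time sources are actually active, and whether any `minpoll`/`maxpoll` options made it into the file, is to strip comments and look at the `server` directives. The sketch below is a simplified parser (it only recognizes `server` lines and the two poll options) run against the relevant lines of the configuration above.

```python
def active_servers(conf_text: str) -> list:
    """Return the uncommented 'server' directives with any minpoll/maxpoll values."""
    servers = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        tokens = line.split()
        if not tokens or tokens[0] != "server" or len(tokens) < 2:
            continue
        opts = tokens[2:]
        entry = {"host": tokens[1], "minpoll": None, "maxpoll": None}
        for key in ("minpoll", "maxpoll"):
            if key in opts:
                i = opts.index(key)
                if i + 1 < len(opts):
                    entry[key] = int(opts[i + 1])
        servers.append(entry)
    return servers


conf = """
#server 0.amazon.pool.ntp.org iburst dynamic
#server 1.amazon.pool.ntp.org iburst dynamic
server time-server.domain.com iburst
"""
print(active_servers(conf))
```

Run on the file as pasted, this finds a single active server with no poll options, so ntpd would fall back to its defaults; that appears consistent with the `poll 1024` shown by `ntpq`.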

ntpq -npcrv;
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*172.31.14.137   91.*.*.*      3 u  557 1024  377    1.121   -0.264   0.161
associd=0 status=0615 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd … Sat Mar 23 00:37:31 UTC 2013 (1)",
processor="x86_64", system="Linux/3.14.23-22.44.amzn1.x86_64", leap=00,
stratum=4, precision=-23, rootdelay=23.597, rootdisp=109.962,
refid=172.31.14.137,
reftime=d83a757a.175b5fa1  Tue, Dec 16 2014  9:10:18.091,
clock=d83a77a7.82431efa  Tue, Dec 16 2014  9:19:35.508, peer=27361,
tc=10, mintc=3, offset=-0.264, frequency=-13.994, sys_jitter=0.000,
clk_jitter=0.358, clk_wander=0.053
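The peer row of `ntpq -p` can be read field by field: the leading tally character `*` marks the selected system peer, `poll` is in seconds, `reach` is an octal shift register of the last eight polls (377 means all eight succeeded), and `delay`, `offset` and `jitter` are in milliseconds. The parser below is a minimal sketch for a single row; applied to the row above, it shows an offset of about a quarter of a millisecond at the moment the snapshot was taken.

```python
# Tally codes ntpq may print before the remote address (a blank means none).
TALLY_CODES = set("x.-+#*o")


def parse_peer_line(line: str) -> dict:
    """Parse one peer row from `ntpq -p` output into named fields."""
    fields = line.split()
    remote = fields[0]
    tally = ""
    if remote[0] in TALLY_CODES:
        tally, remote = remote[0], remote[1:]
    return {
        "tally": tally,
        "remote": remote,
        "refid": fields[1],
        "stratum": int(fields[2]),
        "poll_s": int(fields[5]),
        "reach_all_ok": int(fields[6], 8) == 0o377,  # reach is printed in octal
        "delay_ms": float(fields[7]),
        "offset_ms": float(fields[8]),
        "jitter_ms": float(fields[9]),
    }


row = "*172.31.14.137   91.*.*.*      3 u  557 1024  377    1.121   -0.264   0.161"
peer = parse_peer_line(row)
print(peer["tally"], peer["poll_s"], peer["reach_all_ok"], peer["offset_ms"])
```

A sub-millisecond offset is several orders of magnitude smaller than the minute-scale gaps described in the question, although a single snapshot cannot rule out drift at other times.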

Best answer

After upgrading to MongoDB 3 with the WiredTiger storage engine, we no longer see this issue.

A similar question, "linux - AWS-EC2-MongoDB Replica Set Time Sync Issues - NTP - Replication Lag", can be found on Stack Overflow: https://stackoverflow.com/questions/27447810/

10-11 20:39