Problem description
Imagine a network of several *nix machines. A dedicated node stores files and periodically schedules Task A, which modifies these files. Each of the other nodes schedules Task B, which syncs those files to local storage with rsync.
Task A can take a considerable amount of time, and the file collection needs to be in a consistent state on all nodes. Thus Task B shouldn't run while Task A is running.
A possible solution is a reader-writer lock: Task A would take a write lock on the resource and Task B a read lock.
I wonder how we can implement such a locking mechanism with Unix shell scripting.
Recommended answer
The usual way of doing this is with the flock utility, which is part of the util-linux package. As far as I know, FreeBSD and NetBSD packages are also available, and probably others. (For Mac OS X, see this question.)
The flock command can take both read ("shared") locks and write ("exclusive") locks. It is based on the flock(2) system call, so the locking is cooperative (aka advisory): it only constrains processes that also lock the same file. In most applications that works fine (but see below for the case where the file is remote).
There are usage examples in the man page linked above. The simplest usage is
flock /tmp/lockfile /usr/local/bin/do_the_update
flock -s /tmp/lockfile /usr/local/bin/do_the_rsync
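Both obtain a lock on /tmp/lockfile and then execute the specified command (presumably a shell script). By default, flock waits until the lock is available. The first command obtains an exclusive lock; I could have made that explicit with the -x option. The second command obtains a shared lock, so several rsyncs can run at the same time, but none can run while the update holds its exclusive lock.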
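Instead of wrapping each command, the scripts can also take the lock themselves with flock's file-descriptor form, which the man page also shows. A minimal sketch, assuming a local lock file at the hypothetical path /tmp/filestore.lock, with placeholder echo lines where the real update and rsync would go; task_a and task_b are illustrative names:

```shell
#!/bin/sh
# Reader-writer locking done inside the scripts, using the file-descriptor
# form of flock(1). LOCKFILE is a hypothetical local path; both tasks only
# need to agree on it.
LOCKFILE=${LOCKFILE:-/tmp/filestore.lock}

task_a() {
    # Writer: open the lock file on fd 9 and take an exclusive lock.
    # flock waits until no reader or writer holds the lock.
    (
        flock -x 9 || exit 1
        echo "Task A: modifying files"   # placeholder for the real update
    ) 9>"$LOCKFILE"
}

task_b() {
    # Reader: shared lock. Any number of readers may hold it together,
    # but never at the same time as the writer.
    (
        flock -s 9 || exit 1
        echo "Task B: syncing files"     # placeholder for the real rsync
    ) 9>"$LOCKFILE"
}

task_a
task_b
```

Because the lock belongs to the file opened on descriptor 9, it is released automatically when the subshell exits, even if the work inside it fails.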
Since the question actually involves a network lock, it is worth pointing out that flock() may not be reliable on a networked filesystem. As a rule, the lock file should be on a local filesystem.
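One way to keep the lock file local in this setup, sketched under assumptions: the lock file lives on the fileserver's own disk, where Task A takes it with flock -x, and each node makes its remote rsync take the shared lock there. rsync's --rsync-path option names the program started on the remote machine and is run through a shell, so it can be a wrapped command. The host name, directories and lock path below are hypothetical:

```shell
#!/bin/sh
# Task B on a remote node. All names are hypothetical:
#   FILESERVER - host that runs Task A and owns the lock file
#   SRC_DIR    - directory on the fileserver that Task A modifies
#   DEST_DIR   - local copy on this node
#   LOCKFILE   - lock file on the fileserver's *local* filesystem
FILESERVER=${FILESERVER:-fileserver}
SRC_DIR=${SRC_DIR:-/srv/files/}
DEST_DIR=${DEST_DIR:-/var/cache/files/}
LOCKFILE=${LOCKFILE:-/var/lock/filestore.lock}

sync_from_fileserver() {
    # The remote side runs "flock -s LOCKFILE rsync --server ...", so the
    # remote rsync holds a shared lock on the fileserver's local lock file
    # for its whole lifetime. (Assumes LOCKFILE contains no spaces.)
    rsync -a \
        --rsync-path="flock -s $LOCKFILE rsync" \
        "$FILESERVER:$SRC_DIR" "$DEST_DIR"
}
```

On the fileserver, Task A would then run as flock -x /var/lock/filestore.lock /usr/local/bin/do_the_update (same hypothetical path), so the update and the remote syncs exclude each other through a lock that never crosses the network filesystem.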
Even in a non-distributed application, you need to consider the possibility of failure. Suppose, for example, you were rsync'ing locally to create a copy. If the host crashes while the rsync is in progress, you will end up with an incomplete or corrupt copy. rsync can recover from that, but there is no guarantee that, when the host restarts, the rsync will start before the files are modified. That shouldn't be a problem, but you definitely need to take it into account.
In a distributed application the situation is more complex, because the system rarely fails as a whole: individual servers, or the network itself, can fail independently.
Advisory locks are not persistent. If the lock file's host crashes with the lock held and restarts, the lock will not be held after the restart. Conversely, if one of the remote servers holding the lock crashes and restarts, it may not be aware that it holds the lock, in which case the lock will never be released.
If both servers were 100% aware of each other's state this wouldn't be a problem, but it is very difficult to distinguish a network failure from a host failure.
You will need to evaluate the risks. As in the local case, if the fileserver crashes while an rsync is in progress, it may restart and immediately start modifying the files. If the remote rsyncs did not fail while the fileserver was down, they will keep trying to synchronize, and the resulting copy will be corrupt. With rsync this should resolve itself on the next sync cycle, but in the meantime you have a problem. You will need to decide how serious that is.
You can prevent the fileserver from starting the mutator on startup by using persistent locks. Each rsync server creates its own indicator file on the fileserver before starting the rsync (and does not start the rsync until it knows the file exists), and deletes the file before releasing the read lock. The fileserver does not start the mutator while any indicator file exists. If an rsync server restarts and its indicator file exists, it knows there was a crash during the rsync, so it must delete the indicator file and restart the rsync.
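The fileserver side of this persistent-lock scheme might look like the following minimal sketch. Assumptions: indicator files live in a hypothetical directory /var/lib/filestore/readers on the fileserver, one per node, and the nodes create and remove them (for example over ssh); that node side is not shown. Task A checks for indicator files only after it holds the exclusive lock, so any file still present belongs to a node that has not completed its sync, including one that crashed mid-sync.

```shell
#!/bin/sh
# Fileserver side of the persistent-lock scheme. Hypothetical paths:
#   READERS_DIR - one indicator file per node that has started a sync;
#                 unlike the flock lock, these survive crashes and reboots
#   LOCKFILE    - local lock file that Task A locks exclusively
READERS_DIR=${READERS_DIR:-/var/lib/filestore/readers}
LOCKFILE=${LOCKFILE:-/var/lock/filestore.lock}

readers_pending() {
    # Succeeds if at least one indicator file is present.
    for f in "$READERS_DIR"/*; do
        [ -e "$f" ] && return 0
    done
    return 1
}

run_task_a() {
    (
        flock -x 9 || exit 1
        # Checked while holding the exclusive lock: a node that is still
        # syncing, or that crashed mid-sync, has left its file behind.
        if readers_pending; then
            echo "run_task_a: indicator files present, not modifying files" >&2
            exit 1
        fi
        echo "Task A: modifying files"   # placeholder for the real mutator
    ) 9>"$LOCKFILE"
}
```

A refused run simply fails; the next scheduled run tries again, and proceeds once the nodes have removed their indicator files.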
This will work fine most of the time, but it can fail if an rsync server crashes during the rsync and never restarts, or restarts only after a long time (or, equivalently, if a network failure isolates it for a long time). In these cases manual intervention will probably be necessary. It would be useful to run a watchdog process on the fileserver that alerts an operator if the read lock has been held for too long, for some definition of "too long".
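A watchdog could approximate "how long the read lock has been held" by the age of per-node indicator files, assuming each node creates one on the fileserver when it starts syncing. A minimal sketch meant to run periodically from cron; the directory READERS_DIR and the threshold MAX_MINUTES are hypothetical, and the threshold is whatever "too long" means for your site:

```shell
#!/bin/sh
# Watchdog for the fileserver. Hypothetical settings:
#   READERS_DIR - directory of per-node indicator files
#   MAX_MINUTES - age after which a sync is considered stuck
READERS_DIR=${READERS_DIR:-/var/lib/filestore/readers}
MAX_MINUTES=${MAX_MINUTES:-60}

check_stale_readers() {
    # -mmin is supported by GNU and BSD find, but it is not in POSIX.
    stale=$(find "$READERS_DIR" -type f -mmin +"$MAX_MINUTES" 2>/dev/null)
    if [ -n "$stale" ]; then
        echo "read lock held for more than $MAX_MINUTES minutes by:" >&2
        echo "$stale" >&2
        return 1
    fi
    return 0
}
```

When the check runs from cron, output on stderr is typically mailed to the job's owner, which serves as a crude alert; a site with a monitoring system would hook the non-zero exit status into that instead.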