Reliably killing the sleep process after a USR1 signal

Problem description

I am writing a shell script which performs a task periodically and on receiving a USR1 signal from another process.

The structure of the script is similar to this answer:

#!/bin/bash

trap 'echo "doing some work"' SIGUSR1

while :
do
    sleep 10 && echo "doing some work" &
    wait $!
done

However, this script has a problem: the sleep process keeps running in the background and only exits when its timeout expires. (Note that when USR1 is received during wait $!, the sleep process lingers until its regular timeout, but the periodic echo does get cancelled.) You can see the number of sleep processes on your machine with, for example, pkill -0 -c sleep.

I read this page, which suggests killing the lingering sleep in the trap action, e.g.

#!/bin/bash

pid=
trap '[[ $pid ]] && kill $pid; echo "doing some work"' SIGUSR1

while :
do
    sleep 10 && echo "doing some work" &
    pid=$!
    wait $pid
    pid=
done

However, this script has a race condition if USR1 is sent several times in quick succession, e.g. with:

pkill -USR1 trap-test.sh; pkill -USR1 trap-test.sh

Then it tries to kill a PID that has already been killed and prints an error. Besides, I do not like this code.

Is there a better way to reliably kill the forked process when interrupted? Or an alternative structure to achieve the same functionality?

Recommended answer

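Because the background job is a fork of the foreground shell, both processes have the same name (trap-test.sh), so pkill matches and signals both of them. The background subshell does not inherit the foreground shell's USR1 trap (bash resets caught traps in asynchronous subshells), so USR1 takes its default action there and terminates it. In an unpredictable order, this kills the background process (leaving sleep alive, as explained below) and triggers the trap in the foreground one, hence the race condition.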
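
Below is a minimal sketch of the trap-reset behavior just described. It assumes bash. The foreground shell catches USR1, yet a background job shaped like the question's loop body is killed by a USR1 sent to it alone. job and status_name are illustrative names.

```shell
#!/bin/bash
# Assumes bash. The foreground shell catches USR1 (this trap never fires
# here, because only the background job is signalled).
trap 'echo "foreground: USR1 trapped"' USR1

# Same shape as the question's loop body. In this asynchronous subshell bash
# resets caught traps, so USR1 keeps its default action (terminate).
# Output is discarded so the orphaned sleep does not hold our stdout open.
{ sleep 3 && echo 'doing some work'; } >/dev/null 2>&1 &
job=$!

sleep 1                   # give the subshell time to start
kill -USR1 "$job"         # signal only the background subshell

wait "$job"
status=$?                 # 128 + signal number when killed by a signal
status_name=$(kill -l "$((status - 128))")
echo "background job exited with status $status ($status_name)"
```

The inner sleep 3 is orphaned and runs until its timeout, just like the lingering sleep in the question.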

Besides, in the examples you linked, the background job is always a plain sleep x. In your script it is sleep 10 && echo 'doing some work', which requires the forked subshell to wait for sleep to terminate and then conditionally run echo. Compare these two:

$ sleep 10 &
[1] 9401
$ pstree 9401
sleep
$
$ sleep 10 && echo foo &
[2] 9410
$ pstree 9410
bash───sleep

So let's start from scratch and reproduce the principal issue in a terminal.

$ set +m
$ sleep 100 && echo 'doing some work' &
[1] 9923
$ pstree -pg $$
bash(9871,9871)─┬─bash(9923,9871)───sleep(9924,9871)
                └─pstree(9927,9871)
$ kill $!
$ pgrep sleep
9924
$ pkill -e sleep
sleep killed (pid 9924)

I disabled job control to partly emulate a non-interactive shell's behavior.

Killing the background job didn't kill sleep; I had to terminate it manually. This happens because a signal sent to a process is not automatically propagated to that process's children, i.e. sleep never received the TERM signal.
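
The terminal session above can also be written as a self-checking script. This sketch assumes bash. To get hold of the sleep's PID, the background subshell records it in a temporary file, so the job is a variation on sleep ... && echo ... rather than the question's exact command. pidfile, sub and sleep_pid are illustrative names.

```shell
#!/bin/bash
# Assumes bash. Job control off, as in the session above.
set +m
pidfile=$(mktemp)

# Background subshell with a sleep child: it records the sleep's PID and waits.
{ sleep 3 & echo "$!" >"$pidfile"; wait; } &
sub=$!

sleep 1                                # let the subshell start sleep
sleep_pid=$(<"$pidfile")

kill "$sub"                            # TERM goes to the subshell only
wait "$sub"                            # reap the subshell

if kill -0 "$sleep_pid" 2>/dev/null; then
  sleep_alive=yes                      # the child never received TERM
else
  sleep_alive=no
fi
echo "sleep $sleep_pid still running: $sleep_alive"

kill "$sleep_pid" 2>/dev/null          # clean up the orphaned sleep by hand
rm -f "$pidfile"
```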

To kill sleep as well as the subshell, I need to put the background job into a separate process group and send the TERM signal to that whole group, as shown below. This requires job control to be enabled. Otherwise all jobs are put into the main shell's process group, as seen in the pstree output above. Passing a negative PID to kill signals every process in the group with that ID.


$ set -m
$ sleep 100 && echo 'doing some work' &
[1] 10058
$ pstree -pg $$
bash(9871,9871)─┬─bash(10058,10058)───sleep(10059,10058)
                └─pstree(10067,10067)
$ kill -- -$!
$
[1]+  Terminated              sleep 100 && echo 'doing some work'
$ pgrep sleep
$
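
Here is a self-checking version of this session. It is a sketch that assumes:

- bash on Linux, because it reads /proc/PID/stat to tell a running process from an exited but unreaped zombie;
- a sleep that accepts fractional seconds (e.g. GNU coreutils).

To get hold of the sleep's PID, the background subshell records it in a temporary file, a variation on the question's sleep ... && echo ... job. pgid, is_running and sleep_alive are illustrative names.

```shell
#!/bin/bash
# Assumes bash on Linux (/proc) and a sleep that accepts fractional seconds.
set -m                                 # job control: each job gets its own process group
pidfile=$(mktemp)

# Background subshell that records its sleep child's PID, then waits for it.
{ sleep 3 & echo "$!" >"$pidfile"; wait; } &
pgid=$!                                # the job leader's PID is also its process-group ID

sleep 1
sleep_pid=$(<"$pidfile")

kill -- -"$pgid"                       # negative PID: TERM every process in the group
wait "$pgid"
status=$?

# True while the process exists and is not a zombie (state Z).
is_running() {
  local state
  { read -r _ _ state _ <"/proc/$1/stat"; } 2>/dev/null || return 1
  [[ $state != Z ]]
}

sleep_alive=yes
for _ in 1 2 3 4 5 6 7 8 9 10; do      # allow up to about 1 s for sleep to exit
  is_running "$sleep_pid" || { sleep_alive=no; break; }
  sleep 0.1
done

echo "job status: $status ($(kill -l "$((status - 128))")), sleep still running: $sleep_alive"
rm -f "$pidfile"
```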

Refining this approach and adapting it to your script gives:

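#!/bin/bash -
set -m  # job control: each background job runs in its own process group

usr1_handler() {
  kill -- -$!  # TERM the whole process group of the last background job (subshell and sleep)
  echo 'doing some work'
}

do_something() {
  trap '' USR1  # the background job ignores USR1, so a pkill matching the script name is harmless
  sleep 10 && echo 'doing some work'
}

trap usr1_handler USR1 EXIT  # on exit, also kill the current background job's group

echo "my PID is $$"

while true; do
  do_something &
  wait  # returns early when the trapped USR1 arrives
done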

This prints my PID is xxx (where xxx is the PID of the foreground process) and starts looping. Sending a USR1 signal to xxx (i.e. kill -USR1 xxx) has two effects:

- wait returns immediately with a status greater than 128;
- the trap runs and terminates the background process and its children.

The loop then continues and starts a new background job.
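
The early return of wait can be observed on its own. This sketch assumes bash. The helper subshell that sends USR1 is a hypothetical stand-in for the other process, and trapped, job and elapsed are illustrative names.

```shell
#!/bin/bash
# Assumes bash; the delays are arbitrary.
trapped=no
trap 'trapped=yes' USR1

sleep 5 &                              # stands in for the periodic background job
job=$!

# Hypothetical sender: signals this shell after 1 s.
# ($$ still expands to the parent shell's PID inside the subshell.)
( sleep 1; kill -USR1 "$$" ) &

start=$SECONDS
wait "$job"                            # interrupted by the trapped USR1
status=$?                              # POSIX: the trap action leaves $? unchanged
elapsed=$((SECONDS - start))

echo "wait returned $status after about ${elapsed}s; trap ran: $trapped"
kill "$job" 2>/dev/null                # the sleep is still running, so clean it up
wait                                   # reap the sleep and the sender subshell
```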

If you use pkill instead, it still works. The background process ignores USR1 (trap '' USR1 in do_something), so only the foreground shell reacts to the signal.
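
Here is a minimal sketch of why the ignored USR1 makes pkill harmless. It assumes bash. A background subshell that runs trap '' USR1, as do_something does, survives a USR1 and finishes normally. The exit status 7 is an arbitrary marker, not part of the original script.

```shell
#!/bin/bash
# Assumes bash. The background job ignores USR1, like do_something.
( trap '' USR1; sleep 2; exit 7 ) &
job=$!

sleep 1                                # let the subshell install the ignore disposition
kill -USR1 "$job"                      # would terminate the job if USR1 were not ignored

wait "$job"
status=$?
echo "job exited with status $status"  # 7 means it ran to completion
```

An ignored disposition is also inherited across exec, so a sleep started inside such a job ignores USR1 as well.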

For further information, see:

Bash Reference Manual § Special Parameters ($$ and $!),
POSIX kill specification (-$! usage),
POSIX Definitions § Job Control (how job control is implemented in POSIX shells),
Bash Reference Manual § Job Control Basics (how job control works in bash),
POSIX Shell Command Language § Signals And Error Handling,
POSIX wait specification.
