原文:

Why does running a background task over ssh fail if a pseudo-tty is allocated?

问题:

I've recently run into some slightly odd behaviour when running commands over ssh. I would be interested to hear any explanations for the behaviour below.

Running ssh localhost 'touch foobar &' creates a file called foobar as expected:

[bob@server ~]$ ssh localhost 'touch foobar &'
[bob@server ~]$ ls foobar
foobar

However running the same command but with the -t option to force pseudo-tty allocation fails to create foobar:

[bob@server ~]$ ssh -t localhost 'touch foobar &'
Connection to localhost closed.
[bob@server ~]$ echo $?
0
[bob@server ~]$ ls foobar
ls: cannot access foobar: No such file or directory

My current theory is that because the touch process is being backgrounded the pseudo-tty is allocated and unallocated before the process has a chance to run. Certainly adding one second sleep allows touch to run as expected:

[bob@pidora ~]$ ssh -t localhost 'touch foobar & sleep 1'
Connection to localhost closed.
[bob@pidora ~]$ ls foobar
foobar

回答:

This is related with how process groups work, how bash behaves when invoked as a non-interactive shell with -c, and the effect of & in input commands.

The answer assumes you're familiar with how job control works in UNIX; if you're not, here's a high level view: every process belongs to a process group (the processes in the same group are often put there as part of a command pipeline, e.g. cat file | sort | grep 'word' would place the processes running cat(1), sort(1) and grep(1) in the same process group). bash is a process like any other, and it also belongs to a process group. Process groups are part of a session (a session is composed of one or more process groups). In a session, there is at most one process group, called the foreground process group, and possibly many background process groups. The foreground process group has control of the terminal (if there is a controlling terminal attached to the session); the session leader (bash) moves processes from background to foreground and from foreground to background with tcsetpgrp(3). A signal sent to a process group is delivered to every process in that group.

If the concept of process groups and job control is completely new to you, I think you'll need to read up on that to fully understand this answer. A great resource to learn this is Chapter 9 of Advanced Programming in the UNIX Environment (3rd edition).

That being said, let's see what is happening here. We have to fit together every piece of the puzzle.

In both cases, the ssh remote side invokes bash(1) with -c. The -c flag causes bash(1) to run as a non-interactive shell. From the manpage:

Also, it is important to know that job control is disabled when bash is started in non-interactive mode. This means that bash will not create a separate process group to run the command, since job control is disabled, there will be no need to move this command between foreground and background, so it might as well just remain in the same process group as bash. This will happen whether or not you forced PTY allocation on ssh with -t.

However, the use of & has the side effect of causing the shell not to wait for command termination (even if job control is disabled). From the manpage:

So, in both cases, bash will not wait for command execution, and touch(1) will be executed in the same process group as bash(1).

Now, consider what happens when a session leader exits. Quoting from setpgid(2) manpage:

When you don't use -t

When you don't use -t, there is no PTY allocation on the remote side, so bash is not a session leader, and in fact no new session is created. Because sshd is running as a daemon, the bash process that is forked + exec()'d will not have a controlling terminal. As such, even though the shell terminates very quickly (probably before touch(1)), there is no SIGHUP sent to the process group, because bash wasn't a session leader (and there is no controlling terminal). So everything works.

When you use -t

-t forces PTY allocation, which means that the ssh remote side will call setsid(2), allocate a pseudo-terminal + fork a new process with forkpty(3), connect the PTY master device input and output to the socket endpoints that lead to your machine, and finally execute bash(1). forkpty(3) opens the PTY slave side in the forked process that will become bash; since there's no controlling terminal for the current session, and a terminal device is being opened, the PTY device becomes the controlling terminal for the session and bash becomes the session leader.

Then the same thing happens again: touch(1) is executed in the same process group, etc., yadda yadda. The point is, this time, there is a session leader and a controlling terminal. So, since bash does not bother waiting because of the &, when it exits, SIGHUP is delivered to the process group and touch(1) dies prematurely.

About nohup

nohup(1) doesn't work here because there is still a race condition. If bash(1) terminates before nohup(1) has the chance to set up the necessary signal handling and file redirection, it will have no effect (which is probably what happens)

A possible fix

Forcefully re-enabling job control fixes it. In bash, you do that with set -m. This works:

ssh -t localhost 'set -m ; touch foobar &'

Or force bash to wait for touch(1) to complete:

ssh -t localhost 'touch foobar & wait `pgrep touch`'
05-29 00:58