classification
Title: signals can be caught by any thread
Type: enhancement Stage: commit review
Components: Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: Rhamphoryncus, eryksun, gregory.p.smith, gvanrossum, jdemeyer, miss-islington, njs, steve.dower, vstinner
Priority: normal Keywords: patch

Created on 2019-04-11 10:40 by jdemeyer, last changed 2019-05-10 16:34 by gregory.p.smith. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 12784 merged jdemeyer, 2019-04-11 11:00
Messages (9)
msg339958 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-04-11 10:40
Because of some discussion that is happening on #1583 I noticed this bit of code in the OS-level signal handler (set by the C function sigaction() or signal()):

static void
signal_handler(int sig_num)
{
/* See NOTES section above */
if (getpid() == main_pid)
{
trip_signal(sig_num);
}

The check getpid() == main_pid is claimed to check for the main *thread* but in fact it's checking the process ID, which is the same for all threads. So as far as I can tell, this condition is always true.

This code essentially goes back to 1994 (commit bb4ba12242), so it may have been true at that time that threads were implemented as processes and that getpid() returned a different value for different threads.

Note that this code refers to receiving a signal from the OS. In Python, it's always handled (by the function registered by signal.signal) by the main thread. But the current behaviour actually makes sense, so we should just remove the superfluous check and fix the comments in the code.
msg339961 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-04-11 11:01
Also note that the documentation of the signal module already has the correct wording:

Python signal handlers are always executed in the main Python thread, even if the signal was received in another thread.
msg340074 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-04-12 15:59
Looks like Guido added the original code about 25 years ago. Since he removed himself from the other thread during that discussion, I'm going to assume he's not interested in thinking about it any more.

As the original comment says, it's a hack, but I guess there may have been an OS around at the time that would deliver signals across processes? Maybe after fork? It might be worth pinging python-dev briefly just to check if anyone there knows of an OS that might do this.

That said, the comment change in the PR looks totally fine to me. I'm just hesitant to remove something that's apparently been working for a quarter of a century ;)
msg340081 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2019-04-12 16:35
IIRC in SGI, getpid() would return the thread ID. They had a syscall that could create a new subprocess that would share or not share various resources (e.g. memory, signals, file descriptors) so by setting or clearing bits you could implement a continuum of variations between fork and thread creation (for thread creation you'd share everything). Maybe a similar thing exists in Linux? But IRIX is dead so I think it's safe to kill.
msg340085 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2019-04-12 17:17
> Maybe a similar thing exists in Linux? 

Back in the late 90s, Linux implemented threads as 'processes' (LinuxThreads), but with shared resources such as virtual memory and file descriptors. (The Linux kernel's clone system call is highly composable in this regard.) Thus getpid() was different for each thread in a process and kill() could target a particular thread-process. I guess it was a problem if Ctrl+C in a terminal would send SIGINT to every thread-process associated with the terminal.

Eventually, for scalability and POSIX compliance, Linux abandoned the LinuxThreads implementation. It evolved kernel process IDs into thread IDs and a new concept called a thread group emerged. Nowadays all threads in a process are in the same thread group, and the PID returned by getpid() is the thread-group ID (TGID), which is the thread ID (TID) of the first thread in the process. clone() defaults to creating a new process (thread group), unless CLONE_THREAD is specified.
msg340098 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-04-12 18:52
We can remove ancient Irix and LinuxThreads hacks from our codebase.

The best way to shake issues of this sort out is to remove it and watch for issues on supported buildbots and during beta releases.  I don't expect any fallout from this one.

Other places this check _could_ have come up without us realizing it... if a signal handler setup persisted into a quasi-forked process (vfork) this check would prevent that oddball dangerous state of a child process from doing signal processing until after _PySignal_AfterFork was called to reset main_pid.

but trip_signal() is already safe in crazy async signal safe contexts so I don't see a problem there.  if it happened to write to a wakeup.fd that was shared by the parent and child even that shouldn't be an issue.  A vfork()ed process is about to exec or die anyways (no other actions are supported) - never go back to running Python code.
msg340142 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2019-04-13 01:35
Yeah, the check makes sense on a system like the comment says IRIX used to be, where getpid() actually returns a thread id *and* signals are delivered to all threads (wtf). But any modern system should do the POSIX thing, where getpid() returns the same value in all threads, and signals are only delivered to one thread anyway. So I agree with everyone else that the original motivation doesn't make sense any more.

The place where this would change things is in fork/vfork context. For fork it seems unhelpful... like Greg notes, we already reset main_pid in the AfterFork handler, so the only effect is that there's a brief moment where signals can be lost. If we removed the (getpid() == main_thread) check, then fork() would work slightly better.

For vfork, man, I don't even know. Here I do disagree with Greg a little – according to POSIX, trip_signal is *not* safe to call in a vfork'ed process. POSIX says: "behavior is undefined if the process created by vfork() either modifies any data [...] or calls any other function". Yes this is really as extreme as it sounds – you're not allowed to mutate data or use any syscalls. And they explicitly note that this applies to signal handlers too: "If signal handlers are invoked in the child process after vfork(), they must follow the same rules as other code in the child process."

trip_signals sets a flag -> it mutates data -> it technically can invoke undefined behavior if it's called in a child process after a vfork. And it can call write(), which again, undefined behavior.

Of course this is so restrictive that vfork() is almost unusable in Python anyway, because you can't do anything in Python without modifying memory.

And worse: according to a strict reading of POSIX, vfork() should call pthread_atfork() handlers, and our pthread_atfork() handlers mutate memory.

So from the POSIX-rules-lawyer perspective, there's absolutely no way any Python process can ever call vfork() without invoking undefined behavior, no matter what we do here.

Do we care?

It looks like subprocess.py was recently modified to call posix_spawn in some cases: https://github.com/python/cpython/blob/a304b136adda3575898d8b5debedcd48d5072272/Lib/subprocess.py#L610-L654

If we believe the comments in that function, it only does this on macOS – where posix_spawn is a native syscall, so no vfork is involved – or on systems using glibc (i.e., Linux), where posix_spawn *does* invoke vfork(). So in this one case, currently, CPython does use vfork(). Also, users might call os.posix_spawn directly on any system.

However, checking the glibc source, their implementation of posix_spawn actually takes care of this – it doesn't use vfork() directly, but rather clone(), and it takes care to make sure that no signal handlers run in the child (see sysdeps/unix/sysv/linux/spawni.c for details).

AFAICT, the only ways that someone could potentially get themselves into trouble with vfork() on CPython are:

- by explicitly wrapping it themselves, in which case, good luck to them I guess. On Linux they aren't instantly doomed, because Linux intentionally deviates from POSIX and *doesn't* invoke pthread_atfork handlers after vfork(), but they still have their work cut out for them. Not really our problem.

- by explicitly calling os.posix_spawn on some system where this is implemented using vfork internally, but with a broken libc that doesn't handle signals or pthread_atfork correctly. Currently we don't know of any such systems, and if they do exist they have to be pretty rare.

So: I think we shouldn't worry about vfork(), and it's fine to remove the (getpid() == main_pid) check.
msg342026 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-05-10 01:17
PR is approved with automerge, just awaiting CI to give it the green light now.
msg342033 - (view) Author: miss-islington (miss-islington) Date: 2019-05-10 01:28
New changeset d237b3f0f61990c972b84c45eb4fe137db51a6a7 by Miss Islington (bot) (Jeroen Demeyer) in branch 'master':
bpo-36601: clarify signal handler comment and remove unnecessary pid check. (GH-12784)
https://github.com/python/cpython/commit/d237b3f0f61990c972b84c45eb4fe137db51a6a7
History
Date User Action Args
2019-05-10 16:34:05gregory.p.smithsetstatus: open -> closed
type: enhancement
stage: patch review -> commit review
resolution: fixed
versions: + Python 3.8
2019-05-10 01:28:59miss-islingtonsetnosy: + miss-islington
messages: + msg342033
2019-05-10 01:17:21gregory.p.smithsetmessages: + msg342026
2019-05-10 01:17:02gregory.p.smithsetassignee: gregory.p.smith
2019-04-13 01:36:00njssetnosy: + njs
messages: + msg340142
2019-04-12 18:52:29gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg340098
2019-04-12 17:17:09eryksunsetnosy: + eryksun
messages: + msg340085
2019-04-12 16:35:03gvanrossumsetnosy: + gvanrossum
messages: + msg340081
2019-04-12 16:21:06vstinnersetnosy: + vstinner
2019-04-12 15:59:11steve.dowersetnosy: + steve.dower
messages: + msg340074
2019-04-11 11:01:40jdemeyersetmessages: + msg339961
2019-04-11 11:00:35jdemeyersetkeywords: + patch
stage: patch review
pull_requests: + pull_request12711
2019-04-11 10:40:13jdemeyercreate