Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

signals can be caught by any thread #80782

Closed
jdemeyer opened this issue Apr 11, 2019 · 9 comments
Closed

signals can be caught by any thread #80782

jdemeyer opened this issue Apr 11, 2019 · 9 comments
Assignees
Labels
3.8 only security fixes type-feature A feature request or enhancement

Comments

@jdemeyer
Copy link
Contributor

BPO 36601
Nosy @gvanrossum, @gpshead, @vstinner, @njsmith, @jdemeyer, @eryksun, @zooba, @miss-islington
PRs
  • bpo-36601: clarify signal handler comment and remove unnecessary pid check. #12784
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2019-05-10.16:34:05.237>
    created_at = <Date 2019-04-11.10:40:13.014>
    labels = ['type-feature', '3.8']
    title = 'signals can be caught by any thread'
    updated_at = <Date 2019-05-10.16:34:05.231>
    user = 'https://github.com/jdemeyer'

    bugs.python.org fields:

    activity = <Date 2019-05-10.16:34:05.231>
    actor = 'gregory.p.smith'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2019-05-10.16:34:05.237>
    closer = 'gregory.p.smith'
    components = []
    creation = <Date 2019-04-11.10:40:13.014>
    creator = 'jdemeyer'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36601
    keywords = ['patch']
    message_count = 9.0
    messages = ['339958', '339961', '340074', '340081', '340085', '340098', '340142', '342026', '342033']
    nosy_count = 9.0
    nosy_names = ['gvanrossum', 'gregory.p.smith', 'Rhamphoryncus', 'vstinner', 'njs', 'jdemeyer', 'eryksun', 'steve.dower', 'miss-islington']
    pr_nums = ['12784']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'commit review'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue36601'
    versions = ['Python 3.8']

    @jdemeyer
    Copy link
    Contributor Author

    Because of some discussion that is happening on bpo-1583 I noticed this bit of code in the OS-level signal handler (set by the C function sigaction() or signal()):

    static void
    signal_handler(int sig_num)
    {
    /* See NOTES section above */
    if (getpid() == main_pid)
    {
    trip_signal(sig_num);
    }

    The check getpid() == main_pid is claimed to check for the main *thread* but in fact it's checking the process ID, which is the same for all threads. So as far as I can tell, this condition is always true.

    This code essentially goes back to 1994 (commit bb4ba12), so it may have been true at that time that threads were implemented as processes and that getpid() returned a different value for different threads.

    Note that this code refers to receiving a signal from the OS. In Python, it's always handled (by the function registered by signal.signal) by the main thread. But the current behaviour actually makes sense, so we should just remove the superfluous check and fix the comments in the code.

    @jdemeyer
    Copy link
    Contributor Author

    Also note that the documentation of the signal module already has the correct wording:

    Python signal handlers are always executed in the main Python thread, even if the signal was received in another thread.

    @zooba
    Copy link
    Member

    zooba commented Apr 12, 2019

    Looks like Guido added the original code about 25 years ago. Since he removed himself from the other thread during that discussion, I'm going to assume he's not interested in thinking about it any more.

    As the original comment says, it's a hack, but I guess there may have been an OS around at the time that would deliver signals across processes? Maybe after fork? It might be worth pinging python-dev briefly just to check if anyone there knows of an OS that might do this.

    That said, the comment change in the PR looks totally fine to me. I'm just hesitant to remove something that's apparently been working for a quarter of a century ;)

    @gvanrossum
    Copy link
    Member

    IIRC in SGI, getpid() would return the thread ID. They had a syscall that could create a new subprocess that would share or not share various resources (e.g. memory, signals, file descriptors) so by setting or clearing bits you could implement a continuum of variations between fork and thread creation (for thread creation you'd share everything). Maybe a similar thing exists in Linux? But IRIX is dead so I think it's safe to kill.

    @eryksun
    Copy link
    Contributor

    eryksun commented Apr 12, 2019

    Maybe a similar thing exists in Linux?

    Back in the late 90s, Linux implemented threads as 'processes' (LinuxThreads), but with shared resources such as virtual memory and file descriptors. (The Linux kernel's clone system call is highly composable in this regard.) Thus getpid() was different for each thread in a process and kill() could target a particular thread-process. I guess it was a problem if Ctrl+C in a terminal would send SIGINT to every thread-process associated with the terminal.

    Eventually, for scalability and POSIX compliance, Linux abandoned the LinuxThreads implementation. It evolved kernel process IDs into thread IDs and a new concept called a thread group emerged. Nowadays all threads in a process are in the same thread group, and the PID returned by getpid() is the thread-group ID (TGID), which is the thread ID (TID) of the first thread in the process. clone() defaults to creating a new process (thread group), unless CLONE_THREAD is specified.

    @gpshead
    Copy link
    Member

    gpshead commented Apr 12, 2019

    We can remove ancient Irix and LinuxThreads hacks from our codebase.

    The best way to shake issues of this sort out is to remove it and watch for issues on supported buildbots and during beta releases. I don't expect any fallout from this one.

    Other places this check _could_ have come up without us realizing it... if a signal handler setup persisted into a quasi-forked process (vfork) this check would prevent that oddball dangerous state of a child process from doing signal processing until after _PySignal_AfterFork was called to reset main_pid.

    but trip_signal() is already safe in crazy async signal safe contexts so I don't see a problem there. if it happened to write to a wakeup.fd that was shared by the parent and child even that shouldn't be an issue. A vfork()ed process is about to exec or die anyways (no other actions are supported) - never go back to running Python code.

    @njsmith
    Copy link
    Contributor

    njsmith commented Apr 13, 2019

    Yeah, the check makes sense on a system like the comment says IRIX used to be, where getpid() actually returns a thread id *and* signals are delivered to all threads (wtf). But any modern system should do the POSIX thing, where getpid() returns the same value in all threads, and signals are only delivered to one thread anyway. So I agree with everyone else that the original motivation doesn't make sense any more.

    The place where this would change things is in fork/vfork context. For fork it seems unhelpful... like Greg notes, we already reset main_pid in the AfterFork handler, so the only effect is that there's a brief moment where signals can be lost. If we removed the (getpid() == main_thread) check, then fork() would work slightly better.

    For vfork, man, I don't even know. Here I do disagree with Greg a little – according to POSIX, trip_signal is *not* safe to call in a vfork'ed process. POSIX says: "behavior is undefined if the process created by vfork() either modifies any data [...] or calls any other function". Yes this is really as extreme as it sounds – you're not allowed to mutate data or use any syscalls. And they explicitly note that this applies to signal handlers too: "If signal handlers are invoked in the child process after vfork(), they must follow the same rules as other code in the child process."

    trip_signals sets a flag -> it mutates data -> it technically can invoke undefined behavior if it's called in a child process after a vfork. And it can call write(), which again, undefined behavior.

    Of course this is so restrictive that vfork() is almost unusable in Python anyway, because you can't do anything in Python without modifying memory.

    And worse: according to a strict reading of POSIX, vfork() should call pthread_atfork() handlers, and our pthread_atfork() handlers mutate memory.

    So from the POSIX-rules-lawyer perspective, there's absolutely no way any Python process can ever call vfork() without invoking undefined behavior, no matter what we do here.

    Do we care?

    It looks like subprocess.py was recently modified to call posix_spawn in some cases:

    cpython/Lib/subprocess.py

    Lines 610 to 654 in a304b13

    def _use_posix_spawn():
    """Check if posix_spawn() can be used for subprocess.
    subprocess requires a posix_spawn() implementation that properly reports
    errors to the parent process, & sets errno on the following failures:
    * Process attribute actions failed.
    * File actions failed.
    * exec() failed.
    Prefer an implementation which can use vfork() in some cases for best
    performance.
    """
    if _mswindows or not hasattr(os, 'posix_spawn'):
    # os.posix_spawn() is not available
    return False
    if sys.platform == 'darwin':
    # posix_spawn() is a syscall on macOS and properly reports errors
    return True
    # Check libc name and runtime libc version
    try:
    ver = os.confstr('CS_GNU_LIBC_VERSION')
    # parse 'glibc 2.28' as ('glibc', (2, 28))
    parts = ver.split(maxsplit=1)
    if len(parts) != 2:
    # reject unknown format
    raise ValueError
    libc = parts[0]
    version = tuple(map(int, parts[1].split('.')))
    if sys.platform == 'linux' and libc == 'glibc' and version >= (2, 24):
    # glibc 2.24 has a new Linux posix_spawn implementation using vfork
    # which properly reports errors to the parent process.
    return True
    # Note: Don't use the implementation in earlier glibc because it doesn't
    # use vfork (even if glibc 2.26 added a pipe to properly report errors
    # to the parent process).
    except (AttributeError, ValueError, OSError):
    # os.confstr() or CS_GNU_LIBC_VERSION value not available
    pass
    # By default, assume that posix_spawn() does not properly report errors.
    return False

    If we believe the comments in that function, it only does this on macOS – where posix_spawn is a native syscall, so no vfork is involved – or on systems using glibc (i.e., Linux), where posix_spawn *does* invoke vfork(). So in this one case, currently, CPython does use vfork(). Also, users might call os.posix_spawn directly on any system.

    However, checking the glibc source, their implementation of posix_spawn actually takes care of this – it doesn't use vfork() directly, but rather clone(), and it takes care to make sure that no signal handlers run in the child (see sysdeps/unix/sysv/linux/spawni.c for details).

    AFAICT, the only ways that someone could potentially get themselves into trouble with vfork() on CPython are:

    • by explicitly wrapping it themselves, in which case, good luck to them I guess. On Linux they aren't instantly doomed, because Linux intentionally deviates from POSIX and *doesn't* invoke pthread_atfork handlers after vfork(), but they still have their work cut out for them. Not really our problem.

    • by explicitly calling os.posix_spawn on some system where this is implemented using vfork internally, but with a broken libc that doesn't handle signals or pthread_atfork correctly. Currently we don't know of any such systems, and if they do exist they have to be pretty rare.

    So: I think we shouldn't worry about vfork(), and it's fine to remove the (getpid() == main_pid) check.

    @gpshead gpshead self-assigned this May 10, 2019
    @gpshead
    Copy link
    Member

    gpshead commented May 10, 2019

    PR is approved with automerge, just awaiting CI to give it the green light now.

    @miss-islington
    Copy link
    Contributor

    New changeset d237b3f by Miss Islington (bot) (Jeroen Demeyer) in branch 'master':
    bpo-36601: clarify signal handler comment and remove unnecessary pid check. (GH-12784)
    d237b3f

    @gpshead gpshead added the 3.8 only security fixes label May 10, 2019
    @gpshead gpshead closed this as completed May 10, 2019
    @gpshead gpshead added the type-feature A feature request or enhancement label May 10, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants