Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault in _PyTrash_begin when faulthandler tries to dump thread stacks #88615

Closed
dgrisby mannequin opened this issue Jun 18, 2021 · 5 comments
Closed

Segfault in _PyTrash_begin when faulthandler tries to dump thread stacks #88615

dgrisby mannequin opened this issue Jun 18, 2021 · 5 comments
Labels
3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@dgrisby
Copy link
Mannequin

dgrisby mannequin commented Jun 18, 2021

BPO 44449
Nosy @dgrisby, @vstinner, @miss-islington
PRs
  • [3.10] bpo-44449: faulthandler don't modify frame refcnt #27850
  • [3.9] bpo-44449: faulthandler don't modify frame refcnt (GH-27850) #28066
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-08-30.14:04:32.443>
    created_at = <Date 2021-06-18.10:27:10.715>
    labels = ['3.10', 'library', '3.9', 'type-crash']
    title = 'Segfault in _PyTrash_begin when faulthandler tries to dump thread stacks'
    updated_at = <Date 2021-08-30.14:04:32.442>
    user = 'https://github.com/dgrisby'

    bugs.python.org fields:

    activity = <Date 2021-08-30.14:04:32.442>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-08-30.14:04:32.443>
    closer = 'vstinner'
    components = ['Library (Lib)']
    creation = <Date 2021-06-18.10:27:10.715>
    creator = 'dgrisby'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 44449
    keywords = ['patch']
    message_count = 5.0
    messages = ['396042', '399828', '400599', '400606', '400607']
    nosy_count = 3.0
    nosy_names = ['dgrisby', 'vstinner', 'miss-islington']
    pr_nums = ['27850', '28066']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue44449'
    versions = ['Python 3.9', 'Python 3.10']

    @dgrisby
    Copy link
    Mannequin Author

    dgrisby mannequin commented Jun 18, 2021

    I am using Python 3.9.4 on CentOS 7. faulthandler is registered with SIGUSR1:

    faulthandler.register(signal.SIGUSR1)

    Sending SIGUSR1 normally correctly dumps the thread stacks, but occasionally it segfaults from the main thread instead:

    Thread 1 (Thread 0x7efe15e69740 (LWP 15201)):
    #0 _PyTrash_begin (tstate=tstate@entry=0x0, op=op@entry=0x757ece0) at Objects/object.c:2125
    #1 0x00007efe156f05e5 in frame_dealloc (f=0x757ece0) at Objects/frameobject.c:578
    #2 0x00007efe15898f88 in _Py_DECREF (op=0x757ece0) at Include/object.h:430
    #3 dump_traceback (write_header=0, tstate=0x757e1a0, fd=2) at Python/traceback.c:821
    #4 _Py_DumpTracebackThreads (fd=fd@entry=2, interp=<optimized out>, interp@entry=0x0, current_tstate=0xbe6a70) at Python/traceback.c:921
    #5 0x00007efe1590be7d in faulthandler_dump_traceback (interp=<optimized out>, all_threads=1, fd=2) at Modules/faulthandler.c:243
    #6 faulthandler_user (signum=10) at Modules/faulthandler.c:839
    #7 <signal handler called>
    #8 0x00007efe15243d2f in do_futex_wait () from /lib64/libpthread.so.0
    #9 0x00007efe15243e07 in __new_sem_wait_slow () from /lib64/libpthread.so.0
    #10 0x00007efe15243ea5 in sem_timedwait () from /lib64/libpthread.so.0
    #11 0x00007efe15896d11 in PyThread_acquire_lock_timed (lock=lock@entry=0x7ea7080, microseconds=microseconds@entry=5000000, intr_flag=intr_flag@entry=1) at Python/thread_pthread.h
    :457
    #12 0x00007efe158f35a4 in acquire_timed (timeout=5000000000, lock=0x7ea7080) at Modules/_threadmodule.c:63
    #13 lock_PyThread_acquire_lock (self=0x7efdf4518750, args=<optimized out>, kwds=<optimized out>) at Modules/_threadmodule.c:146
    #14 0x00007efe15749916 in method_vectorcall_VARARGS_KEYWORDS (func=0x7efe15e1b310, args=0x186d208, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/descrobject.c:346
    ...

    It has failed because tstate is null. tstate came from Py_TRASHCAN_BEGIN_CONDITION that calls PyThreadState_GET(), assuming it returns a valid pointer, but the comment on the _PyThreadState_GET macro says:

    Efficient macro reading directly the 'gilstate.tstate_current' atomic
    variable. The macro is unsafe: it does not check for error and it can
    return NULL.

    The only place I can see that tstate_current would be set to NULL is in _PyThreadState_DeleteCurrent(). I suspect that there has been a race with a thread exit.

    I'm not sure quite what to do about this. Perhaps faulthandler should check if tstate_current is NULL and set it suitably if so?

    @dgrisby dgrisby mannequin added 3.9 only security fixes stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump labels Jun 18, 2021
    @vstinner
    Copy link
    Member

    _Py_DumpTracebackThreads() should not use Py_DECREF(). It's a bug. It must only *read* memory, not *modify* memory since it's called from a signal handler. It's a regression in dump_traceback().

    Python 3.9 and 3.10 use:

    frame = PyThreadState_GetFrame(tstate);
    ...
    Py_DECREF(frame);

    The main branch (future 3.11) uses:

    frame = tstate->frame;

    Without Py_DECREF(): it's a borrowed reference.

    It was changed by commit ae0a2b7.

    Python 3.9 and 3.10 should be fixed to use a borrowed reference.

    @vstinner
    Copy link
    Member

    New changeset fe997e1 by Victor Stinner in branch '3.10':
    bpo-44449: faulthandler don't modify frame refcnt (GH-27850)
    fe997e1

    @miss-islington
    Copy link
    Contributor

    New changeset 720aef4 by Miss Islington (bot) in branch '3.9':
    bpo-44449: faulthandler don't modify frame refcnt (GH-27850)
    720aef4

    @vstinner
    Copy link
    Member

    The bug should now be fixed in 3.9 and 3.10 branches.

    The main branch should not be impacted, it has a different implementation.

    Thanks for the bug report Duncan Grisby.

    @vstinner vstinner added the 3.10 only security fixes label Aug 30, 2021
    @vstinner vstinner added the 3.10 only security fixes label Aug 30, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes stdlib Python modules in the Lib dir type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants