Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw away more radioactive locks that could be held across a fork in threading.py #50892

Closed
rnk mannequin opened this issue Aug 4, 2009 · 14 comments
Closed

Throw away more radioactive locks that could be held across a fork in threading.py #50892

rnk mannequin opened this issue Aug 4, 2009 · 14 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker stdlib Python modules in the Lib dir

Comments

@rnk
Copy link
Mannequin

rnk mannequin commented Aug 4, 2009

BPO 6643
Nosy @warsaw, @gpshead, @rnk
Files
  • forkjoindeadlock.py: Failing fork/threads test case.
  • forkdeadlock.diff: Patch to fix the deadlock
  • thread-fork-join.diff: Updated patch
  • thread-fork-join.diff: clear condition waiters also
  • issue6643-release26_maint_gps01.diff
  • test_thread.diff: Patch to fix AttributeError in test_thread
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2011-01-04.18:44:17.436>
    created_at = <Date 2009-08-04.18:56:48.860>
    labels = ['interpreter-core', 'library', 'release-blocker']
    title = 'Throw away more radioactive locks that could be held across a fork in threading.py'
    updated_at = <Date 2013-07-30.10:54:40.366>
    user = 'https://github.com/rnk'

    bugs.python.org fields:

    activity = <Date 2013-07-30.10:54:40.366>
    actor = 'automatthias'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2011-01-04.18:44:17.436>
    closer = 'gregory.p.smith'
    components = ['Interpreter Core', 'Library (Lib)']
    creation = <Date 2009-08-04.18:56:48.860>
    creator = 'rnk'
    dependencies = []
    files = ['14647', '14648', '17932', '17936', '20249', '20259']
    hgrepos = []
    issue_num = 6643
    keywords = ['patch', 'needs review']
    message_count = 14.0
    messages = ['91265', '91273', '109914', '109933', '110071', '110092', '125236', '125240', '125270', '125273', '125338', '125346', '125350', '193923']
    nosy_count = 8.0
    nosy_names = ['barry', 'collinwinter', 'gregory.p.smith', 'Rhamphoryncus', 'jyasskin', 'nadeem.vawda', 'rnk', 'automatthias']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue6643'
    versions = ['Python 2.6']

    @rnk
    Copy link
    Mannequin Author

    rnk mannequin commented Aug 4, 2009

    This bug is similar to the importlock deadlock, and it's really part of
    a larger problem that you should release all locks before you fork.
    However, we can fix this in the threading module directly by freeing and
    resetting the locks on the main thread after a fork.

    I've attached a test case that inserts calls to sleep at the right
    places to make the following occur:

    • Main thread spawns a worker thread.
    • Main thread joins worker thread.
    • To join, the main thread acquires the lock on the condition variable
      (worker.__block.acquire()).
      == switch to worker ==
    • Worker thread forks.
      == switch to child process ==
    • Worker thread, which is now the only thread in the process, returns.
    • __bootstrap_inner calls self.__stop() to notify any other threads
      waiting for it that it returned.
    • __stop() tries to acquire self.__block, which has been left in an
      acquired state, so the child process hangs here.
      == switch to worker in parent process ==
    • Worker thread calls os.waitpid(), which hangs, since the child never
      returns.

    So there's the deadlock.

    I think I should be able to fix it just by resetting the condition
    variable lock and any other locks hanging off the only thread left
    standing after the fork.

    @rnk rnk mannequin added the stdlib Python modules in the Lib dir label Aug 4, 2009
    @rnk
    Copy link
    Mannequin Author

    rnk mannequin commented Aug 4, 2009

    Here's a patch for 3.2 which adds the fix and a test case. I also
    verified that the problem exists in 3.1, 2.7, and 2.6 and backported the
    patch to those versions, but someone should review this one before I
    upload those.

    @collinwinter collinwinter mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Aug 11, 2009
    @rnk
    Copy link
    Mannequin Author

    rnk mannequin commented Jul 10, 2010

    Here's an updated patch for py3k (3.2). The test still fails without the fix, and passes with the fix.

    Thinking more about this, I'll try summarizing the bug more coherently:

    When the main thread joins the child threads, it acquires some locks. If a fork in a child thread occurs while those locks are held, they remain locked in the child process. My solution is to do here what we do elsewhere in CPython: abandon radioactive locks and allocate fresh ones.

    @rnk
    Copy link
    Mannequin Author

    rnk mannequin commented Jul 10, 2010

    I realized that in a later fix for unladen-swallow, we also cleared the condition variable waiters list, since it has radioactive synchronization primitives in it as well.

    Here's an updated patch that simplifies the fix by just using __init__() to completely reinitialize the condition variables and adds a test.

    This corresponds to unladen-swallow revisions r799 and r834.

    @rnk rnk mannequin changed the title joining a child that forks can deadlock in the forked child process Throw away more radioactive locks that could be held across a fork in threading.py Jul 11, 2010
    @Rhamphoryncus
    Copy link
    Mannequin

    Rhamphoryncus mannequin commented Jul 12, 2010

    I don't have any direct opinions on this, as it is just a bandaid. fork, as defined by POSIX, doesn't allow what we do with it, so we're reliant on great deal of OS and library implementation details. The only portable and robust solution would be to replace it with a unified fork-and-exec API that's implemented directly in C.

    @rnk
    Copy link
    Mannequin Author

    rnk mannequin commented Jul 12, 2010

    I completely agree, but the cat is out of the bag on this one. I don't see how we could get rid of fork until Py4K, and even then I'm sure there will be people who don't want to see it go, and I'd rather not spend my time arguing this point.

    The only application of fork that doesn't use exec that I've heard of is pre-forked Python servers. But those don't seem like they would be very useful, since with refcounting the copy-on-write behavior doesn't get you very many wins.

    The problem that this bandaid solves for me is that test_threading.py already tests thread+fork behaviors, and can fail non-deterministically.

    This problem was exacerbated while I was working on making the compilation thread.

    I don't think we can un-support fork and threads in the near future either, because subprocess.py uses fork, and libraries can use fork behind the user's back.

    @rnk rnk mannequin self-assigned this Jul 18, 2010
    @gpshead
    Copy link
    Member

    gpshead commented Jan 3, 2011

    fwiw a unified fork-and-exec API implemented in C is what I added in Modules/_posixsubprocess.c to at least avoid this issue as much as possible when using subprocess.

    @gpshead
    Copy link
    Member

    gpshead commented Jan 3, 2011

    patch looks good. committed in r87710 for 3.2. needs back porting to 3.1 and 2.7 and optionally 2.6.

    @gpshead gpshead assigned gpshead and unassigned rnk Jan 3, 2011
    @gpshead
    Copy link
    Member

    gpshead commented Jan 4, 2011

    r87726 for release31-maint
    r87727 for release27-maint - this required a bit more fiddling as _block and _started and _cond were __ private.

    @gpshead
    Copy link
    Member

    gpshead commented Jan 4, 2011

    Attached is a patch for Python 2.6 release26_maint for reference incase someone wants it. That branch is closed - security fixes only.

    @gpshead gpshead closed this as completed Jan 4, 2011
    @nadeemvawda
    Copy link
    Mannequin

    nadeemvawda mannequin commented Jan 4, 2011

    r87710 introduces an AttributeError in test_thread's TestForkInThread test case. If os.fork() is called from a thread created by the _thread module, threading._after_fork() will get a _DummyThread (with no _block attribute) as the current thread.

    I've attached a patch that checks whether the thread has a _block attribute before trying to reinitialize it.

    @pitrou pitrou reopened this Jan 4, 2011
    @gpshead
    Copy link
    Member

    gpshead commented Jan 4, 2011

    eek, thanks for noticing that!

    r87740 fixes this in py3k. backporting to 3.1 and 2.7 now.

    @gpshead
    Copy link
    Member

    gpshead commented Jan 4, 2011

    r87741 3.1
    r87742 2.7

    @gpshead gpshead closed this as completed Jan 4, 2011
    @automatthias
    Copy link
    Mannequin

    automatthias mannequin commented Jul 30, 2013

    Python version: 2.7.5
    OS: Solaris 9

    I'm still observing this issue (or bpo-5114) on Solaris 9. The symptom is that test_threading hangs indefinitely (tested: overnight) and running pstack on the process, I'm seeing:

    ----------------- lwp# 1 / thread# 1 --------------------
    ff3dc734 lwp_park (0, 0, 0)
    ff3d3c74 s9_lwp_park (0, 0, 0, 1, feed4f48, 18f5a4) + 28
    ff3dc698 s9_handler (0, 0, 0, 1, feed4f48, 18f5a4) + 90
    ff1dea70 _sema_wait (0, feee66a0, fed6b054, feee6000, 2a298478, d1f20) + 1d4
    ff1dec30 sema_wait (81aa8, ff1dec24, 722a5b4b, 1101c, feed4f48, 134d60) + c
    feed4f48 sem_wait (81aa8, 0, fed6b1ac, 0, 0, 1) + 20
    ff050890 PyThread_acquire_lock (81aa8, 1, fed6b214, 2, 0, 1ae778) + 5c
    ff05524c lock_PyThread_acquire_lock (0, 22030, 0, 13ee40, 16a298, 55150) + 50
    fefa779c PyCFunction_Call (1ae788, 22030, 0, ff0d8eb8, 55150, ff0551fc) + e4
    ff016b14 PyEval_EvalFrameEx (18f5a0, 0, 0, d4f66, 16a298, 22030) + 5ee8
    ff0185d0 PyEval_EvalCodeEx (12c968, 0, 18f5a0, 4, 1, 18f5a4) + 924
    ff0168f8 PyEval_EvalFrameEx (1902b8, 0, 1, 1765c0, 16a298, 1b12d0) + 5ccc
    ff0185d0 PyEval_EvalCodeEx (13f608, 0, 1902b8, 4, 1, 1902bc) + 924
    ff0168f8 PyEval_EvalFrameEx (154748, 0, 1, 31f7f, 16a298, 1b1250) + 5ccc
    ff0185d0 PyEval_EvalCodeEx (10d650, 54a50, 154748, 2203c, 0, 2203c) + 924
    fef8e11c function_call (22038, 22030, 1386f0, 2203c, 130730, 22030) + 168
    fef604e8 PyObject_Call (130730, 22030, 1386f0, ff0e0340, fef8dfb4, 0) + 60
    ff0137dc PyEval_EvalFrameEx (169110, 0, 22030, 10e62d, 16a298, 22030) + 2bb0
    ff017478 PyEval_EvalFrameEx (168f80, 0, 169114, 1769fa, 16a298, 16a298) + 684c
    ff017478 PyEval_EvalFrameEx (176cb0, 0, 168f84, 12a2c0, 16a298, 16a298) + 684c
    ff0185d0 PyEval_EvalCodeEx (13f410, 176cb4, 176cb0, 13433c, 1, 0) + 924
    fef8e040 function_call (1b26f0, 134330, 0, ff1bc000, 1b26f0, 0) + 8c
    fef604e8 PyObject_Call (1b26f0, 134330, 0, ff0e0340, fef8dfb4, 134320) + 60
    fef6e530 instancemethod_call (0, 134330, 0, 0, 1b26f0, 134bd0) + a4
    fef604e8 PyObject_Call (c3b48, 22030, 0, ff0e0340, fef6e48c, 0) + 60
    ff01051c PyEval_CallObjectWithKeywords (c3b48, 22030, 0, 0, 0, 0) + 68
    ff05568c t_bootstrap (63bd0, 0, 0, 0, 16a298, ff0e2804) + 4c
    ff1e53a4 _lwp_start (0, 0, 0, 0, 0, 0)
    ----------------- lwp# 2 / thread# 2 --------------------
    ff3dc734 lwp_park (0, 0, 0)
    ff3d3c74 s9_lwp_park (0, 0, 0, 1, b64a0d58, 136818) + 28
    ff3dc698 s9_handler (0, 0, 0, 1, b64a0d58, 136818) + 90
    ff1dea70 _sema_wait (0, feee66a0, fec6b054, feee6000, 2a298478, d1f20) + 1d4
    ff1dec30 sema_wait (8ab00, ff1dec24, 722a5b4b, 1101c, feed4f48, 134d60) + c
    feed4f48 sem_wait (8ab00, 0, fec6b1ac, 0, 0, 1) + 20
    ff050890 PyThread_acquire_lock (8ab00, 1, fec6b214, 2, 0, 1ae610) + 5c
    ff05524c lock_PyThread_acquire_lock (0, 22030, 0, 13ee40, 156168, 55160) + 50
    fefa779c PyCFunction_Call (1ae620, 22030, 0, ff0d8eb8, 55160, ff0551fc) + e4
    ff016b14 PyEval_EvalFrameEx (18fe60, 0, 0, d4f66, 156168, 22030) + 5ee8
    ff0185d0 PyEval_EvalCodeEx (12c968, 0, 18fe60, 4, 1, 18fe64) + 924
    ff0168f8 PyEval_EvalFrameEx (18fce8, 0, 1, 1765c0, 156168, 1b11b0) + 5ccc
    ff0185d0 PyEval_EvalCodeEx (13f608, 0, 18fce8, 4, 1, 18fcec) + 924
    ff0168f8 PyEval_EvalFrameEx (18fb88, 0, 1, 136155, 156168, 1a2930) + 5ccc
    ff0185d0 PyEval_EvalCodeEx (48b60, 18fb8c, 18fb88, 19d41c, 1, 2203c) + 924
    fef8e11c function_call (22038, 19d410, 1b3c00, 2203c, 130370, 22030) + 168
    fef604e8 PyObject_Call (130370, 19d410, 1b3c00, ff0e0340, fef8dfb4, 19d400) + 60
    ff0137dc PyEval_EvalFrameEx (18fa20, 0, 19d410, 10e62d, 156168, 134950) + 2bb0
    ff017478 PyEval_EvalFrameEx (18f890, 0, 18fa24, 1769fa, 156168, 156168) + 684c
    ff017478 PyEval_EvalFrameEx (18f728, 0, 18f894, 12a2c0, 156168, 156168) + 684c
    ff0185d0 PyEval_EvalCodeEx (13f410, 18f72c, 18f728, 19d3fc, 1, 0) + 924
    fef8e040 function_call (1b26f0, 19d3f0, 0, ff1bc000, 1b26f0, 0) + 8c
    fef604e8 PyObject_Call (1b26f0, 19d3f0, 0, ff0e0340, fef8dfb4, 19d3e0) + 60
    fef6e530 instancemethod_call (0, 19d3f0, 0, 0, 1b26f0, 1b1250) + a4
    fef604e8 PyObject_Call (1aeaf8, 22030, 0, ff0e0340, fef6e48c, 0) + 60
    ff01051c PyEval_CallObjectWithKeywords (1aeaf8, 22030, 0, 0, 0, 0) + 68
    ff05568c t_bootstrap (63c30, 0, 0, 0, 156168, ff0e2804) + 4c
    ff1e53a4 _lwp_start (0, 0, 0, 0, 0, 0)

    The problem does not occur on Solaris 10.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants