Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forking in a thread raises RuntimeError #51491

Closed
csernazs mannequin opened this issue Oct 30, 2009 · 26 comments
Closed

Forking in a thread raises RuntimeError #51491

csernazs mannequin opened this issue Oct 30, 2009 · 26 comments
Assignees
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@csernazs
Copy link
Mannequin

csernazs mannequin commented Oct 30, 2009

BPO 7242
Nosy @loewis, @Yhg1s, @gpshead, @csernazs, @pitrou, @dhellmann
Files
  • thread_test.py: A test demonstrating the issue.
  • patch_1.diff: A possible fix for this issue created for python 2.6.4.
  • patch_2.diff: Patch re-initializing import_lock on all platforms after fork().
  • thread_unittest.py: Unittest for the patch
  • 7242_unittest.diff: Unit test that tests the issue
  • issue7242-gps01.diff: update of patch_2 + thread_unittest
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2010-03-01.04:41:35.268>
    created_at = <Date 2009-10-30.10:58:01.329>
    labels = ['extension-modules', 'type-bug']
    title = 'Forking in a thread raises RuntimeError'
    updated_at = <Date 2010-12-18.16:55:12.147>
    user = 'https://github.com/csernazs'

    bugs.python.org fields:

    activity = <Date 2010-12-18.16:55:12.147>
    actor = 'Oren_Held'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2010-03-01.04:41:35.268>
    closer = 'gregory.p.smith'
    components = ['Extension Modules']
    creation = <Date 2009-10-30.10:58:01.329>
    creator = 'csernazs'
    dependencies = []
    files = ['15232', '15247', '16011', '16378', '16381', '16395']
    hgrepos = []
    issue_num = 7242
    keywords = ['patch']
    message_count = 26.0
    messages = ['94701', '94702', '94704', '94765', '94820', '94822', '94832', '95075', '98148', '98211', '98287', '98299', '98335', '99202', '100090', '100100', '100143', '100144', '100171', '100200', '100204', '100211', '100231', '100233', '100245', '124294']
    nosy_count = 9.0
    nosy_names = ['loewis', 'twouters', 'gregory.p.smith', 'csernazs', 'pitrou', 'doughellmann', 'dagobert', 'Oren_Held', 'jednaszewski']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue7242'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2']

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Oct 30, 2009

    Python 2.6.4 (r264:75706, Oct 29 2009, 12:00:12) [C] on sunos5

    On my sunos5 system if I call os.fork() in a thread created by
    thread.start_new_thread(), it raises RuntimeError with the message 'not
    holding the import lock'. It's a vanilla python compiled on sunos5 with
    suncc. In 2.6.3 it's ok, I think this issue is caused by patch located
    at http://codereview.appspot.com/96125/show.

    The problem can be re-produced by running the script attached.

    I've looked into the source and it seems to me the following:

    Based on the the change above, it calls fork1() system call between a
    lock-release calls:

    3635 » _PyImport_AcquireLock();
    3636 » pid = fork1();
    3637 » result = _PyImport_ReleaseLock();

    _PyImport_ReleaseLock is called in both the child and parent. Digging
    more into the code, _PyImport_ReleaseLock starts with the following:

        long me = PyThread_get_thread_ident();
        if (me == -1 || import_lock == NULL)
                return 0; /* Too bad */
        if (import_lock_thread != me)
                return -1;
    

    In the above code, if I interpret correctly, it compares the result of
    the current thread id returned by PyThread_get_thread_ident call with
    the thread id of the thread holding the lock - if it's different then
    the current thread cannot release the lock because it's not owned by it.

    Based on my results on solaris the 'me' variable will be different in
    the parent and in the child (in parent it remains the same) - resulting
    that the child thinks that it's not holding the release lock.

    I'm using pthreads on both linux and solaris. On linux the code above is
    working fine, but on solaris it behaves differently.

    @csernazs csernazs mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Oct 30, 2009
    @pitrou
    Copy link
    Member

    pitrou commented Oct 30, 2009

    Well, there has been no such change between 2.6.3 and 2.6.4.

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Oct 30, 2009

    Sorry, the working version is not 2.6.3 (I mistyped the version), it's
    2.6.1 (I've no info about 2.6.2).

    @gpshead
    Copy link
    Member

    gpshead commented Oct 31, 2009

    This only appears to happen on Solaris. What version of Solaris are you
    using? (i doubt that matters, i expect it happens on all versions)

    I haven't look closely enough at the code yet, but reinitializing the
    import lock in the child process should make sense here.

    @gpshead gpshead self-assigned this Oct 31, 2009
    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Nov 2, 2009

    I've tested it only on solaris 8, 32-bit.

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Nov 2, 2009

    solaris 10 x86, 32-bit, sun-studio 11 is ok (in this case the parent's
    thread has thread_id=2 and the child inherits this id)
    solaris 8 sparc4, 32-bit, sun-studio 11 is not working

    So it seems it's independent from sun-cc but depends from the
    architecture and/or the OS.

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Nov 2, 2009

    I've attached a patch which seems to fix this issue. It sets
    import_lock_thread to the current thread id after forking in the child
    process, but still I'm not quite sure that it's the correct way of
    solving this issue.

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Nov 9, 2009

    Additional info:
    I've tested it on solaris 10 / sparc 32-bit, and my test script runs
    fine on that.
    Based on my test it seems that this bug does not affect solaris 10.

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Jan 22, 2010

    Could you please provide any advise on this bug?
    I've submitted my patch, would be curious if there any chance to get it merged.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Jan 24, 2010

    One relevant difference may be the change to the fork semantics in Solaris 10: fork() now means the same as fork1(). Before, fork() meant the same as forkall(), unless the applications was linked with -lpthread, in which case fork() also meant fork1(). See fork(2).

    So I'd be curious if a) the test case passes on Solaris 8 if fork1 is being used, and b) whether it passes if you make sure -lpthread is being used. If so, I'd rather fix this issue by changing the linkage for Solaris 8 (or declaring that system as unsupported altogether), instead of fiddling yet again with the fork handling. In Python, posix.fork definitely needs to mean "POSIX fork".

    If it's something else (i.e. the thread ID changes in Solaris 8 whether fork1 or forkall is used), then the patch is fine in principle. However, I would always reset the thread ID. In your patch, it is not clear why you apply this resetting to Solaris and AIX, but not other systems.

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Jan 25, 2010

    I compile it with -lpthread.
    os.fork1() was not available by default, I enabled it by removing two lines from posixmodule.c (it seems it's only enabled when #if defined(USLC) && defined(SCO_VERSION) is true).

    With os.fork1() I have the same results, RuntimeError.

    I was not able to compile it without pthread because I haven't found any configure options for that. If it's possible I'm happy to try it.

    In my patch I wanted to reduce the effect on systems where forking in thread is working (eg. linux), that's the reason why I added "(defined (__SVR4) && defined (__sun)". But it just checks for solaris, not the OS version (on solaris 10/intel my demo is working).

    (btw forking in thread actually happens in a unittest related to BaseHTTPServer, which obviously fails on my platform)

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Jan 25, 2010

    In my patch I wanted to reduce the effect on systems where forking in
    thread is working (eg. linux), that's the reason why I added
    "(defined (__SVR4) && defined (__sun)".

    I think that's inappropriate, please change that.

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Jan 26, 2010

    Ok, here's the new patch. I've removed the #ifdef-#endif lines. It passed the test thread_test.py on linux (and as well on solaris).

    @dagobert
    Copy link
    Mannequin

    dagobert mannequin commented Feb 11, 2010

    I verified patch_2.diff on Solaris 10 w/SOS11 and it actually resolves a number of issues I had with Mercurial.

    @jednaszewski
    Copy link
    Mannequin

    jednaszewski mannequin commented Feb 25, 2010

    I verified that patch_2.diff resolves the problem for me on Solaris 9/SPARC.

    @pitrou
    Copy link
    Member

    pitrou commented Feb 25, 2010

    The patch should really add an unit test for this.

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Feb 26, 2010

    There's a bundled unittest in test_httpservers which actually fails if this patch is not applied.

    See the unittest attached. Unfortunatelly RuntimError is raised in the child process after fork() so I cannot re-raise it to the parent, instead I send a message from the child to the parent via a pipe if the child is ok.

    @csernazs
    Copy link
    Mannequin Author

    csernazs mannequin commented Feb 26, 2010

    I've run unittest with both patched and vanilla versions of python 2.6.4 and I confirm that it fails when it's run by vanilla on solaris 8, but passes on the patched version.

    @jednaszewski
    Copy link
    Mannequin

    jednaszewski mannequin commented Feb 26, 2010

    I spent some time working on and testing a unit test as well. It's the same basic idea as Zsolt Cserna's, but with a slightly different approach. See 7242_unittest.diff. My unittest fails pre-patch and succeeds post-patch.

    However, I still have reservations about the patch. The existing test test_threading.ThreadJoinOnShutdown.test_3_join_in_forked_from_thread hangs with the patch in place.

    Vanilla 2.6.2 - test passes
    Vanilla 2.6.4 - test fails
    Patched 2.6.4 - test hangs

    Note: the code of the test_threading test is identical in all 3 cases. I'd feel more confident about the patch if this test didn't hang with the patch in place.

    @gpshead
    Copy link
    Member

    gpshead commented Feb 28, 2010

    Using bpo-7242-gps01.diff on release26-maint and a freshly downloaded opensolaris 2009-06 VM test_thread, test_threading and test_subprocess all pass for me both before -and- after the patch. Nor does the original thread_test.py cause the problem for me (meaning I'm unable to reproduce the problem in this VM in the first place so... someone else needs to)

    Regardless, I altered patch_2 a bit in that diff to do what I though _PyImport_ReInitLock() should really do. I also added the thread_unittest testcase to that diff.

    % uname -a
    SunOS opensolaris-vm 5.11 snv 111b i86pc i386 i86pc Solaris"

    Can someone who could reproduce the problem in the first place please test that patch.

    The logic change makes sense to me. I don't know why test_threading.ThreadJoinOnShutdown.test_3_join_in_forked_from_thread would be changing behavior for you.

    @gpshead gpshead added extension-modules C modules in the Modules dir and removed stdlib Python modules in the Lib dir labels Feb 28, 2010
    @jednaszewski
    Copy link
    Mannequin

    jednaszewski mannequin commented Feb 28, 2010

    The problem only seems to appear on Solaris 9 and earlier. I'll try to test the updated patch tonight or tomorrow and let you know what I find.

    @gpshead
    Copy link
    Member

    gpshead commented Feb 28, 2010

    If you have a chance tonight that'd be awesome. I'd love to get this in before 2.6.5rc1 (being cut tomorrow) but as its platform specific (and a pretty-old platform at that) its not worth holding up the release.

    @jednaszewski
    Copy link
    Mannequin

    jednaszewski mannequin commented Mar 1, 2010

    I tested the updated patch, and the new unit test passes on my Sol 8 sparc, but the test_threading test still hangs on my system. However, given that the test is skipped on several platforms and it does work on more relevant versions of Solaris, it's probably OK. It's possible that an OS bug is causing that particular hang.

    Plus, the original patch fixed the 'real world' scenario I was running into, so I'd like to see it get into the release candidate if you're OK with it.

    @gpshead
    Copy link
    Member

    gpshead commented Mar 1, 2010

    Agreed.

    Committed to trunk r78527. I'll wait for the buildbots before merging to release26-maint.

    @gpshead
    Copy link
    Member

    gpshead commented Mar 1, 2010

    r78543 backports this to release26-maint, it should be in 2.6.5 assuming no show stopper issue comes up with this during the release candidates.

    @gpshead gpshead closed this as completed Mar 1, 2010
    @OrenHeld
    Copy link
    Mannequin

    OrenHeld mannequin commented Dec 18, 2010

    Just adding some info:
    This bug is not Solaris-specific; I reproduced it on HP-UX 11.31.

    On Python 2.6.4, thread_test.py fails with the same RunTime error exception.

    On Python 2.6.6, it passes and things look good.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants