Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bz2file deadlock #51454

Closed
rbtcollins opened this issue Oct 25, 2009 · 12 comments
Closed

bz2file deadlock #51454

rbtcollins opened this issue Oct 25, 2009 · 12 comments
Labels
extension-modules C modules in the Modules dir type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@rbtcollins
Copy link
Member

BPO 7205
Nosy @warsaw, @pitrou, @rbtcollins
Files
  • bz2fail.py: demo of deadlock
  • bzthreads.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-10-27.17:50:29.004>
    created_at = <Date 2009-10-25.21:55:52.434>
    labels = ['extension-modules', 'type-crash']
    title = 'bz2file deadlock'
    updated_at = <Date 2009-12-16.07:06:50.281>
    user = 'https://github.com/rbtcollins'

    bugs.python.org fields:

    activity = <Date 2009-12-16.07:06:50.281>
    actor = 'jgsack'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-10-27.17:50:29.004>
    closer = 'pitrou'
    components = ['Extension Modules']
    creation = <Date 2009-10-25.21:55:52.434>
    creator = 'rbcollins'
    dependencies = []
    files = ['15196', '15205']
    hgrepos = []
    issue_num = 7205
    keywords = ['patch']
    message_count = 12.0
    messages = ['94462', '94463', '94464', '94465', '94466', '94500', '94509', '94512', '94515', '94567', '96472', '96478']
    nosy_count = 5.0
    nosy_names = ['barry', 'jgsack', 'pitrou', 'rbcollins', 'statik']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue7205'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2']

    @rbtcollins
    Copy link
    Member Author

    There is a systemic bug in BZ2File where the GIL is released to perform
    compression work, and any other thread calling into BZ2File will
    deadlock. We noticed in the write method, but inspection of the code
    makes it clear that its systemic.

    The problem is pretty simple. Say you have two threads and one bz2file
    object. One calls write(), the other calls (say) seek(), but it could be
    write() or other methods too. Now, its pretty clear that the question
    'should these two threads get serialised' could be contentious. So lets
    put that aside by saying 'raising an exception or serialising in
    arbitrary order would be ok'.

    What happens today is:
    t1:bz2file.write
    bz2file.lock.acquire
    gil-release
    bz2compression starts
    t2:gil-acquired
    bz2file.seek
    bz2file.lock.acquire(wait=1) <- this thread is stuck now, and has
    the GIL
    t1:bz2compression finishes
    gil.acquire <- this thread is stuck now, waiting for the GIL

    If any owner of the bz2file object lock will release the GIL, *all*
    routines that attempt to lock the bz2file object have to release the GIL
    if they can't get the lock - blocking won't work. I'm not familiar
    enough with the python threading API to know whether its safe to call
    without the GIL. If its not then clearly it can't be used to work with
    getting the GIL, and lower layer locks should be used.

    @rbtcollins rbtcollins added extension-modules C modules in the Modules dir type-crash A hard crash of the interpreter, possibly with a core dump labels Oct 25, 2009
    @pitrou
    Copy link
    Member

    pitrou commented Oct 25, 2009

    Thanks, nice catch.

    @rbtcollins
    Copy link
    Member Author

    On Sun, 2009-10-25 at 22:00 +0000, Antoine Pitrou wrote:

    Antoine Pitrou <pitrou@free.fr> added the comment:

    Thanks, nice catch.

    Yeah :).

    versions: +Python 2.6, Python 2.7, Python 3.1, Python 3.2

    Python 2.5 is also affected - its what we're running on the server that
    broke :)

    -Rob

    @pitrou
    Copy link
    Member

    pitrou commented Oct 25, 2009

    Python 2.5 is also affected - its what we're running on the server that
    broke :)

    Yes, but it doesn't receive any bug fixes anymore -- only security
    fixes.

    @rbtcollins
    Copy link
    Member Author

    On Sun, 2009-10-25 at 22:27 +0000, Antoine Pitrou wrote:

    Antoine Pitrou <pitrou@free.fr> added the comment:

    > Python 2.5 is also affected - its what we're running on the server that
    > broke :)

    Yes, but it doesn't receive any bug fixes anymore -- only security
    fixes.

    Ok, we'll work around the issue there then ;)

    -Rob

    @pitrou
    Copy link
    Member

    pitrou commented Oct 26, 2009

    Here is a patch.

    @rbtcollins
    Copy link
    Member Author

    On Mon, 2009-10-26 at 19:23 +0000, Antoine Pitrou wrote:

    Antoine Pitrou <pitrou@free.fr> added the comment:

    Here is a patch.

    Looks fine to me assuming that the locking functions can be used outside
    the GIL.

    On the test side, the case I supplied was low noise for me - I'd
    hesitate to do as much compression as you are (50 times more) unless you
    saw it spuriously pass a lot - the nature of the deadlock isn't
    dependent on races so much as simple scheduling - as long as the seocnd
    thread is scheduled before the first thread completes compressing the
    deadlock will occur.

    -Rob

    @pitrou
    Copy link
    Member

    pitrou commented Oct 26, 2009

    Looks fine to me assuming that the locking functions can be used outside
    the GIL.

    Yes, they can. Actually, even the GIL uses them. :-)

    On the test side, the case I supplied was low noise for me - I'd
    hesitate to do as much compression as you are (50 times more) unless you
    saw it spuriously pass a lot - the nature of the deadlock isn't
    dependent on races so much as simple scheduling - as long as the seocnd
    thread is scheduled before the first thread completes compressing the
    deadlock will occur.

    Well, your test case often succeeded here, so I decided on a more
    aggressive variation.

    @rbtcollins
    Copy link
    Member Author

    On Mon, 2009-10-26 at 21:27 +0000, Antoine Pitrou wrote:

    Antoine Pitrou <pitrou@free.fr> added the comment:

    Well, your test case often succeeded here, so I decided on a more
    aggressive variation.

    fair enough, if its needed - its needed :)

    -Rob

    @pitrou
    Copy link
    Member

    pitrou commented Oct 27, 2009

    Fixed in r75818 (trunk), r75819 (2.6), r75820 (py3k), r75821 (3.1). Thanks!

    @pitrou pitrou closed this as completed Oct 27, 2009
    @jgsack
    Copy link
    Mannequin

    jgsack mannequin commented Dec 16, 2009

    On my Fedora 11 AMD x86_64 system, it appears the deadlock still occurs
    (up to the limit of my patience, ie: several minutes). If I reduce to say
    3, I can get the test to succeed sometimes.

    Observed in: trunk
    -r 76850 (latest)
    -r 75818 (first occurrence of testThreading in test-bz2.py)

    but not -r 75817

    ~jim

    @jgsack
    Copy link
    Mannequin

    jgsack mannequin commented Dec 16, 2009

    IMPORTANT Correction: Please disregard msg96472.

    I was forgetting to do .configure and make, and evidently getting bogus
    failures.

    test_bz2 works fine now, ..sorry for the false alarm.

    ~jim

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants