classification
Title: bz2file deadlock
Type: crash Stage: needs patch
Components: Extension Modules Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, jgsack, pitrou, rbcollins, statik
Priority: high Keywords: patch

Created on 2009-10-25 21:55 by rbcollins, last changed 2009-12-16 07:06 by jgsack. This issue is now closed.

Files
File name Uploaded Description Edit
bz2fail.py rbcollins, 2009-10-25 21:55 demo of deadlock
bzthreads.patch pitrou, 2009-10-26 19:23
Messages (12)
msg94462 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2009-10-25 21:55
There is a systemic bug in BZ2File where the GIL is released to perform
compression work, and any other thread calling into BZ2File will
deadlock. We noticed in the write method, but inspection of the code
makes it clear that its systemic.

The problem is pretty simple. Say you have two threads and one bz2file
object. One calls write(), the other calls (say) seek(), but it could be
write() or other methods too. Now, its pretty clear that the question
'should these two threads get serialised' could be contentious. So lets
put that aside by saying 'raising an exception or serialising in
arbitrary order would be ok'.

What happens today is:
t1:bz2file.write
   bz2file.lock.acquire
   gil-release
   bz2compression starts
t2:gil-acquired
   bz2file.seek
   bz2file.lock.acquire(wait=1)  <- this thread is stuck now, and has
the GIL
t1:bz2compression finishes
   gil.acquire <- this thread is stuck now, waiting for the GIL

If any owner of the bz2file object lock will release the GIL, *all*
routines that attempt to lock the bz2file object have to release the GIL
if they can't get the lock - blocking won't work. I'm not familiar
enough with the python threading API to know whether its safe to call
without the GIL. If its not then clearly it can't be used to work with
getting the GIL, and lower layer locks should be used.
msg94463 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-10-25 22:00
Thanks, nice catch.
msg94464 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2009-10-25 22:15
On Sun, 2009-10-25 at 22:00 +0000, Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
> Thanks, nice catch.

Yeah :).

> versions: +Python 2.6, Python 2.7, Python 3.1, Python 3.2

Python 2.5 is also affected - its what we're running on the server that
broke :) 

-Rob
msg94465 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-10-25 22:27
> Python 2.5 is also affected - its what we're running on the server that
> broke :) 

Yes, but it doesn't receive any bug fixes anymore -- only security
fixes.
msg94466 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2009-10-25 22:34
On Sun, 2009-10-25 at 22:27 +0000, Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
> > Python 2.5 is also affected - its what we're running on the server that
> > broke :) 
> 
> Yes, but it doesn't receive any bug fixes anymore -- only security
> fixes.

Ok, we'll work around the issue there then ;)

-Rob
msg94500 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-10-26 19:23
Here is a patch.
msg94509 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2009-10-26 21:15
On Mon, 2009-10-26 at 19:23 +0000, Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:
> 
> Here is a patch.

Looks fine to me assuming that the locking functions can be used outside
the GIL.

On the test side, the case I supplied was low noise for me - I'd
hesitate to do as much compression as you are (50 times more) unless you
saw it spuriously pass a lot - the nature of the deadlock isn't
dependent on races so much as simple scheduling - as long as the seocnd
thread is scheduled before the first thread completes compressing the
deadlock will occur.

-Rob
msg94512 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-10-26 21:27
> Looks fine to me assuming that the locking functions can be used outside
> the GIL.

Yes, they can. Actually, even the GIL uses them. :-)

> On the test side, the case I supplied was low noise for me - I'd
> hesitate to do as much compression as you are (50 times more) unless you
> saw it spuriously pass a lot - the nature of the deadlock isn't
> dependent on races so much as simple scheduling - as long as the seocnd
> thread is scheduled before the first thread completes compressing the
> deadlock will occur.

Well, your test case often succeeded here, so I decided on a more
aggressive variation.
msg94515 - (view) Author: Robert Collins (rbcollins) * (Python committer) Date: 2009-10-26 21:33
On Mon, 2009-10-26 at 21:27 +0000, Antoine Pitrou wrote:
> Antoine Pitrou <pitrou@free.fr> added the comment:

> Well, your test case often succeeded here, so I decided on a more
> aggressive variation.

fair enough, if its needed - its needed :)

-Rob
msg94567 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-10-27 17:50
Fixed in r75818 (trunk), r75819 (2.6), r75820 (py3k), r75821 (3.1). Thanks!
msg96472 - (view) Author: James G. sack (jim) (jgsack) Date: 2009-12-16 02:55
On my Fedora 11 AMD x86_64 system, it appears the deadlock still occurs 
(up to the limit of my patience, ie: several minutes). If I reduce to say 
3, I can get the test to succeed sometimes.

Observed in: trunk 
  -r 76850 (latest)
  -r 75818 (first occurrence of testThreading in test-bz2.py)

but not -r 75817

~jim
msg96478 - (view) Author: James G. sack (jim) (jgsack) Date: 2009-12-16 07:06
IMPORTANT Correction: Please disregard msg96472. 

I was forgetting to do .configure and make, and evidently getting bogus 
failures.

test_bz2 works fine now, ..sorry for the false alarm.

~jim
History
Date User Action Args
2009-12-16 07:06:50jgsacksetmessages: + msg96478
2009-12-16 02:55:01jgsacksetnosy: + jgsack
messages: + msg96472
2009-10-27 17:50:29pitrousetstatus: open -> closed
resolution: fixed
messages: + msg94567
2009-10-26 21:33:17rbcollinssetmessages: + msg94515
2009-10-26 21:27:30pitrousetmessages: + msg94512
2009-10-26 21:15:13rbcollinssetmessages: + msg94509
2009-10-26 19:23:08pitrousetfiles: + bzthreads.patch
keywords: + patch
messages: + msg94500
2009-10-25 22:34:16rbcollinssetmessages: + msg94466
2009-10-25 22:27:09pitrousetmessages: + msg94465
2009-10-25 22:15:03rbcollinssetmessages: + msg94464
2009-10-25 22:00:16pitrousetpriority: high
versions: + Python 2.6, Python 3.1, Python 2.7, Python 3.2
nosy: + pitrou

messages: + msg94463

stage: needs patch
2009-10-25 21:55:53rbcollinscreate