classification
Title: multiprocessing occasionally spits out exception during shutdown (_handle_workers)
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jnoller Nosy List: Albert.Strasheim, asksol, davin, gdb, gkcn, jnoller, mattgwwalker, pnasrat
Priority: normal Keywords:

Created on 2010-07-08 20:05 by gdb, last changed 2015-02-11 18:50 by davin. This issue is now closed.

Messages (23)
msg109588 - (view) Author: Greg Brockman (gdb) Date: 2010-07-08 20:05
On Ubuntu 10.04, using freshly-compiled python-from-trunk (as well as multiprocessing-from-trunk), I get tracebacks from the following about 30% of the time:

"""
import multiprocessing, time           
def foo(x):                            
 time.sleep(3)                         
multiprocessing.Pool(1).apply(foo, [1])
"""

My tracebacks are of the form:
"""
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/threading.py", line 530, in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 483, in run
  File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 272, in _handle_workers
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
"""

This information was originally posted to http://bugs.python.org/issue4106.
msg109591 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-08 20:26
Thanks greg; so this affects 2.6 as well (not using the backport at all)
msg109598 - (view) Author: Greg Brockman (gdb) Date: 2010-07-08 20:42
That's likely a mistake on my part.  I'm not observing this using the stock version of multiprocessing on my Ubuntu machine(after running O(100) times).  I do, however, observe it when using either python2.7 or python2.6 with multiprocessing-from-trunk, if that's interesting.

I'm not really sure what the convention is here; should this be filed just under Python 2.7?  Thanks.
msg109601 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-08 20:58
Oh, you mean the backport from google code?

The person who stepped up to maintain that has not refreshed that in some time. I need to decide what to do with it long term. I'm pretty sure it's badly out of date.
msg109615 - (view) Author: Greg Brockman (gdb) Date: 2010-07-08 21:33
No, I'm not using the Google code backport.

To be clear, I've tried testing this with two versions of multiprocessing:
- multiprocessing-from-trunk (r82645): I get these exceptions with ~40% frequency
- multiprocessing from Ubuntu 10.04 (version 0.70a1): No such exceptions observed

Out of curiosity, I did just try this with the processing library (version 0.52) on a 64-bit Debian Lenny box, and did not observe these exceptions.

Hope that's useful!
msg109632 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-08 22:41
Wait - so, you are pulling svn trunk, compiling and running your test with the built python executable? I'm not following the "multiprocessing-from-trunk" distinction unless you're picking the module out of the tree / compiling it and then moving it into some other install. I might be being overly dense.

You're running your test with cd src/tree/ && ./python <your thing> - right?

Also, what, if any, compile flags are you passing to the python build?
msg109643 - (view) Author: Greg Brockman (gdb) Date: 2010-07-08 23:12
> Wait - so, you are pulling svn trunk, compiling and running your test 
> with the built python executable?
Yes.  I initially observed this issue while using 10.04's Python (2.6.5), but wanted to make sure it wasn't fixed by using a newer interpreter.

> I'm not following the "multiprocessing-from-trunk" distinction unless > you're picking the module out of the tree / compiling it and then 
> moving it into some other install. I might be being overly dense.
Initially I was doing exactly that.  (Some context: I was working on a patch to fix a different multiprocessing issue, and figured I may as well write my patch against the most recent version of the library.)  Note that I was using Lucid's _multiprocessing, so there was no compilation involved.

> You're running your test with cd src/tree/ && ./python <your thing> - > right?
What... is src/tree?  If it's what you're asking, I am running the freshly-compiled python interpreter, and it does seem to be using the relevant modules out of trunk:
>>> import threading; threading.__file__
'/usr/local/lib/python2.7/threading.pyc'
>>> import multiprocessing; multiprocessing.__file__
'/usr/local/lib/python2.7/multiprocessing/__init__.pyc'
>>> import _multiprocessing; _multiprocessing.__file__
'/usr/local/lib/python2.7/lib-dynload/_multiprocessing.so'

When running with 2.6, all modules are whatever's available for 10.04 except for the multiprocessing that I took from trunk:
>>> import threading; threading.__file__
'/usr/lib/python2.6/threading.pyc'
>>> import multiprocessing; multiprocessing.__file__
'multiprocessing/__init__.pyc'
>>> import _multiprocessing; _multiprocessing.__file__
'/usr/lib/python2.6/lib-dynload/_multiprocessing.so'

> Also, what, if any, compile flags are you passing to the python build?
I just ran ./configure && make && make install

Sorry about the confusion--let me know if you'd like additional information.  I can test on other platforms/with other configurations if it would be useful.
msg109645 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-08 23:27
Alright, I'm fighting ubuntu 64 bit in my vmware install right now, I'll see if I can get it up and running.
msg109663 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-09 01:58
I can confirm with a clean ubuntu 64 install, with a clean checkout of release27 that it explodes with that exception, while the stock 2.6.5 does not.
msg109665 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-09 02:03
It does not seem to appear on OS/X 10.6.4 - so the only question is does this show up on Ubuntu 32bit
msg109666 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-09 02:12
Correction; it can and does happen on OS/X. So, this is not a platform specific bug.
msg109667 - (view) Author: Greg Brockman (gdb) Date: 2010-07-09 02:26
Yeah, I've just taken a checkout from trunk, ran './configure && make && make install', and reproduced on:

- Ubuntu 10.04 32-bit
- Ubuntu 9.04 32-bit
msg109668 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-09 02:31
Greg, can you comment out line 272 in Lib/multiprocessing/pool.py and tell me if you can reproduce?
msg109669 - (view) Author: Greg Brockman (gdb) Date: 2010-07-09 02:53
With the line commented out, I no longer see any exceptions.

Although, if I understand what's going on, there still a (much rarer) possibility of an exception, right?  I guess in the common case, the worker_handler is in the sleep when shutdown begins.  But if it happens to be in the in the _maintain_pool step, would you still get these exceptions?
msg109711 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-09 11:13
I'm not sure if there would still be the possibility; the thing which worries me is the debug() function vanishing on us - something not good is happening on interpreter shutdown.
msg109736 - (view) Author: Greg Brockman (gdb) Date: 2010-07-09 13:41
Think http://www.mail-archive.com/python-list@python.org/msg282114.html is relevant?
msg109737 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-09 13:47
Greg - yeah. it's the same problem.
msg109866 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-10 12:46
Talking with Brett; the fix should be as simple as keeping a reference to the debug function which we have in the imports. During interpreter shutdown, the sys.modules is iterated and each module replaced with None. Since the _handle_workers thread persists slightly past the point of the parent (and can, it's a daemon thread) debug is vanishing on us.

We can go with switching this to a classmethod, and keeping a reference on the class, passing debug directly into the _handle_workers thread (testing this last night fixed it 100% of the time)
msg110115 - (view) Author: Greg Brockman (gdb) Date: 2010-07-12 17:45
With pool.py:272 commented out, running about 50k iterations, I saw 4 tracebacks giving an exception on pool.py:152.  So this seems to imply the race does exist (i.e. that the thread is in _maintain_pool rather than time.sleep when shutdown begins).  It looks like the _maintain_pool run takes O(10^-4)s, so it's not surprising the error is so rare.

That being said, the patch I submitted in issue 9205 should handle this case as well.
msg110116 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2010-07-12 18:02
Thank you for doing that footwork Greg, it means a lot to me. I'm leaning towards the patch to swallow the errors - I just wanted to ponder it just a tiny bit longer before I pull the trigger.
msg127929 - (view) Author: Matthew Walker (mattgwwalker) Date: 2011-02-04 19:57
It looks to me as if this issue has already been pretty much sorted out already.  Maybe all it lacks is to be officially closed, but just in case I just wanted to add that I too saw this bug (stock python 2.7, Ubuntu 10.04 64 bit).

My example code was:

#!/usr/bin/env python

import multiprocessing
import os
import time

def f(i):
    print "I am process number",os.getpid(),": i =",i
    time.sleep(10)
    return i*i

pool = multiprocessing.Pool(maxtasksperchild=1)

print pool.map(f, range(10))
msg127930 - (view) Author: Matthew Walker (mattgwwalker) Date: 2011-02-04 20:01
Oh, and the stack trace was identical to Greg's:

$ ./test.py 
I am process number 10378 : i = 0
[...]
I am process number 10390 : i = 9
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/gel/usr/mawal32/system/lib/python2.7/threading.py", line 530, in __bootstrap_inner
  File "/gel/usr/mawal32/system/lib/python2.7/threading.py", line 483, in run
  File "/gel/usr/mawal32/system/lib/python2.7/multiprocessing/pool.py", line 272, in _handle_workers
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
msg235765 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2015-02-11 18:50
Closing this issue after having verified that the issue can no longer be reproduced on the systems mentioned (Ubuntu 10.04 or OSX).  Related issues such as issue9205 have been addressed elsewhere and other possibly related issues such as issue22393 are being tracked separately.  It appears Jesse's patches in 2010 (though not explicitly added to this issue here) have indeed addressed this issue.


Using OS X 10.10:
Using Greg's original test script (from his first post), I ran the script 100 times (via shell script loop) but was unable to provoke the issue on OS X 10.10 using either the Anaconda build of Python 2.7.9 or a local build from the default (3.5.0a1+) branch.

Using Matt Walker's supplied test script (from a much later post here), I ran his script 100 times in tight succession but was unable to provoke the issue on OS X 10.10 using either the Anaconda build of Python 2.7.9 or a local build from the default (3.5.0a1+) branch.  Side note:  running Matt's test script kinda takes a while ("time.sleep(10)" seriously?), especially at 100 iterations.



Using Ubuntu 10.04 (64-bit):
I installed Ubuntu 10.04.4 64-bit Server in a VM and repeated the above test sequences with those two scripts using the stock Python 2.6.5, locally built 2.7 branch, and locally built default (3.5) branch versions.  Summary:
* At 100 iterations I was still unable to provoke the issue with Greg's script using 2.6.5, 2.7.9+, or 3.5.0a1+.
* Matt's script leverages functionality only introduced in 2.7 and so could not be run as-is with 2.6.5.
* At 100 iterations I was still unable to provoke the issue with Matt's script using 2.7.9+ or 3.5.0a1+.



Even if it's a little late in saying:  thanks goes to the reporters, Greg and Matt, and to Jesse for implementing the fix.
History
Date User Action Args
2015-02-11 18:50:38davinsetstatus: open -> closed

nosy: + davin
messages: + msg235765

resolution: fixed
stage: resolved
2012-01-30 06:02:38pnasratsetnosy: + pnasrat
2011-04-27 14:19:41gkcnsetnosy: + gkcn
2011-02-04 20:01:16mattgwwalkersetnosy: jnoller, asksol, gdb, Albert.Strasheim, mattgwwalker
messages: + msg127930
2011-02-04 19:57:13mattgwwalkersetnosy: + mattgwwalker
messages: + msg127929
2010-08-27 05:13:58Albert.Strasheimsetnosy: + Albert.Strasheim
2010-07-12 18:02:21jnollersetmessages: + msg110116
2010-07-12 17:45:12gdbsetmessages: + msg110115
2010-07-10 12:46:53jnollersetmessages: + msg109866
2010-07-09 14:08:56asksolsetnosy: + asksol
2010-07-09 13:47:20jnollersetmessages: + msg109737
2010-07-09 13:41:47gdbsetmessages: + msg109736
2010-07-09 11:13:45jnollersetmessages: + msg109711
2010-07-09 02:53:00gdbsetmessages: + msg109669
2010-07-09 02:31:46jnollersetmessages: + msg109668
2010-07-09 02:26:13gdbsetmessages: + msg109667
2010-07-09 02:12:05jnollersetmessages: + msg109666
2010-07-09 02:03:54jnollersetmessages: + msg109665
2010-07-09 01:58:36jnollersetmessages: + msg109663
2010-07-08 23:27:50jnollersetmessages: + msg109645
2010-07-08 23:12:22gdbsetmessages: + msg109643
2010-07-08 22:41:48jnollersetmessages: + msg109632
2010-07-08 21:33:16gdbsetmessages: + msg109615
2010-07-08 20:58:15jnollersetmessages: + msg109601
2010-07-08 20:42:51gdbsetmessages: + msg109598
2010-07-08 20:32:18jnollersettitle: multiprocessing occasionally spits out exception during shutdown -> multiprocessing occasionally spits out exception during shutdown (_handle_workers)
2010-07-08 20:26:29jnollersetassignee: jnoller

messages: + msg109591
nosy: + jnoller
2010-07-08 20:05:32gdbcreate