
multiprocessing occasionally spits out exception during shutdown (_handle_workers) #53453

Closed
gdb mannequin opened this issue Jul 8, 2010 · 23 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

gdb mannequin commented Jul 8, 2010

BPO 9207
Nosy @applio

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2015-02-11.18:50:38.376>
created_at = <Date 2010-07-08.20:05:32.399>
labels = ['type-bug', 'library']
title = 'multiprocessing occasionally spits out exception during shutdown (_handle_workers)'
updated_at = <Date 2015-02-11.18:50:38.374>
user = 'https://bugs.python.org/gdb'

bugs.python.org fields:

activity = <Date 2015-02-11.18:50:38.374>
actor = 'davin'
assignee = 'jnoller'
closed = True
closed_date = <Date 2015-02-11.18:50:38.376>
closer = 'davin'
components = ['Library (Lib)']
creation = <Date 2010-07-08.20:05:32.399>
creator = 'gdb'
dependencies = []
files = []
hgrepos = []
issue_num = 9207
keywords = []
message_count = 23.0
messages = ['109588', '109591', '109598', '109601', '109615', '109632', '109643', '109645', '109663', '109665', '109666', '109667', '109668', '109669', '109711', '109736', '109737', '109866', '110115', '110116', '127929', '127930', '235765']
nosy_count = 8.0
nosy_names = ['jnoller', 'pnasrat', 'asksol', 'gdb', 'Albert.Strasheim', 'mattgwwalker', 'gkcn', 'davin']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue9207'
versions = ['Python 2.6', 'Python 2.7']


gdb mannequin commented Jul 8, 2010

On Ubuntu 10.04, using freshly-compiled python-from-trunk (as well as multiprocessing-from-trunk), I get tracebacks from the following about 30% of the time:

"""
import multiprocessing, time
def foo(x):
    time.sleep(3)
multiprocessing.Pool(1).apply(foo, [1])
"""

My tracebacks are of the form:
"""
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/threading.py", line 530, in __bootstrap_inner
  File "/usr/local/lib/python2.7/threading.py", line 483, in run
  File "/usr/local/lib/python2.7/multiprocessing/pool.py", line 272, in _handle_workers
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
"""

This information was originally posted to http://bugs.python.org/issue4106.
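For readers unfamiliar with the failure mode: in Python 2, interpreter shutdown replaced module globals with None, so a daemon thread still executing module code could call what used to be a function and hit exactly this TypeError. A minimal simulation of the mechanism (the module and names here are made up for illustration, not the real pool code):

```python
import types

# A throwaway module standing in for multiprocessing.pool's namespace.
mod = types.ModuleType("fake_pool")
mod.debug = lambda msg: None  # stands in for multiprocessing.util.debug

def handler_step(module):
    # roughly what _handle_workers does each iteration:
    # call a module-level helper
    module.debug("maintaining pool")

handler_step(mod)  # fine while the interpreter is fully alive

# Simulate the Python 2 shutdown behavior: module globals become None.
mod.debug = None
try:
    handler_step(mod)
except TypeError as exc:
    err = str(exc)
print(err)  # 'NoneType' object is not callable
```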

@gdb gdb mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Jul 8, 2010

jnoller mannequin commented Jul 8, 2010

Thanks, Greg; so this affects 2.6 as well (not using the backport at all).

@jnoller jnoller mannequin self-assigned this Jul 8, 2010
@jnoller jnoller mannequin changed the title multiprocessing occasionally spits out exception during shutdown multiprocessing occasionally spits out exception during shutdown (_handle_workers) Jul 8, 2010

gdb mannequin commented Jul 8, 2010

That's likely a mistake on my part. I'm not observing this using the stock version of multiprocessing on my Ubuntu machine (after running O(100) times). I do, however, observe it when using either python2.7 or python2.6 with multiprocessing-from-trunk, if that's interesting.

I'm not really sure what the convention is here; should this be filed just under Python 2.7? Thanks.


jnoller mannequin commented Jul 8, 2010

Oh, you mean the backport from google code?

The person who stepped up to maintain that has not refreshed that in some time. I need to decide what to do with it long term. I'm pretty sure it's badly out of date.


gdb mannequin commented Jul 8, 2010

No, I'm not using the Google code backport.

To be clear, I've tried testing this with two versions of multiprocessing:

  • multiprocessing-from-trunk (r82645): I get these exceptions with ~40% frequency
  • multiprocessing from Ubuntu 10.04 (version 0.70a1): No such exceptions observed

Out of curiosity, I did just try this with the processing library (version 0.52) on a 64-bit Debian Lenny box, and did not observe these exceptions.

Hope that's useful!


jnoller mannequin commented Jul 8, 2010

Wait - so, you are pulling svn trunk, compiling and running your test with the built python executable? I'm not following the "multiprocessing-from-trunk" distinction unless you're picking the module out of the tree / compiling it and then moving it into some other install. I might be being overly dense.

You're running your test with cd src/tree/ && ./python <your thing> - right?

Also, what, if any, compile flags are you passing to the python build?


gdb mannequin commented Jul 8, 2010

> Wait - so, you are pulling svn trunk, compiling and running your test with the built python executable?
Yes. I initially observed this issue while using 10.04's Python (2.6.5), but wanted to make sure it wasn't fixed by using a newer interpreter.

> I'm not following the "multiprocessing-from-trunk" distinction unless you're picking the module out of the tree / compiling it and then moving it into some other install. I might be being overly dense.
Initially I was doing exactly that. (Some context: I was working on a patch to fix a different multiprocessing issue, and figured I may as well write my patch against the most recent version of the library.) Note that I was using Lucid's _multiprocessing, so there was no compilation involved.

> You're running your test with cd src/tree/ && ./python <your thing> - right?
What... is src/tree? If that's what you're asking: I am running the freshly-compiled python interpreter, and it does seem to be using the relevant modules out of trunk:
>>> import threading; threading.__file__
'/usr/local/lib/python2.7/threading.pyc'
>>> import multiprocessing; multiprocessing.__file__
'/usr/local/lib/python2.7/multiprocessing/__init__.pyc'
>>> import _multiprocessing; _multiprocessing.__file__
'/usr/local/lib/python2.7/lib-dynload/_multiprocessing.so'

When running with 2.6, all modules are whatever's available for 10.04 except for the multiprocessing that I took from trunk:
>>> import threading; threading.__file__
'/usr/lib/python2.6/threading.pyc'
>>> import multiprocessing; multiprocessing.__file__
'multiprocessing/__init__.pyc'
>>> import _multiprocessing; _multiprocessing.__file__
'/usr/lib/python2.6/lib-dynload/_multiprocessing.so'

> Also, what, if any, compile flags are you passing to the python build?
I just ran ./configure && make && make install

Sorry about the confusion--let me know if you'd like additional information. I can test on other platforms/with other configurations if it would be useful.


jnoller mannequin commented Jul 8, 2010

Alright, I'm fighting Ubuntu 64-bit in my VMware install right now; I'll see if I can get it up and running.


jnoller mannequin commented Jul 9, 2010

I can confirm with a clean Ubuntu 64-bit install, with a clean checkout of release27, that it explodes with that exception, while the stock 2.6.5 does not.


jnoller mannequin commented Jul 9, 2010

It does not seem to appear on OS X 10.6.4, so the only remaining question is whether this shows up on 32-bit Ubuntu.


jnoller mannequin commented Jul 9, 2010

Correction: it can and does happen on OS X. So this is not a platform-specific bug.


gdb mannequin commented Jul 9, 2010

Yeah, I've just taken a checkout from trunk, run './configure && make && make install', and reproduced on:

  • Ubuntu 10.04 32-bit
  • Ubuntu 9.04 32-bit


jnoller mannequin commented Jul 9, 2010

Greg, can you comment out line 272 in Lib/multiprocessing/pool.py and tell me if you can reproduce?


gdb mannequin commented Jul 9, 2010

With the line commented out, I no longer see any exceptions.

Although, if I understand what's going on, there's still a (much rarer) possibility of an exception, right? I guess in the common case, the worker_handler is in the sleep when shutdown begins. But if it happens to be in the _maintain_pool step, would you still get these exceptions?
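A simplified sketch of the loop being discussed (paraphrased from the thread's description of Python 2.7's pool.py, not a verbatim copy) shows both the common case and the narrow window asked about here, by simulating shutdown beginning during the maintenance step:

```python
import time

RUN, CLOSE = 0, 1

class FakePool:
    """Simplified stand-in for multiprocessing.pool.Pool's worker handler."""
    def __init__(self):
        self._state = RUN
        self.maintain_calls = 0

    def _maintain_pool(self):
        # In the real pool: reap dead workers, start replacements.
        # gdb measured this step at roughly 1e-4 s, which is why shutdown
        # beginning here (rather than during the sleep) is rare.
        self.maintain_calls += 1
        self._state = CLOSE  # simulate shutdown starting mid-maintenance

    def _handle_workers(self):
        while self._state == RUN:
            self._maintain_pool()  # narrow race window
            time.sleep(0.01)       # common case: shutdown begins here

pool = FakePool()
pool._handle_workers()
print(pool.maintain_calls)  # 1: the loop noticed CLOSE and exited
```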


jnoller mannequin commented Jul 9, 2010

I'm not sure if there would still be the possibility; the thing which worries me is the debug() function vanishing on us - something not good is happening on interpreter shutdown.


gdb mannequin commented Jul 9, 2010


jnoller mannequin commented Jul 9, 2010

Greg - yeah. it's the same problem.


jnoller mannequin commented Jul 10, 2010

Talking with Brett; the fix should be as simple as keeping a reference to the debug function which we have in the imports. During interpreter shutdown, sys.modules is iterated and each module is replaced with None. Since the _handle_workers thread persists slightly past the point of the parent (and it can, since it's a daemon thread), debug is vanishing on us.

We can go with switching this to a classmethod and keeping a reference on the class, passing debug directly into the _handle_workers thread (testing this last night fixed it 100% of the time).
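The general pattern described here — giving the daemon thread its own reference to debug rather than looking it up in a module namespace that may already be torn down — can be sketched like this (names hypothetical; this is not the actual patch):

```python
import threading
import time

log = []

def debug(msg):
    # stand-in for multiprocessing.util.debug
    log.append(msg)

def _handle_workers(state, _debug=debug):
    # _debug is bound when the function is defined, so the daemon thread
    # keeps its own reference even if the module's global namespace is
    # cleared to None at shutdown (the Python 2 behavior described above).
    while state["running"]:
        time.sleep(0.01)
    _debug("worker handler exiting cleanly")

state = {"running": True}
t = threading.Thread(target=_handle_workers, args=(state,))
t.daemon = True
t.start()
state["running"] = False
t.join()
print(log)  # ['worker handler exiting cleanly']
```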


gdb mannequin commented Jul 12, 2010

With pool.py:272 commented out, running about 50k iterations, I saw 4 tracebacks giving an exception on pool.py:152. So this seems to imply the race does exist (i.e. that the thread is in _maintain_pool rather than time.sleep when shutdown begins). It looks like the _maintain_pool run takes O(10^-4)s, so it's not surprising the error is so rare.

That being said, the patch I submitted in bpo-9205 should handle this case as well.


jnoller mannequin commented Jul 12, 2010

Thank you for doing that footwork Greg, it means a lot to me. I'm leaning towards the patch to swallow the errors - I just wanted to ponder it just a tiny bit longer before I pull the trigger.


mattgwwalker mannequin commented Feb 4, 2011

It looks to me as if this issue has already been pretty much sorted out. Maybe all it lacks is to be officially closed, but just in case I wanted to add that I too saw this bug (stock Python 2.7, Ubuntu 10.04 64-bit).

My example code was:

#!/usr/bin/env python

import multiprocessing
import os
import time

def f(i):
    print "I am process number",os.getpid(),": i =",i
    time.sleep(10)
    return i*i

pool = multiprocessing.Pool(maxtasksperchild=1)

print pool.map(f, range(10))


mattgwwalker mannequin commented Feb 4, 2011

Oh, and the stack trace was identical to Greg's:

$ ./test.py 
I am process number 10378 : i = 0
[...]
I am process number 10390 : i = 9
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Exception in thread Thread-1 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/gel/usr/mawal32/system/lib/python2.7/threading.py", line 530, in __bootstrap_inner
  File "/gel/usr/mawal32/system/lib/python2.7/threading.py", line 483, in run
  File "/gel/usr/mawal32/system/lib/python2.7/multiprocessing/pool.py", line 272, in _handle_workers
<type 'exceptions.TypeError'>: 'NoneType' object is not callable


applio commented Feb 11, 2015

Closing this issue after having verified that the issue can no longer be reproduced on the systems mentioned (Ubuntu 10.04 or OSX). Related issues such as bpo-9205 have been addressed elsewhere and other possibly related issues such as bpo-22393 are being tracked separately. It appears Jesse's patches in 2010 (though not explicitly added to this issue here) have indeed addressed this issue.

Using OS X 10.10:
Using Greg's original test script (from his first post), I ran the script 100 times (via shell script loop) but was unable to provoke the issue on OS X 10.10 using either the Anaconda build of Python 2.7.9 or a local build from the default (3.5.0a1+) branch.

Using Matt Walker's supplied test script (from a much later post here), I ran his script 100 times in tight succession but was unable to provoke the issue on OS X 10.10 using either the Anaconda build of Python 2.7.9 or a local build from the default (3.5.0a1+) branch. Side note: running Matt's test script kinda takes a while ("time.sleep(10)" seriously?), especially at 100 iterations.

Using Ubuntu 10.04 (64-bit):
I installed Ubuntu 10.04.4 64-bit Server in a VM and repeated the above test sequences with those two scripts using the stock Python 2.6.5, locally built 2.7 branch, and locally built default (3.5) branch versions. Summary:

  • At 100 iterations I was still unable to provoke the issue with Greg's script using 2.6.5, 2.7.9+, or 3.5.0a1+.
  • Matt's script leverages functionality only introduced in 2.7 and so could not be run as-is with 2.6.5.
  • At 100 iterations I was still unable to provoke the issue with Matt's script using 2.7.9+ or 3.5.0a1+.
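The loop-style verification described above can be sketched in Python. The repro text below is a condensed version of Greg's original script with the sleep shortened, the iteration count reduced for illustration, and the start method pinned to "fork" (the default of that era; Unix-only), so the numbers here are illustrative rather than a faithful rerun:

```python
import subprocess
import sys

# Condensed repro run in a child interpreter each iteration.
REPRO = """\
import multiprocessing, time

def foo(x):
    time.sleep(0.05)

if __name__ == "__main__":
    multiprocessing.set_start_method("fork", force=True)  # Unix-only
    multiprocessing.Pool(1).apply(foo, [1])
"""

fails = 0
runs = 10
for _ in range(runs):
    proc = subprocess.run([sys.executable, "-c", REPRO],
                          capture_output=True, text=True)
    if "Traceback" in proc.stderr:
        fails += 1
print("tracebacks in %d of %d runs" % (fails, runs))
```

On an interpreter that includes the 2010 fixes, this should report zero tracebacks.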

Even if it's a little late in saying: thanks goes to the reporters, Greg and Matt, and to Jesse for implementing the fix.

@applio applio closed this as completed Feb 11, 2015
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022