classification
Title: multiprocessing.Pool garbles call stack for __new__
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Charles McEachern, davin, r.david.murray, serhiy.storchaka
Priority: normal
Keywords:

Created on 2017-04-07 17:25 by Charles McEachern, last changed 2017-04-14 17:23 by serhiy.storchaka. This issue is now closed.

Files
newpool.py (uploaded by Charles McEachern, 2017-04-07 17:25): Minimal working example
Messages (7)
msg291278 - Author: Charles McEachern (Charles McEachern) Date: 2017-04-07 17:25
I'm calling the constructor of Foo, a subclass of str. Expected output:

Called Foo.__new__ with args = ('TIMESTAMP', 'INPUT0')
TIMESTAMP OUTPUT0

When I make the call using a multiprocessing.pool.ThreadPool, it works fine. But when I make the call using a multiprocessing.Pool (using the apply or apply_async method), I get:

Called Foo.__new__ with args = ('TIMESTAMP', 'INPUT0')
Called Foo.__new__ with args = ('TIMESTAMP OUTPUT0',)
Exception in thread Thread-3:
...
ValueError: Bad Foo input: ('TIMESTAMP OUTPUT0',)

That is, the object I just constructed seems to be getting shoved right back into the constructor. 
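The re-entry can be reproduced without multiprocessing at all, because a process Pool pickles the worker's return value to send it back to the parent. A minimal sketch, assuming a hypothetical Foo that mirrors the messages above (the real class is in newpool.py, which is not reproduced here): two arguments in, one processed string out, anything else rejected.

```python
import pickle


class Foo(str):
    """Hypothetical stand-in for the Foo class in newpool.py."""

    def __new__(cls, *args):
        print("Called Foo.__new__ with args = %r" % (args,))
        if len(args) != 2:
            raise ValueError("Bad Foo input: %r" % (args,))
        timestamp, raw = args
        # Hypothetical processing step: "INPUT0" becomes "OUTPUT0".
        return super().__new__(cls, "%s %s" % (timestamp, raw.replace("INPUT", "OUTPUT")))


def roundtrip(obj):
    """Pickle and unpickle obj, as a process Pool does with a result.

    Returns the restored copy, or the ValueError raised while loading it.
    """
    data = pickle.dumps(obj)
    try:
        return pickle.loads(data)
    except ValueError as exc:
        return exc


foo = Foo("TIMESTAMP", "INPUT0")
print(foo)
print(repr(roundtrip(foo)))
```

The second "Called Foo.__new__" line should show the single, already-processed string, just as in the output above.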

When I swap out the Foo class for the similar Goo class, which is not a str, and uses __init__ instead of __new__, I again see no problems:

Called Goo.__init__ with args = ('TIMESTAMP', 'INPUT0')
<Goo TIMESTAMP OUTPUT0>
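Goo survives because unpickling an ordinary instance creates it with object.__new__ and restores its __dict__ directly, without calling __init__ at all. A sketch with a hypothetical Goo (again a stand-in for the class in newpool.py) that records every __init__ call:

```python
import pickle

init_calls = []  # every argument tuple passed to Goo.__init__


class Goo(object):
    """Hypothetical stand-in for the Goo class in newpool.py."""

    def __init__(self, *args):
        init_calls.append(args)
        if len(args) != 2:
            raise ValueError("Bad Goo input: %r" % (args,))
        timestamp, raw = args
        self.value = "%s %s" % (timestamp, raw.replace("INPUT", "OUTPUT"))

    def __repr__(self):
        return "<Goo %s>" % self.value


goo = Goo("TIMESTAMP", "INPUT0")
restored = pickle.loads(pickle.dumps(goo))
# The copy is rebuilt from its saved __dict__, so init_calls still holds
# only the original constructor call.
print(restored, init_calls)
```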

I see this in 2.7.9 as well as 3.4.5. Looks like it's present in 2.7.2 and 3.5.2 as well:

https://github.com/charles-uno/python-new-pool-bug/issues/1
msg291292 - Author: Davin Potts (davin) (Python committer) Date: 2017-04-07 20:49
It looks like the first 'Called Foo.__new__' is being reported by the child (pool of 1) process and the second 'Called Foo.__new__' is being reported by the parent process. In multiprocessing, because objects are by default serialized using pickle, this may be caused by the parent process unpickling the Foo object, which is something you would not experience when using ThreadPool because it does not have the same need for serialization.

Example showing invocation of __new__ as part of unpickling:
>>> class Foo(object):
...     def __new__(cls):
...         print("New")
...         return object.__new__(cls)
... 
>>> import pickle
>>> f = Foo()
New
>>> pf = pickle.dumps(f, protocol=2)
>>> pickle.loads(pf)                  # unpickling triggers __new__
New
<__main__.Foo object at 0x1084a06d0>
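The arguments that unpickling will hand to __new__ can be inspected directly. For a str subclass, the default reduction (protocol 2 and later) names copyreg.__newobj__ as the reconstructor and takes its arguments from str.__getnewargs__, which is just the string value. The two-argument Foo below is hypothetical, written to match the first message; the object is only reduced here, never loaded.

```python
import copyreg


class Foo(str):
    """Hypothetical str subclass whose constructor takes two arguments."""

    def __new__(cls, timestamp, raw):
        return super().__new__(cls, timestamp + " " + raw.replace("INPUT", "OUTPUT"))


foo = Foo("TIMESTAMP", "INPUT0")
# __reduce_ex__ is what pickle calls to learn how to rebuild an object.
reconstructor, args = foo.__reduce_ex__(2)[:2]
print(reconstructor is copyreg.__newobj__)  # generic "call cls.__new__(cls, *args)" helper
print(args)  # the class, then the single processed string
# Loading would therefore run Foo.__new__(Foo, "TIMESTAMP OUTPUT0"), which
# fails for this Foo because the `raw` argument is missing.
```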

Having discovered this phenomenon, is this causing a problem for you somewhere in code?  (Your example code on github was helpful, thank you, but it didn't merely demonstrated the behavior and didn't show where this was causing you pain.)
msg291294 - Author: Davin Potts (davin) (Python committer) Date: 2017-04-07 20:58
Expanding my above example to show how multiprocessing relates:
>>> import multiprocessing
>>> import os
>>> class Floof(object):
...     def __new__(cls):
...         print("New via pid=%d" % os.getpid())
...         return object.__new__(cls)
... 
>>> os.getpid()                               # parent pid
46560
>>> pool = multiprocessing.Pool(1)
>>> getter = pool.apply_async(Floof, (), {})  # output seen from child AND parent
>>> New via pid=46583
New via pid=46560

>>> getter.get()                              # everything seems to be working as intended
<__main__.Floof object at 0x10866f250>
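For contrast with the Pool session, a ThreadPool worker runs in the same process and hands its return value back without pickling, so __new__ runs only once. A sketch assuming the same kind of hypothetical, validating Foo, with a list recording each __new__ call:

```python
from multiprocessing.pool import ThreadPool

new_calls = []  # every argument tuple passed to Foo.__new__


class Foo(str):
    """Hypothetical str subclass that validates its arguments."""

    def __new__(cls, *args):
        new_calls.append(args)
        if len(args) != 2:
            raise ValueError("Bad Foo input: %r" % (args,))
        timestamp, raw = args
        return super().__new__(cls, "%s %s" % (timestamp, raw.replace("INPUT", "OUTPUT")))


with ThreadPool(1) as pool:
    # The worker thread builds the object and the caller receives that
    # same object, with no pickle round-trip in between.
    result = pool.apply(Foo, ("TIMESTAMP", "INPUT0"))

print(result, new_calls)
```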


FWIW, near the end of my prior message:  s/it didn't merely/it merely/
msg291295 - Author: Charles McEachern (Charles McEachern) Date: 2017-04-07 21:06
This caused me several hours of misery yesterday, trying to isolate what was going wrong. 

I am unfortunately not at liberty to share the code I'm working on. The example on GitHub has the general thrust of it: my constructor was always called in a specific way, and didn't expect to be given something that was already processed. 

Interesting to see that this is a product of pickling. That makes me think that "fixing" this corner case would probably be a lot of work.

I suppose I should just work around it by checking right away if the input to my constructor has already been constructed!
msg291297 - Author: Davin Potts (davin) (Python committer) Date: 2017-04-07 21:14
> I am unfortunately not at liberty to share the code I'm working on.

I very much understand and am very thankful you took the time to create a simple example that you could share.  Honestly, that's the reason I felt inspired to stop what I was doing to look at this now rather than later.


> I suppose I should just work around it by checking right away if the input to my constructor has already been constructed!

There are probably a number of different ways to address it but your suggestion of adding a check to see if this is the first time that object has been constructed sounds like it might be an easy win.
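One way to express that check, sketched with a hypothetical Foo: treat a single argument as an already-processed value (pickle passes back the plain string) and run the processing only for the two-argument form.

```python
import pickle


class Foo(str):
    """Hypothetical str subclass with an "already constructed" check."""

    def __new__(cls, *args):
        if len(args) == 1:
            # Assumed to be an already-processed value, such as the plain
            # string pickle passes when rebuilding a Foo: keep it as-is.
            return super().__new__(cls, args[0])
        if len(args) != 2:
            raise ValueError("Bad Foo input: %r" % (args,))
        timestamp, raw = args
        return super().__new__(cls, "%s %s" % (timestamp, raw.replace("INPUT", "OUTPUT")))


foo = Foo("TIMESTAMP", "INPUT0")
restored = pickle.loads(pickle.dumps(foo))
print(restored == foo, type(restored).__name__)
```

The trade-off is that an ordinary call such as Foo("anything") now also skips validation, so the check loosens the constructor's contract for every caller, not just for pickle.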
msg291299 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2017-04-07 21:39
I suspect you just need to add pickle support to your class.  When I subclassed str in the email package, I found I needed to do that.  I'd have to go through the docs again to remember how the code works, but you can take a look at the BaseHeader class in email.headerregistry to see what I did.
msg291302 - Author: Charles McEachern (Charles McEachern) Date: 2017-04-07 22:11
That seems to do it! Looks like the trick is to define __reduce__ to help out the serializer. Thanks!
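A minimal sketch of that fix, assuming a hypothetical Foo that remembers its original arguments (this is not the code in email.headerregistry): __reduce__ returns the class plus those arguments, so unpickling calls Foo(timestamp, raw) again instead of Foo.__new__(Foo, str(self)).

```python
import pickle


class Foo(str):
    """Hypothetical str subclass with explicit pickle support."""

    def __new__(cls, timestamp, raw):
        if not raw.startswith("INPUT"):
            raise ValueError("Bad Foo input: %r" % ((timestamp, raw),))
        self = super().__new__(cls, "%s %s" % (timestamp, raw.replace("INPUT", "OUTPUT")))
        self._args = (timestamp, raw)  # kept so the object can be rebuilt
        return self

    def __reduce__(self):
        # (callable, args): pickle stores both, and loading calls
        # Foo(timestamp, raw), re-running the normal constructor.
        return (self.__class__, self._args)


foo = Foo("TIMESTAMP", "INPUT0")
restored = pickle.loads(pickle.dumps(foo))
print(restored, restored._args)
```

This re-runs the constructor's processing on every unpickle; if that work is expensive, returning a module-level rebuild function from __reduce__ instead of the class is an alternative worth weighing.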
History
Date User Action Args
2017-04-14 17:23:12 serhiy.storchaka set status: open -> closed; nosy: + serhiy.storchaka; resolution: not a bug; stage: resolved
2017-04-07 22:11:26 Charles McEachern set messages: + msg291302
2017-04-07 21:39:05 r.david.murray set nosy: + r.david.murray; messages: + msg291299
2017-04-07 21:14:32 davin set messages: + msg291297
2017-04-07 21:06:56 Charles McEachern set messages: + msg291295
2017-04-07 20:58:35 davin set messages: + msg291294
2017-04-07 20:49:12 davin set nosy: + davin; messages: + msg291292
2017-04-07 17:25:19 Charles McEachern create