classification
Title: [subinterpreters] _xxsubinterpreters: Can't unpickle objects defined in __main__
Type: behavior
Stage:
Components: Subinterpreters
Versions: Python 3.9, Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: LewisGaul, crusaderky, eric.snow, maciej.szulik, ncoghlan
Priority: normal
Keywords:

Created on 2019-06-15 13:06 by crusaderky, last changed 2022-04-11 14:59 by admin.

Messages (6)
msg345680 - Author: Guido Imperiale (crusaderky) * Date: 2019-06-15 13:06
As of CPython 3.8.0b1:

If one pickles an object that is defined in the __main__ module, sends it to a subinterpreter as bytes, and then tries unpickling it there, it fails saying that __main__ doesn't define it.


import _xxsubinterpreters as interpreters
import pickle


class C:
    pass


c = C()

interp_id = interpreters.create()
c_bytes = pickle.dumps(c)
interpreters.run_string(
    interp_id,
    "import pickle; pickle.loads(c_bytes)",
    shared={"c_bytes": c_bytes},
)


If the above is executed directly from the python command line, it fails. If it's imported from another module, it works.
One would expect behaviour compatible with subprocesses spawned with the spawn method, where the __main__ of the parent process is visible to the subprocess too.

Workarounds:
1 - define everything that must be pickled in an imported module
2 - use CloudPickle
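
For illustration only, the failure can be reproduced without subinterpreters. The sketch below assumes a throwaway module named fake_main (a hypothetical stand-in for __main__, not part of the original report). It shows that pickle records a class by module name and class name rather than by value, so unpickling fails wherever the module of that name does not define the class.

```python
import pickle
import sys
import types

# Hypothetical stand-in for "__main__": a throwaway module registered by hand.
mod = types.ModuleType("fake_main")
sys.modules["fake_main"] = mod


class C:
    pass


# Make the class look as if it had been defined at the top level of fake_main.
C.__module__ = "fake_main"
C.__qualname__ = "C"
mod.C = C

payload = pickle.dumps(C())
# The payload names the module and the class; it does not carry the class body.
print(b"fake_main" in payload)

# Simulate an interpreter whose module of that name never defined C.
del mod.C
try:
    pickle.loads(payload)
except AttributeError as exc:
    error = exc
else:
    error = None
print(type(error).__name__)

# Remove the throwaway module again.
del sys.modules["fake_main"]
```

This is also why workaround 1 helps: a class defined in an imported module is looked up under a name that the receiving interpreter can import for itself.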
msg357396 - Author: Lewis Gaul (LewisGaul) * Date: 2019-11-24 13:18
Just to move the conversation from the subinterpreters project repo to here...

I'm going to take a look at how this is done by subprocess using the example provided by Guido:

import os
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import get_context

class C:
    def __getstate__(self):
        print("pickled in %d" % os.getpid())
        return {}

    def __setstate__(self, state):
        print("unpickled in %d" % os.getpid())

    def hello(self):
        print("Hello world")


if __name__ == "__main__":
    with ProcessPoolExecutor(mp_context=get_context("spawn")) as ex:
        ex.submit(C().hello).result()

Output:

pickled in 23480
unpickled in 23485
Hello world
msg357413 - Author: Lewis Gaul (LewisGaul) * Date: 2019-11-24 21:45
The relevant code for the multiprocessing example seems to be in Lib/multiprocessing/spawn.py. I think I get what it's doing, but I'm not sure whether we actually need something similar for subinterpreters. Any thoughts @eric.snow?
msg357447 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-11-25 17:14
Yeah, the case of pickle + __main__ is an awkward one. [1]  However, the approach taken by multiprocessing isn't the right one for subinterpreters.

Multiprocessing operates under 2 design points that do not apply to subinterpreters:

* every process is running in the same __main__ module (sans the "script" part)
* pickle is a critical part of data-passing

For spawn, multiprocessing automatically runs the original __main__ module in each newly spawned process. [2]  Note that it runs it there with __name__ set to __mp_main__ (rather than __main__), to keep the "script" part from running.  
Subinterpreters could be made to work like this [3] but in reality they are more like processes created using the subprocess module.

I do not expect that to change.  However, there is room to add opt-in support for rerunning __main__ in a subinterpreter, or helpers to accomplish the same.  We can address such opt-in support or helpers in a separate issue later.  For now we are focusing on the fundamentals (at least in the context of PEP 554).


[1] Note that the problem is only with the __main__ module.  For other modules pickle does the right thing.
[2] https://github.com/python/cpython/blob/master/Lib/multiprocessing/spawn.py#L287
[3] I expect we will see subinterpreters supported in the multiprocessing module just like threads are supported.
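
As a rough sketch of the spawn behaviour described above, the snippet below re-runs a small script under the name __mp_main__ using runpy.run_path. The script contents, the file name demo_main.py, and the SCRIPT_PART_RAN flag are made up for this illustration; only the idea of running the file under a different __name__ comes from the discussion.

```python
import os
import runpy
import tempfile

# Hypothetical script: definitions at the top, a guarded "script" part below.
SCRIPT = '''
class C:
    pass

if __name__ == "__main__":
    SCRIPT_PART_RAN = True
'''

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_main.py")
    with open(path, "w") as f:
        f.write(SCRIPT)
    # Run the file under a different module name so the guard is skipped.
    namespace = runpy.run_path(path, run_name="__mp_main__")

# The definitions are available, but the guarded part did not execute.
print("C" in namespace)
print("SCRIPT_PART_RAN" in namespace)
```

The class definitions end up in the returned namespace while the guarded block is skipped, which is the effect the __mp_main__ name is meant to have.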
msg357448 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-11-25 17:20
In the meantime that leaves the workarounds that @crusaderky originally identified.  You could also:

* manually run __main__ in the subinterpreter first (sort of like multiprocessing does automatically); this works because the namespace of __main__ is not reset for each run_string() call
* (for -m) update __module__ of the relevant objects to be the actual module name rather than "__main__"

In the case of that second point, it relates to PEP 499 (which will ensure that the module is added to sys.modules in the -m case).  However, that PEP doesn't say anything about updating __module__ for objects.  I'll bring that up there.  With that solution the problem in this issue would go away.

Note that it won't help for objects in the __main__ of subinterpreters, since they do not correspond to executed modules.  Hmm, maybe it could still work...

Regardless, I'll open issues over on https://github.com/ericsnowcurrently/multi-core-python to track these possible future enhancements.
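
A minimal sketch of the second workaround (updating __module__), assuming a hypothetical importable module name pkg_tool that stands in for the real module run with -m. Here the module is faked with types.ModuleType so the example is self-contained; in practice it would be the actual module, already present in sys.modules.

```python
import pickle
import sys
import types

# Hypothetical name standing in for the real module that was run via -m.
real = types.ModuleType("pkg_tool")
sys.modules["pkg_tool"] = real


class C:
    pass


# Under -m the class would report "__main__" as its module. Re-point it at
# the importable name and expose it there so pickle can resolve it by name.
C.__module__ = "pkg_tool"
C.__qualname__ = "C"
real.C = C

payload = pickle.dumps(C())
restored = pickle.loads(payload)

# The payload now refers to pkg_tool rather than __main__.
print(b"pkg_tool" in payload)
print(type(restored) is C)

# Remove the throwaway module again.
del sys.modules["pkg_tool"]
```

Any interpreter that can import the named module can then resolve the class, provided the module really defines it under that name.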
msg358423 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-12-15 12:46
There's a reason multiprocessing in spawn mode jumps through the hoops that it does: it's the only way to get __main__ pickling to work when you're not forking the entire process.

You also don't want to naively re-run __main__ in the subprocess for the same reason multiprocessing doesn't: doing so would also re-run the script parts, not just the class and function definition parts.

So while I don't think it should be implicit the way it is for multiprocessing spawn mode, I do think it would make sense to offer a way to explicitly opt-in to re-running __main__ in a child interpreter as "__si_main__", aliasing the subinterpreter's __main__ module to that, and also aliasing "__si_main__" to "__main__" in the parent interpreter.
History
Date | User | Action | Args
2022-04-11 14:59:16 | admin | set | github: 81473
2020-05-15 00:40:23 | vstinner | set | components: + Subinterpreters; title: _xxsubinterpreters: Can't unpickle objects defined in __main__ -> [subinterpreters] _xxsubinterpreters: Can't unpickle objects defined in __main__
2020-02-07 15:35:15 | maciej.szulik | set | nosy: + maciej.szulik
2019-12-15 12:46:28 | ncoghlan | set | nosy: + ncoghlan; messages: + msg358423
2019-11-25 17:20:41 | eric.snow | set | messages: + msg357448
2019-11-25 17:14:04 | eric.snow | set | messages: + msg357447
2019-11-24 21:45:44 | LewisGaul | set | messages: + msg357413
2019-11-24 13:18:19 | LewisGaul | set | nosy: + LewisGaul; messages: + msg357396
2019-06-15 13:06:41 | crusaderky | set | versions: + Python 3.8
2019-06-15 13:06:31 | crusaderky | create |