
parameterize what serialization is used in multiprocessing #72240

Open
applio opened this issue Sep 9, 2016 · 8 comments
Labels: 3.7 (EOL, end of life); stdlib (Python modules in the Lib dir); topic-multiprocessing; type-feature (a feature request or enhancement)

Comments

@applio
Member

applio commented Sep 9, 2016

BPO 28053
Nosy @pitrou, @ericsnowcurrently, @applio, @pablogsal
PRs
  • bpo-28053: Complete and fix custom reducers in multiprocessing. #9959
  • bpo-28053: Allow custom reducer when using multiprocessing #15058
Files
  • issue_28053_missingdocs.patch: Patch but missing docs
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/applio'
    closed_at = None
    created_at = <Date 2016-09-09.20:58:54.395>
    labels = ['3.7', 'type-feature', 'library']
    title = 'parameterize what serialization is used in multiprocessing'
    updated_at = <Date 2019-07-31.16:17:41.253>
    user = 'https://github.com/applio'

    bugs.python.org fields:

    activity = <Date 2019-07-31.16:17:41.253>
    actor = 'pierreglaser'
    assignee = 'davin'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2016-09-09.20:58:54.395>
    creator = 'davin'
    dependencies = []
    files = ['44511']
    hgrepos = []
    issue_num = 28053
    keywords = ['patch']
    message_count = 8.0
    messages = ['275437', '275459', '275479', '275486', '293947', '298369', '299434', '314744']
    nosy_count = 7.0
    nosy_names = ['pitrou', 'python-dev', 'eric.snow', 'davin', 'i3v', 'Will S', 'pablogsal']
    pr_nums = ['9959', '15058']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue28053'
    versions = ['Python 3.6', 'Python 3.7']

    @applio
    Member Author

    applio commented Sep 9, 2016

    Currently multiprocessing uses the pickle module to serialize objects communicated between processes. Specifically, version 2 of the pickle protocol is now used exclusively to provide maximum compatibility, motivated by the desire to use multiple versions of Python simultaneously with multiprocessing.

    Per conversations in bpo-26507, bpo-23403, and others, multiprocessing should offer the option to specify which serialization is used to transport data between processes. Besides supporting requests to use a different version of the pickle protocol or third-party tools such as dill, a hook for specifying how objects are reduced to a transmittable form opens the door to other creative or higher-performance strategies.

    Ultimately, this is not an enhancement to add functionality but rather to reorganize the existing internals of multiprocessing to permit better control over its use of serialization.
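To make the protocol question concrete, here is a minimal sketch (not part of the patch) that pickles the same object with `multiprocessing.reduction.ForkingPickler.dumps`, once with its default protocol and once with `protocol=2`, and then reads the version back from the stream header. It assumes Python 3, where `dumps` returns a `memoryview`; the helper `protocol_of` and the sample payload are illustrative only.

```python
import pickle
from multiprocessing.reduction import ForkingPickler

payload = {"job": 42, "args": [1, 2, 3]}  # illustrative data


def protocol_of(data: bytes) -> int:
    """Return the pickle protocol recorded in a stream's header.

    Protocols 2 and later start with the PROTO opcode (0x80),
    followed by a single byte holding the protocol number.
    """
    if data[0] != 0x80:
        raise ValueError("stream predates protocol 2 or is not a pickle")
    return data[1]


# ForkingPickler.dumps returns a memoryview, so copy it into bytes.
default_bytes = bytes(ForkingPickler.dumps(payload))
v2_bytes = bytes(ForkingPickler.dumps(payload, protocol=2))

print("default protocol:", protocol_of(default_bytes))  # pickle.DEFAULT_PROTOCOL
print("forced protocol:", protocol_of(v2_bytes))         # 2
print(pickle.loads(v2_bytes) == payload)                 # True
```

A Python 2.7 peer can only read the second stream, which is the cross-version constraint that comes up later in this thread.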

    @applio applio self-assigned this Sep 9, 2016
    @applio applio added type-feature A feature request or enhancement stdlib Python modules in the Lib dir labels Sep 9, 2016
    @applio
    Member Author

    applio commented Sep 9, 2016

    Attaching a patch that contains the refactorings, for review purposes; it does not yet update the docs.

    It introduces three new things:

    • a function to get the current serializer/reducer
    • a function to set the serializer/reducer
    • an abstract base class to assist others in rolling their own reducers
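In CPython 3.x the getter and setter appear as a `reducer` property on context objects (the property referred to later in this thread), next to `multiprocessing.reduction.AbstractReducer`. Because the docs were never finished, the sketch below is based on the current implementation rather than on a documented API; `MyReducer` is a hypothetical subclass that changes nothing.

```python
import multiprocessing
from multiprocessing import reduction
from multiprocessing.reduction import AbstractReducer


class MyReducer(AbstractReducer):
    """Hypothetical reducer that inherits every default from AbstractReducer."""


# Any context works: the property reads and writes one module-level global
# in multiprocessing.context, so the setting is shared by all contexts.
ctx = multiprocessing.get_context("spawn")

current = ctx.reducer                # getter: by default, the reduction module itself
print(current is reduction)          # True

ctx.reducer = MyReducer()            # setter
print(isinstance(ctx.reducer, AbstractReducer))  # True

ctx.reducer = current                # restore the stock reducer
```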

    @ericsnowcurrently
    Member

    LGTM

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 9, 2016

    New changeset 7381b1b50e00 by Davin Potts in branch 'default':
    Issue bpo-28053: Applying refactorings, docs and other cleanup to follow.
    https://hg.python.org/cpython/rev/7381b1b50e00

    @applio
    Copy link
    Member Author

    applio commented May 19, 2017

    The docs still need updating.

    @applio applio added the 3.7 (EOL) end of life label May 19, 2017
    @WillS
    Mannequin

    WillS mannequin commented Jul 14, 2017

    Documentation would be appreciated. I have a project that uses BaseManager, Client, and Listener to create some servers and clients. I would like to update the project to work with Python 3 and would prefer to update the clients and the servers separately (i.e. switch the client to Python 3 while the server still runs on Python 2.7).

    However, BaseManager uses connection.Client, which uses connection._ConnectionBase, which uses reduction.ForkingPickler without a protocol argument. It seems the default protocol is 3 on Python 3.6 and 2 on Python 2.7 (contrary to the comment above about v2 being used). I just want to set the protocol version to 2 in Python 3.6. Can I do that with the changes added by this patch?

    I tried creating pickle2reducer.py like this:

    from multiprocessing.reduction import ForkingPickler, AbstractReducer
    
    class ForkingPickler2(ForkingPickler):
        def __init__(self, *args):
            # args arrives as a tuple; copy it to a list so the protocol can be forced to 2
            args = list(args)
            if len(args) > 1:
                args[1] = 2
            else:
                args.append(2)
            super().__init__(*args)
    
        @classmethod
        def dumps(cls, obj, protocol=2):
            return ForkingPickler.dumps(obj, protocol)
    
    
    def dump(obj, file, protocol=2):
        ForkingPickler2(file, protocol).dump(obj)
    
    
    class Pickle2Reducer(AbstractReducer):
        ForkingPickler = ForkingPickler2
        register = ForkingPickler2.register
        # staticmethod stops the reducer instance from being passed in as `obj`
        dump = staticmethod(dump)

    and then putting

    import pickle2reducer
    multiprocessing.reducer = pickle2reducer.Pickle2Reducer()

    at the top of my module before

    import multiprocessing.connection

    but I still see "ValueError: unsupported pickle protocol: 3" on the server when I connect with a Python 3.6 client.

    @WillS
    Mannequin

    WillS mannequin commented Jul 28, 2017

    Just to follow up in case anyone comes across my last message later:

    I just had to change the last line from

    multiprocessing.reducer = pickle2reducer.Pickle2Reducer()

    to

    multiprocessing.context._default_context.reducer = pickle2reducer.Pickle2Reducer()
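Why the first attempt was a no-op can be seen with a small sketch against CPython's current internals (private names, so subject to change): `multiprocessing.reducer` is a copy that the package took of the default context's attributes at import time, whereas assigning through `_default_context.reducer` rewrites the `reduction` global inside `multiprocessing.context`. `Pickle2Reducer` here is a hypothetical empty stand-in for the class above.

```python
import multiprocessing
import multiprocessing.context as mp_context
from multiprocessing import reduction
from multiprocessing.reduction import AbstractReducer


class Pickle2Reducer(AbstractReducer):
    """Hypothetical stand-in; a real one would override ForkingPickler and dump."""


ctx = mp_context._default_context
original = ctx.reducer

# Rebinding the package attribute only replaces a copy taken at import time.
multiprocessing.reducer = Pickle2Reducer()
print(ctx.reducer is reduction)                          # True: live setting unchanged

# Assigning through the context property rewrites the 'reduction' global in
# multiprocessing.context, which is what modules imported later pick up.
ctx.reducer = Pickle2Reducer()
print(isinstance(mp_context.reduction, Pickle2Reducer))  # True

# Restore both names so the rest of the program uses the stock reducer.
ctx.reducer = original
multiprocessing.reducer = original
```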

    @pitrou
    Member

    pitrou commented Mar 31, 2018

    I'd like to know if the work here will be completed soon :-) This currently lacks not only documentation but also tests.

    In particular, looking at the code (and this is supported by Will S.'s comment above), I feel the API isn't working as intended (i.e. setting the "reducer" property won't actually change the underlying parameters, since the "reduction" module is imported eagerly).
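The eager-import problem can be reproduced in a few lines. The sketch relies on a CPython implementation detail (`multiprocessing.connection` copies `reduction.ForkingPickler` into a private module global, `_ForkingPickler`, when it is first imported), so it is illustrative rather than a supported check; `ForkingPickler2` and `Reducer2` are hypothetical.

```python
import multiprocessing.connection as mp_connection  # binds _ForkingPickler now
import multiprocessing.context as mp_context
from multiprocessing.reduction import AbstractReducer, ForkingPickler


class ForkingPickler2(ForkingPickler):
    """Hypothetical subclass that a custom reducer might supply."""


class Reducer2(AbstractReducer):
    ForkingPickler = ForkingPickler2


ctx = mp_context._default_context
original = ctx.reducer
ctx.reducer = Reducer2()

# The property did change multiprocessing.context.reduction ...
print(mp_context.reduction.ForkingPickler is ForkingPickler2)  # True
# ... but connection.py copied reduction.ForkingPickler into _ForkingPickler
# at import time, so Connection.send() keeps using the stock class.
print(mp_connection._ForkingPickler is ForkingPickler)          # True

ctx.reducer = original  # restore the stock reducer
```

This is also why the workaround above only helps when the reducer is swapped before `multiprocessing.connection` is first imported.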
