Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sending Connection-objects over multiprocessing connections fails #49142

Closed
gsson mannequin opened this issue Jan 9, 2009 · 31 comments
Closed

Sending Connection-objects over multiprocessing connections fails #49142

gsson mannequin opened this issue Jan 9, 2009 · 31 comments
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@gsson
Copy link
Mannequin

gsson mannequin commented Jan 9, 2009

BPO 4892
Nosy @pitrou, @kristjanvalur, @jodal
Files
  • mp_pickle_conn.patch
  • mp_pickle_conn.patch
  • mp_pickle_conn.patch
  • mp_pickle_conn.patch
  • mp_pickle_conn.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-04-24.20:59:07.466>
    created_at = <Date 2009-01-09.11:46:06.880>
    labels = ['type-feature', 'library']
    title = 'Sending Connection-objects over multiprocessing connections fails'
    updated_at = <Date 2012-04-24.20:59:07.465>
    user = 'https://bugs.python.org/gsson'

    bugs.python.org fields:

    activity = <Date 2012-04-24.20:59:07.465>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-04-24.20:59:07.466>
    closer = 'pitrou'
    components = ['Library (Lib)']
    creation = <Date 2009-01-09.11:46:06.880>
    creator = 'gsson'
    dependencies = []
    files = ['25160', '25167', '25181', '25270', '25272']
    hgrepos = []
    issue_num = 4892
    keywords = ['patch']
    message_count = 31.0
    messages = ['79464', '79465', '79474', '84707', '91964', '91965', '91968', '91969', '126447', '157704', '157724', '157736', '157737', '157739', '157740', '157743', '157745', '157749', '157753', '157838', '157840', '157852', '157950', '157960', '157963', '158065', '158682', '158711', '158714', '159207', '159208']
    nosy_count = 11.0
    nosy_names = ['pitrou', 'kristjan.jonsson', 'jnoller', 'gsson', 'dsvensson', 'asksol', 'jodal', 'Jimbofbx', 'dragonfyre13', 'python-dev', 'sbt']
    pr_nums = []
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue4892'
    versions = ['Python 3.3']

    @gsson
    Copy link
    Mannequin Author

    gsson mannequin commented Jan 9, 2009

    It seems the old pyprocessing (http://pyprocessing.berlios.de/) can do
    some things that the new multiprocessing package can not;
    sending/receiving connection objects for one.

    This is a quite handy functionality, so it would be nice if it were
    reintroduced.

    Also, the error message isn't very helpful.

    Failing test below.

    $ python2.6 pipetest2.py 
    asdf
    Traceback (most recent call last):
      File "a.py", line 10, in <module>
        print c1.recv()
    TypeError: Required argument 'handle' (pos 1) not found
    $ PYTHONPATH=processing-0.52-py2.5-macosx-10.5-i386.egg python2.5 
    pipetest2.py 
    asdf
    Connection(handle=5)
    $ PYTHONPATH=multiprocessing-2.6.0.2-py2.5-macosx-10.5-i386.egg 
    python2.5 pipetest2.py 
    asdf
    Traceback (most recent call last):
      File "pipetest2.py", line 10, in <module>
        print c1.recv()
    TypeError: function takes at least 1 argument (0 given)
    $ uname -a
    Darwin midori.local 9.6.0 Darwin Kernel Version 9.6.0: Mon Nov 24 
    17:37:00 PST 2008; root:xnu-1228.9.59~1/RELEASE_I386 i386

    @gsson gsson mannequin added type-bug An unexpected behavior, bug, or error OS-mac stdlib Python modules in the Lib dir labels Jan 9, 2009
    @gsson
    Copy link
    Mannequin Author

    gsson mannequin commented Jan 9, 2009

    $ cat pipetest2.py 
    try:
    	from multiprocessing import Pipe
    except ImportError:
    	from processing import Pipe

    c1, c2 = Pipe(duplex=False)
    c2.send('asdf')
    print c1.recv()
    c2.send(c1)
    print c1.recv()

    @jnoller jnoller mannequin self-assigned this Jan 9, 2009
    @jnoller
    Copy link
    Mannequin

    jnoller mannequin commented Jan 9, 2009

    thanks for filing this. I'll need to compare the two code bases and figure
    out why it's either regressed, or Richard removed it prior to the
    integration.

    @jnoller jnoller mannequin added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Jan 18, 2009
    @jnoller
    Copy link
    Mannequin

    jnoller mannequin commented Mar 30, 2009

    Before I can logically support this, I need a clear use case that supports
    the idea that this should be supported in the current version of
    multiprocessing.

    @dsvensson
    Copy link
    Mannequin

    dsvensson mannequin commented Aug 26, 2009

    A typical use case would be for a server to receive a connection, and
    then send that connection over to another process that does the actual
    work. This used to work with pyprocessing, and the support seems to be
    available in multiprocessing.c -> multiprocessing_sendfd, but not used.

    @dsvensson
    Copy link
    Mannequin

    dsvensson mannequin commented Aug 26, 2009

    And to be clear, I have enabled connection pickling by issuing:

    multiprocessing.allow_connection_pickling()

    @dsvensson
    Copy link
    Mannequin

    dsvensson mannequin commented Aug 26, 2009

    @dsvensson
    Copy link
    Mannequin

    dsvensson mannequin commented Aug 26, 2009

    Ehm.. completly broken url in prev message.. Revision 65016, "Apply
    Amaury's patch to multiprocessing for bpo-3125, removes the copy_reg
    and replaces it with ForkingPickler.register(), which should resolve the
    conflict with the global registry/ctypes" is what I'm refering to.
    Without this patch, the reducers/rebuilders are properly registered in
    the pickler that connection.h (srsly, code in header files?) later uses.

    @dragonfyre13
    Copy link
    Mannequin

    dragonfyre13 mannequin commented Jan 18, 2011

    Wanted to quickly comment here, as I'm dealing with this issue as well, that I did find a workaround for avoiding it as far back as 2.6 (and it's not "don't pass a Pipe through a Pipe")

    multiprocessing.reduction can already do this, though I don't entirely know why this isn't automatically done if it's a connection object.

    >>> from multiprocessing import Pipe, reduction
    >>> i, o = Pipe()
    >>> reduced = reduction.reduce_connection(i)
    >>> newi = reduced[0](*reduced[1])
    >>> newi.send("hi")
    >>> o.recv()
    'hi'
    >>> 

    The reduced0 line is actually calling reduction.rebuild_connection, as that function is the first element in the tuple, and the second element is the arguments to be passed to it. I can't seem to find any info on reduction.reduce_connection, so I don't know if this is how this was intended to be handled or not.

    P.S. Tested on Win (XP) and Linux (Ubuntu 10.10), so there's no weird windows socket stuff that should go wrong with this.

    @Jimbofbx
    Copy link
    Mannequin

    Jimbofbx mannequin commented Apr 6, 2012

    err, is it possible to edit out those file paths? I didn't intend them to be in the message. I'd appreciate it if someone with the privileges to do so could remove them.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 7, 2012

    err, is it possible to edit out those file paths?

    I don't know how to do that. If you want I can remove the message altogether. But I don't see anything confidential or exploitable in your message.

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 7, 2012

    Jimbofbx wrote:

    def main():
    from multiprocessing import Pipe, reduction
    i, o = Pipe()
    print(i);
    reduced = reduction.reduce_connection(i)
    print(reduced);
    newi = reduced0
    print(newi);
    newi.send("hi")
    o.recv()

    On Windows with a PipeConnection object you should use rebuild_pipe_connection() instead of rebuild_connection(). With that change, on Python 3.3 I get

    <multiprocessing.connection.PipeConnection object at 0x025BBCB0>
    (<function rebuild_pipe_connection at 0x0262F420>, (('\\\\.\\pipe\\pyc-6000-1-30lq4p', 356, False), True, True))
    <multiprocessing.connection.PipeConnection object at 0x029FF710>
    

    Having said all that I agree multiprocessing.reduction should be fixed. Maybe an enable_pickling_support() function could be added to register the necessary things with copyreg.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 7, 2012

    Having said all that I agree multiprocessing.reduction should be
    fixed. Maybe an enable_pickling_support() function could be added to
    register the necessary things with copyreg.

    Why not simply use ForkingPickler?

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 7, 2012

    ForkingPickler is only used when creating a child process. The multiprocessing.reduction module is only really intended for sending stuff to *pre-existing* processes.

    As things stand, after importing multiprocessing.reduction you can do something like

      buf = io.BytesIO()
      pickler = ForkingPickler(buf)
      pickler.dump(conn)
      data = buf.getvalue()
      writer.send_bytes(data)

    But that is rather less simple and obvious than just doing

      writer.send(conn)

    which was possible in pyprocessing.

    Originally just importing the module magically registered the reduce functions with copyreg. Since this was undesirable, the reduction functions were instead registered with ForkingPickler. But this fix rather missed the point of the module.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 7, 2012

    ForkingPickler is only used when creating a child process. The
    multiprocessing.reduction module is only really intended for sending
    stuff to *pre-existing* processes.

    But ForkingPickler could be used in multiprocessing.connection, couldn't
    it?

    @Jimbofbx
    Copy link
    Mannequin

    Jimbofbx mannequin commented Apr 7, 2012

    @pitrou

    You can just delete my original post. I'll repost an edited version here for reference

    original post with paths removed:
    This is an issue for me (Python 3.2). I have a custom pool that sends arguments for a function call over a pipe. I cannot send another pipe as an argument.

    Tim's workaround also does not work for me (win xp 32bit and 64bit)

    From what I can tell, you can only send a connection as a direct argument to a function call. This limits what I can do because I cannot introduce new pipes to a worker process after it is instantiated.

    Using this code:

    def main():
        from multiprocessing import Pipe, reduction
        i, o = Pipe()
        print(i);
        reduced = reduction.reduce_connection(i)
        print(reduced);
        newi = reduced[0](*reduced[1])
        print(newi);
        newi.send("hi")
        o.recv()
    
    if __name__ == "__main__":
        main();

    This is my output:

    <read-write PipeConnection, handle 1760>
    (<function rebuild_connection at 0x00FD4C00>, (('\\\\.\\pipe\\pyc-3156-1-q5wwnr', 1756, False), True, True))
    <read-write Connection, handle 1720>
    >>> newi.send("hi")
    IOError: [Errno 10038] An operation was attempted on something that is not a socket

    As you can see, the handle changes

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 7, 2012

    But ForkingPickler could be used in multiprocessing.connection,
    couldn't it?

    I suppose so.

    Note that the way a connection handle is transferred between existing processes is unnecessarily inefficient on Windows. A background server thread (one per process) has to be started and the receiving process must connect back to the sending process to receive its duplicate handle.

    There is a simpler way to do this on Windows. The sending process duplicates the handle, and the receiving process duplicates that second handle using DuplicateHandle() and the DUPLICATE_CLOSE_SOURCE flag. That way no server thread is necessary on Windows.

    I got this to work recently for pickling references to file handles for mmaps on. (A server thread would still be necessary on Unix.)

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 7, 2012

    There is a simpler way to do this on Windows. The sending process
    duplicates the handle, and the receiving process duplicates that second
    handle using DuplicateHandle() and the DUPLICATE_CLOSE_SOURCE flag. That
    way no server thread is necessary on Windows.

    Note that this should not be done for socket handles since DuplicateHandle() is not supposed to work for them. socket.share() and socket.fromshare() with a server thread can be used for sockets.

    @Jimbofbx
    Copy link
    Mannequin

    Jimbofbx mannequin commented Apr 7, 2012

    Shouldn't reduce_pipe_connection just be an alias for reduce_connection in unix so that using reduce_pipe_connection would work for both win and unix? My understanding after looking at the code is that reduce_pipe_connection isn't defined for non-win32, although I haven't tested it to see if that's true.

    Of course, ideally a pipe connection would just pickle and unpickle properly out-of-the-box, which I think was the original intent.

    Here's a complete, working example with Python 3.2 tested on Win 7 64-bit:

    import sys
    from multiprocessing import Process,Pipe, reduction
    
    def main():
        print("starting");
        i, o = Pipe(False)
        parent, child = Pipe();
        reducedchild = reduce_pipe(child);
        p = Process(target=helper, args=(i,));
        p.start();
        parent.send("hi");
        o.send(reducedchild);
        print(parent.recv());
        print("finishing");
        p.join();
        print("done");
    
    def helper(inPipe):
        childPipe = expand_reduced_pipe(inPipe.recv());
        childPipe.send("child got: " + childPipe.recv());
        return;
    
    def reduce_pipe(pipe):
        if sys.platform == "win32":
            return reduction.reduce_pipe_connection(pipe);
        else:
            return reduction.reduce_connection(pipe);
    
    def expand_reduced_pipe(reduced_pipe):
        return reduced_pipe[0](*reduced_pipe[1]);
    
    if __name__ == "__main__":
        main();

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 9, 2012

    There is an undocumented function multiprocessing.allow_connection_pickling() whose docstring claims it allows connection and socket objects to be pickled.

    The attached patch fixes the multiprocessing.reduction module so that it works correctly. This means that TestPicklingConnections can be reenabled in the unit tests.

    The patch uses the new socket.share() and socket.fromshare() methods on Windows.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 9, 2012

    Unless there's a technical barrier, I still think it would be better to use ForkingPickler in multiprocessing.connection, rather than modify global state (copyreg). The pickling support is multiprocessing-specific and wouldn't make sense for other pickles (e.g. stored to disk).

    @kristjanvalur
    Copy link
    Mannequin

    kristjanvalur mannequin commented Apr 9, 2012

    I just want to point out that each time socket.share() is called, the resulting data can only be used once by socket.fromshare(). I'm mentioning this because I know there is some caching mechanism in reduction.py and that this data is not cacheable/reuseable.

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 10, 2012

    Updated patch which uses ForkingPickler in Connection.send().

    Note that connection sharing still has to be enabled using allow_connection_pickling().

    Support could be enabled automatically, but that would introduce more circular imports which confuse me. It might be worthwhile refactoring to eliminate all circular imports.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 10, 2012

    Support could be enabled automatically, but that would introduce more
    circular imports which confuse me.

    Are you sure? AFAICT:

    • connection depends on forking
    • reduction depends on forking and connection

    But connection doesn't depend on reduction, neither does forking.

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 10, 2012

    But connection doesn't depend on reduction, neither does forking.

    If registration of (Pipe)Connection is done in reduction then you can't make (Pipe)Connection picklable *automatically* unless you make connection depend on reduction (possibly indirectly).

    A circular import can be avoided by making reduction not import connection at module level. So not hard to fix.

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 11, 2012

    The last patch did not work on Unix.

    Here is a new version where the reduction functions are automatically registered, so allow_connection_pickling() is redundant.

    @pitrou
    Copy link
    Member

    pitrou commented Apr 18, 2012

    Could you regenerate your patch now that the win32 -> _winapi changes have been applied?

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 19, 2012

    Up to date patch.

    @sbt
    Copy link
    Mannequin

    sbt mannequin commented Apr 19, 2012

    A couple of minor changes based on Antoine's earlier review (which I did not notice till now).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Apr 24, 2012

    New changeset 08d4c2fe51ea by Antoine Pitrou in branch 'default':
    Issue bpo-4892: multiprocessing Connections can now be transferred over multiprocessing Connections.
    http://hg.python.org/cpython/rev/08d4c2fe51ea

    @pitrou
    Copy link
    Member

    pitrou commented Apr 24, 2012

    Thanks, Richard. I have now committed the patch. Hopefully the Windows buildbots will be ok :)

    @pitrou pitrou closed this as completed Apr 24, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants