classification
Title: multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES if large data is sent (win2000)
Type: enhancement Stage:
Components: Library (Lib), Windows Versions: Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jnoller Nosy List: jnoller, jpe, ocean-city
Priority: normal Keywords: patch

Created on 2008-08-14 11:11 by ocean-city, last changed 2009-04-02 04:22 by jnoller. This issue is now closed.

Files
File name Uploaded Description
reproduce.py ocean-city, 2008-08-14 11:11
reproduce.py jpe, 2009-03-30 22:10 Slightly modified script that spawns many processes on win32
reproduce.py jnoller, 2009-03-30 22:29 reproduce2
reproduce.py jpe, 2009-03-30 23:59
win32_pipe.diff jpe, 2009-03-31 18:53 1st try at fixing, diff against trunk
win32_pipe2.diff jpe, 2009-04-01 22:04 Raise ValueError
Messages (17)
msg71119 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-08-14 11:11
I noticed that regrtest.py sometimes fails in test_multiprocessing.py
(test_connection) on win2000.

I could not reproduce the error by running test_multiprocessing alone, but
I finally managed to by increasing 'really_big_msg' to 32MB or more.

I attached code that reproduces it. I don't know why this happens yet.
msg71120 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-08-14 11:15
This is the traceback when running reproducable.py.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "e:\python-dev\trunk\lib\multiprocessing\forking.py", line 341, in main
    prepare(preparation_data)
  File "e:\python-dev\trunk\lib\multiprocessing\forking.py", line 456, in prepare
    '__parents_main__', file, path_name, etc
  File "reproducable.py", line 20, in <module>
    conn.send_bytes(really_big_msg)
IOError: [Errno 1450] Insufficient system resources exist to complete the
requested service.
msg71125 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-08-14 12:26
After googling, ERROR_NO_SYSTEM_RESOURCES seems to happen
when the size of a single I/O operation is too large.

In Modules/_multiprocessing/pipe_connection.c, conn_send_string is
implemented with a single WriteFile() call. Maybe the data should be divided
into reasonably sized chunks and sent with several WriteFile() calls?
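
As a rough illustration of the chunking idea (this is not the C code in pipe_connection.c; the helper name iter_chunks and the 32 KB default are assumptions made for this sketch), a large payload could be split into bounded pieces before each write:

```python
def iter_chunks(data, chunk_size=32 * 1024):
    """Yield successive slices of `data`, each at most `chunk_size` bytes long.

    Hypothetical helper: each yielded piece would be handed to one write call.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


# Split a 100 KB payload into pieces of at most 32 KB.
message = b"X" * (100 * 1024)
chunks = list(iter_chunks(message))
```

Note that chunking alone changes what the receiver sees unless the pieces are reassembled into one message on the other side.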
msg84664 - (view) Author: John Ehresman (jpe) * Date: 2009-03-30 21:49
I'll try to work on a patch for this, but the reproduce.py script seems
to spawn dozens of sub-interpreters right now when run with trunk
(python 2.7) on win32
msg84680 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-03-30 22:11
John, can you try this on trunk:

from multiprocessing import *

latin = str

SENTINEL = latin('')

def _echo(conn):
    for msg in iter(conn.recv_bytes, SENTINEL):
        conn.send_bytes(msg)
    conn.close()

conn, child_conn = Pipe()

p = Process(target=_echo, args=(child_conn,))
p.daemon = True
p.start()

really_big_msg = latin('X') * (1024 * 1024 * 32)
conn.send_bytes(really_big_msg)
assert conn.recv_bytes() == really_big_msg

conn.send_bytes(SENTINEL)                          # tell child to quit
child_conn.close()
msg84683 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-03-30 22:19
Really? Hmm weird...
I'm using Win2000; maybe you are using a newer OS?
Or maybe larger data is needed. This guy says error occurs around 200MB.
(This is async IO though)
>http://www.gamedev.net/community/forums/topic.asp?topic_id=382135

If this happens only on my machine, maybe you can close this entry as
"works for me".
msg84684 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2009-03-30 22:20
Ah, I forgot this. Process#set_daemon doesn't exist on trunk, I had to
use "p.daemon = True" instead.
msg84687 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-03-30 22:29
John, try this new version
msg84690 - (view) Author: John Ehresman (jpe) * Date: 2009-03-30 22:35
Latest version works -- question is why prior versions spawned many 
subprocesses.  It's really another bug because prior version wasn't 
hitting the write length limit.
msg84696 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-03-30 22:38
The if __name__ clause is actually well documented, see:
http://docs.python.org/library/multiprocessing.html#windows
msg84725 - (view) Author: John Ehresman (jpe) * Date: 2009-03-30 23:59
It turns out that the original reproduce.py deadlocks if the pipe buffer
is smaller than the message size -- even with a fix to the bug.  Patch to
fix is coming soon.
msg84858 - (view) Author: John Ehresman (jpe) * Date: 2009-03-31 18:53
Attached is a patch, though I have mixed feelings about it.  The OS
error can still occur even if a smaller amount is written in each
WriteFile call; I think an internal OS buffer fills up and the error is
returned if that buffer is full because the other process hasn't read
yet.  The patch just ignores ERROR_NO_SYSTEM_RESOURCES and writes again.
I don't know, though, whether ERROR_NO_SYSTEM_RESOURCES can mean something
else is wrong and the write will never succeed.  The message is also
broken up into 32K parts and a recv_bytes on the other end must be
called multiple times to read it all.

The patch is one option.  Another might be to let the application decide
to continue or not and essentially treat the pipes as nonblocking.
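
The retry approach described above might look roughly like the following in Python. This is only a sketch of the idea, not the contents of win32_pipe.diff: write is a hypothetical callable standing in for one WriteFile() call, and the max_retries cap and delay are additions of this sketch to address the worry that the write may never succeed.

```python
import time

ERROR_NO_SYSTEM_RESOURCES = 1450  # error code shown in the traceback
CHUNK_SIZE = 32 * 1024            # 32K parts, as described in the message


def send_in_chunks(write, data, max_retries=100, delay=0.01):
    """Pass `data` to `write(chunk)` in 32K pieces, retrying on error 1450.

    `write` is a hypothetical stand-in for a single WriteFile() call.
    """
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        for _attempt in range(max_retries):
            try:
                write(chunk)
                break
            except OSError as exc:
                if exc.errno != ERROR_NO_SYSTEM_RESOURCES:
                    raise  # unrelated failure: do not hide it
                time.sleep(delay)  # give the reader time to drain the pipe
        else:
            # Every attempt failed with error 1450: stop instead of looping forever.
            raise OSError(ERROR_NO_SYSTEM_RESOURCES,
                          "write kept failing; giving up")
```

The cap makes the trade-off explicit: without it, a condition that never clears would turn into a silent hang.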
msg85045 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-04-01 16:37
I've been thinking about this a bit, and I think raising an exception and 
returning the number of bytes read makes more sense than just hiding 
or eating the errors. Explicit > implicit in this case; at least doing 
this gives the controller a way to react.
msg85082 - (view) Author: John Ehresman (jpe) * Date: 2009-04-01 19:45
Looking into this a bit more and reading the documentation (sorry, I 
picked this up because I know something about win32 and not because I 
know multiprocessing), it looks like a connection is supposed to be 
message oriented and not byte oriented so that a recv() should return 
what is sent in a single send().  This is like how Queue works in the 
threading case.  Note that I think the method signatures in 
dummy.connection differ from those in pipe_connection, and that the two 
differ in what happens when several send_bytes calls occur before a recv_bytes.

I'm currently leaning toward essentially leaving the current behavior 
(and documenting it) though maybe with a better exception and 
documenting that large byte arrays can't be sent through the pipe. 
What's still an issue is if a pickle ends up being too large.
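
A small sketch of the message-oriented behavior, using both ends of a Pipe in a single process for brevity (the messages are short, so nothing blocks): two send_bytes calls are received as two separate messages rather than as one merged byte stream.

```python
from multiprocessing import Pipe

a, b = Pipe()

# Two sends before any receive.
a.send_bytes(b"first")
a.send_bytes(b"second")

# Each recv_bytes returns exactly what one send_bytes sent.
first = b.recv_bytes()
second = b.recv_bytes()

a.close()
b.close()
```

This is why splitting one send into several writes is visible to the receiver unless the pieces are put back together.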
msg85085 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-04-01 20:23
I think I'm fine with this as well. Let's document it and add an exception.
As for the pickle being too large, I think for now we're safe just
documenting it.
msg85102 - (view) Author: John Ehresman (jpe) * Date: 2009-04-01 22:04
New patch which raises ValueError if WriteFile fails with
ERROR_NO_SYSTEM_RESOURCES.  I wasn't able to reliably write a test since
putting the send_bytes in a try block seems to allow the call to succeed. 
This is probably OS, swap file size, and timing dependent.
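
In Python terms, the behavior of the new patch corresponds roughly to the sketch below. The real change is in the C extension; the wrapper name send_or_value_error, the write callable, and the exact error text are all hypothetical.

```python
ERROR_NO_SYSTEM_RESOURCES = 1450  # Windows error code reported in this issue


def send_or_value_error(write, data):
    """Call `write(data)`; turn error 1450 into a ValueError.

    `write` is a hypothetical stand-in for the single WriteFile() call.
    """
    try:
        write(data)
    except OSError as exc:
        # winerror exists only on Windows; fall back to errno elsewhere.
        code = getattr(exc, "winerror", None) or exc.errno
        if code == ERROR_NO_SYSTEM_RESOURCES:
            raise ValueError(
                "message of %d bytes is too large to send through the pipe"
                % len(data)
            ) from exc
        raise  # any other OS error propagates unchanged
```

A caller can then treat ValueError as "this message cannot be sent as is" and decide whether to split it at the application level.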
msg85160 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-04-02 04:22
Patch applied in r71036 on python-trunk
History
Date User Action Args
2009-04-02 04:22:42 jnoller set status: open -> closed
    resolution: fixed
    messages: + msg85160
2009-04-01 22:04:25 jpe set files: + win32_pipe2.diff
    messages: + msg85102
2009-04-01 20:23:05 jnoller set messages: + msg85085
    title: multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES if large data is sent (win2000) -> multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES if large data is sent (win2000)
2009-04-01 19:45:16 jpe set messages: + msg85082
    title: multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES if large data is sent (win2000) -> multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES if large data is sent (win2000)
2009-04-01 16:37:35 jnoller set messages: + msg85045
2009-03-31 18:53:57 jpe set files: + win32_pipe.diff
    keywords: + patch
    messages: + msg84858
2009-03-30 23:59:36 jpe set files: + reproduce.py
    messages: + msg84725
2009-03-30 22:38:25 jnoller set messages: + msg84696
2009-03-30 22:35:26 jpe set messages: + msg84690
    title: multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES if large data is sent (win2000) -> multiprocessing.Pipe terminates with ERROR_NO_SYSTEM_RESOURCES if large data is sent (win2000)
2009-03-30 22:29:55 jnoller set files: + reproduce.py
    messages: + msg84687
2009-03-30 22:20:26 ocean-city set messages: + msg84684
2009-03-30 22:19:28 ocean-city set messages: + msg84683
2009-03-30 22:11:18 jnoller set messages: + msg84680
2009-03-30 22:10:02 jpe set files: + reproduce.py
2009-03-30 21:49:07 jpe set nosy: + jpe
    messages: + msg84664
2009-01-23 15:18:30 jnoller set type: resource usage -> enhancement
2009-01-22 19:18:37 jnoller set priority: normal
    type: resource usage
2009-01-08 21:24:52 jnoller set assignee: jnoller
    nosy: + jnoller
2008-08-14 12:26:31 ocean-city set messages: + msg71125
2008-08-14 11:15:41 ocean-city set messages: + msg71120
2008-08-14 11:11:06 ocean-city create