classification
Title: rewrite multiprocessing (senfd|recvfd) in Python
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: baikie, neologix, python-dev, vstinner
Priority: normal Keywords: patch

Created on 2011-09-14 19:51 by neologix, last changed 2011-09-25 10:13 by neologix. This issue is now closed.

Files
File name Uploaded Description Edit
skip_reduction.diff neologix, 2011-09-17 11:28 review
multiprocessing_fd-3.diff neologix, 2011-09-18 21:11 review
Messages (19)
msg144047 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-09-14 19:51
Now that sendmsg()/recvmsg() are exposed in socketmodule, we could use them to replace the ad-hoc FD-passing routines used by multiprocessing.reduction.
Antoine suggested adding sendfd()/recvfd() methods to socket objects, but I'm not sure about this, since those only make sense for Unix domain sockets.
Two remarks on the patch attached:
- this removes sendfd()/recvfd() from _multiprocessing (but AFAICT those were never documented as part of the public API)
- EOF/invalid data received result in a RuntimeError
msg144053 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-14 21:52
I don't think that it's a problem to remove private functions.

Is it mandatory to send a non-empty message (first argument for sendmsg, b'x' 
in your patch)? The original C function sends a random byte :-)

multiprocessing_recvfd() contains cmsg_level=SOL_SOCKET and 
cmsg_type=SCM_RIGHTS, your Python function doesn't check cmsg_level or 
cmsg_type. Should it be checked?

I don't know sendmsg/recvmsg API. Do they guarantee to send/receive all data?
msg144065 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-09-15 06:54
> I don't think that it's a problem to remove private functions.
>

Alright.

> Is it mandatory to send a non-empty message (first argument for sendmsg, b'x'
> in your patch)? The original C function sends a random byte :-)
>

Some implementation can return EINVAL if no data is sent (i.e. you
can't send only ancillary data).

> multiprocessing_recvfd() contains cmsg_level=SOL_SOCKET and
> cmsg_type=SCM_RIGHTS, your Python function doesn't check cmsg_level or
> cmsg_type. Should it be checked?
>

Yes, it should be checked, I'll update the patch.

> I don't know sendmsg/recvmsg API. Do they guarantee to send/receive all data?
>

For data no, but ancillary data, yes. The only thing that could go
wrong would be a buffer too short to hold the ancillary data, but:
- the buffer size is computed with CMSG_DATA(), so it should be enough
- if the ancillay data is truncated, struct.unpack will raise an exception
msg144069 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-15 09:12
> - if the ancillay data is truncated, struct.unpack will raise an exception

Well, the current C code doesn't check that the data is truncated and it 
works correctly, so I don't think that it would be different in Python. 
And yes, Python adds an extra check thanks to struct.unpack.
msg144100 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-15 20:24
The patch looks correct. Did you try it on Linux, FreeBSD and/or Windows?
msg144102 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-09-15 20:55
I only tried on Linux.
By the way, what's the simplest way to create a personal clone to test patches on some of the buildbots before committing them?
msg144103 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-15 21:08
- Click on "Server-side clone" on http://hg.python.org/cpython
 - on your computer, hg clone default myclone # default if your local clone of cpython, default branch
 - cd myclone; edit code; hg ci
 - edit .hg/hgrc to update the repository URL
 - hg push
 - force a build on custom buildbots of your choice
msg144176 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-09-17 07:22
> Did you try it on Linux, FreeBSD and/or Windows?

It works fine on Linux, FreeBSD, OS X and Windows, but not on Solaris: see issue #12999.
msg144181 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-09-17 11:28
Here's a patch taking into account the fact that
multiprocessing.reduction might not be available and importing it can
raise an ImportError (which is already the case with the C
implementation, but multiprocessing.reduction tests have been added
recently to test_multiprocessing), e.g. if the OS doesn't support FD
passing.
With this patch, the pure Python version can be applied, and passes on
Linux, FreeBSD, OS X, Windows and OpenSolaris (except that it's not
available on OpenSolaris until issue #12999 gets fixed).
I also slightly modified the struct format used in the pure Python
version to make sure the length is sent as a a native int ("@i")
instead of a standardized int ("=i"), which might break if sizeof(int)
!= 4 (not sure there are many ILP64 architectures out there, but you
never know...).
msg144237 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-18 10:00
"It works fine on Linux, FreeBSD, OS X and Windows, but not on Solaris: see issue #12999."

Oh, thank for testing before committing :) It's hard to debug multiprocessing.
msg144240 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-09-18 11:05
> "It works fine on Linux, FreeBSD, OS X and Windows, but not on Solaris: see issue #12999."
>
> Oh, thank for testing before committing :) It's hard to debug multiprocessing.

Yes. Especially when you stumble upon a kernel/libc bug 25% of the time...

So, what should I do?
Apply the test catching the multiprocessing.connection ImportError to
test_multiprocessing (which is necessary even with the current C
version)?

And then apply the pure Python version, or wait until the OpenIndiana
case gets fixed?
msg144252 - (view) Author: David Watson (baikie) Date: 2011-09-18 19:54
I had a look at this patch, and the FD passing looked OK, except
that calculating the buffer size with CMSG_SPACE() may allow more
than one file descriptor to be received, with the extra one going
unnoticed - it should use CMSG_LEN() instead (the existing C
implementation has the same problem, I see).

CMSG_SPACE() exists to allow calculating the space required to
hold multiple control messages, so it essentially gives the
offset for the next cmsghdr struct such that any alignment
requirements are satisfied.

64-bit systems will probably want to ensure that all CMSG_DATA()
payloads are aligned on 8-byte boundaries, and so have
CMSG_SPACE(4) == CMSG_SPACE(8) == CMSG_LEN(8) (the Linux headers,
for instance, align to sizeof(size_t)).  So with a 32-bit int, a
buffer size of CMSG_SPACE(sizeof(int)) would allow *two* file
descriptors to be received.  CMSG_LEN() omits the padding, thus
allowing only one.

I'm not familiar with how the FD-passing facility is used in
multiprocessing, but this seems as if it could be an avenue for
DoS attacks that exhaust the number of file descriptors allowed
for the receiving process.
msg144255 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-09-18 21:11
> I had a look at this patch, and the FD passing looked OK, except
> that calculating the buffer size with CMSG_SPACE() may allow more
> than one file descriptor to be received, with the extra one going
> unnoticed - it should use CMSG_LEN() instead

Thanks for catching this.
Here's an updated patch.

> (the existing C implementation has the same problem, I see).

I just checked, and the C version uses CMSG_SPACE() as the buffer size, but passes CMSG_LEN() to cmsg->cmsg_len and msg.msg_controllen. Or am I missing something?
msg144309 - (view) Author: David Watson (baikie) Date: 2011-09-19 19:30
On Sun 18 Sep 2011, Charles-François Natali wrote:
> > I had a look at this patch, and the FD passing looked OK, except
> > that calculating the buffer size with CMSG_SPACE() may allow more
> > than one file descriptor to be received, with the extra one going
> > unnoticed - it should use CMSG_LEN() instead
> 
> > (the existing C implementation has the same problem, I see).
> 
> I just checked, and the C version uses CMSG_SPACE() as the buffer size, but passes CMSG_LEN() to cmsg->cmsg_len and msg.msg_controllen. Or am I missing something?

Ah, no, you're right - that's fine.  Sorry for the false alarm.
msg144345 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-09-20 18:34
New changeset 1d91a3ba5c87 by Charles-François Natali in branch 'default':
Issue #12981: test_multiprocessing: catch ImportError when importing
http://hg.python.org/cpython/rev/1d91a3ba5c87
msg144346 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-09-20 18:37
I committed the patch to catch the ImportError in test_multiprocessing.
I'll commit the other patch (pure Python version) in a couple days.

> Ah, no, you're right - that's fine.  Sorry for the false alarm.

No problem. As they say, "better safe than sorry".
msg144352 - (view) Author: David Watson (baikie) Date: 2011-09-20 21:04
On Tue 20 Sep 2011, Charles-François Natali wrote:

> I committed the patch to catch the ImportError in test_multiprocessing.

This should go in all branches, I think - see issue #13022.
msg144383 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-09-21 16:47
New changeset c158eac8e951 by Charles-François Natali in branch '2.7':
Issue #12981: test_multiprocessing: catch ImportError when importing
http://hg.python.org/cpython/rev/c158eac8e951

New changeset 6e04d406bb86 by Charles-François Natali in branch '3.2':
Issue #12981: test_multiprocessing: catch ImportError when importing
http://hg.python.org/cpython/rev/6e04d406bb86

New changeset 21b837aa07b9 by Charles-François Natali in branch 'default':
Issue #12981: test_multiprocessing: catch ImportError when importing
http://hg.python.org/cpython/rev/21b837aa07b9
msg144504 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-09-24 18:04
New changeset 95ee0df1e746 by Charles-François Natali in branch 'default':
Issue #12981: rewrite multiprocessing_{sendfd,recvfd} in Python.
http://hg.python.org/cpython/rev/95ee0df1e746
History
Date User Action Args
2011-09-25 10:13:34neologixsetstatus: open -> closed
2011-09-25 10:13:23neologixsetdependencies: - _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED usage on Solaris
resolution: fixed
stage: resolved
2011-09-24 18:04:02python-devsetmessages: + msg144504
2011-09-21 16:47:45python-devsetmessages: + msg144383
2011-09-20 21:04:08baikiesetmessages: + msg144352
2011-09-20 18:37:19neologixsetmessages: + msg144346
2011-09-20 18:34:56python-devsetnosy: + python-dev
messages: + msg144345
2011-09-19 19:30:17baikiesetmessages: + msg144309
2011-09-18 21:12:02neologixsetfiles: - multiprocessing_fd-2.diff
2011-09-18 21:11:40neologixsetfiles: + multiprocessing_fd-3.diff

messages: + msg144255
2011-09-18 19:54:11baikiesetnosy: + baikie
messages: + msg144252
2011-09-18 11:05:54neologixsetmessages: + msg144240
2011-09-18 10:00:53vstinnersetmessages: + msg144237
2011-09-17 11:28:51neologixsetfiles: - multiprocessing_fd-1.diff
2011-09-17 11:28:28neologixsetfiles: + skip_reduction.diff, multiprocessing_fd-2.diff

messages: + msg144181
2011-09-17 07:22:45neologixsetdependencies: + _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED usage on Solaris
messages: + msg144176
2011-09-15 21:08:00vstinnersetmessages: + msg144103
2011-09-15 20:55:11neologixsetmessages: + msg144102
2011-09-15 20:24:35vstinnersetmessages: + msg144100
2011-09-15 18:28:03neologixsetfiles: - multiprocessing_fd.diff
2011-09-15 18:27:49neologixsetfiles: + multiprocessing_fd-1.diff
2011-09-15 09:12:15vstinnersetmessages: + msg144069
2011-09-15 06:54:38neologixsetmessages: + msg144065
2011-09-14 21:52:29vstinnersetmessages: + msg144053
2011-09-14 19:51:23neologixcreate