Created on 2013-12-08 09:38 by neologix, last changed 2013-12-08 19:05 by gregory.p.smith. This issue is now closed.
|test_sub_read.py||neologix, 2013-12-08 09:38|
|msg205534 - (view)||Author: Charles-François Natali (neologix) *||Date: 2013-12-08 09:38|
This is a spinoff of issue #19506: currently, subprocess.communicate() uses a 4K buffer when reading data from pipes. This was probably optimal a couple of years ago, but nowadays most operating systems have larger pipes (e.g. Linux has 64K), so we might be able to gain some performance by increasing this buffer size.

For example, here's a benchmark reading from a subprocess spawning "dd if=/dev/zero bs=1M count=100":

# before, 4K buffer
$ ./python ~/test_sub_read.py
2.72450800300021

# after, 64K buffer
$ ./python ~/test_sub_read.py
1.2509000449999803

The difference is impressive. I'm attaching the benchmark script so that others can experiment a bit (on multi-core machines and also different OSes).
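The attached test_sub_read.py is not reproduced in this thread, but a minimal sketch of such a benchmark might look like the following. The child command here is a hypothetical stand-in: a Python process writing zeros to stdout instead of the original `dd if=/dev/zero bs=1M count=100`, so it runs on any platform.

```python
import os
import subprocess
import sys
import time

def bench_read(chunk_size, nbytes=10 * 1024 * 1024):
    """Time reading `nbytes` of zeros from a child process in
    `chunk_size` chunks, returning (elapsed_seconds, bytes_read)."""
    # Portable stand-in for "dd if=/dev/zero": a child writing zeros.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(b'\\0' * %d)" % nbytes]
    proc = subprocess.Popen(child, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    start = time.perf_counter()
    total = 0
    while True:
        data = os.read(fd, chunk_size)  # the same call communicate() makes
        if not data:                    # EOF: child closed its end
            break
        total += len(data)
    elapsed = time.perf_counter() - start
    proc.stdout.close()
    proc.wait()
    return elapsed, total

if __name__ == "__main__":
    for size in (4096, 65536):
        elapsed, total = bench_read(size)
        print("chunk=%6d: %d bytes in %.4fs" % (size, total, elapsed))
```

Absolute timings will differ from the numbers above, but the relative gap between the 4K and 64K chunk sizes should be visible on most Linux machines.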
|msg205540 - (view)||Author: STINNER Victor (haypo) *||Date: 2013-12-08 11:02|
Where is the buffer size? The hardcoded 4096 value in Popen._communicate()?

data = os.read(key.fd, 4096)

I remember asking you where the 4096 came from when you patched subprocess to use selectors (#18923): http://bugs.python.org/review/18923/#ps9827 Python 3.3 uses 1024 for its select.select() implementation, and 4096 for its select.poll() implementation.

Since Popen.communicate() returns the whole content of the buffer, would it be safe to increase the buffer size? For example, use 4 GB as the buffer size?

Should communicate() be "fair" between stdout and stderr?

To choose a new value, we need benchmark results on different OSes, at least Windows, Linux and FreeBSD (and maybe also Solaris), since these three OSes have different kernels.
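For context, the read loop being discussed can be sketched like this. This is a simplified illustration, not the actual Popen._communicate() code (which also handles stdin, timeouts and text decoding); `READ_CHUNK` stands in for the hardcoded 4096.

```python
import os
import selectors
import subprocess
import sys

# Stand-in for the hardcoded value under discussion.
READ_CHUNK = 4096

# A child writing to both stdout and stderr.
proc = subprocess.Popen(
    [sys.executable, "-c",
     "import sys; sys.stdout.write('out'); sys.stderr.write('err')"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)

sel = selectors.DefaultSelector()
sel.register(proc.stdout, selectors.EVENT_READ, "stdout")
sel.register(proc.stderr, selectors.EVENT_READ, "stderr")

chunks = {"stdout": [], "stderr": []}
while sel.get_map():                        # until both pipes hit EOF
    for key, _ in sel.select():
        data = os.read(key.fd, READ_CHUNK)  # the call whose size is tuned
        if not data:                        # EOF on this pipe
            sel.unregister(key.fileobj)
            key.fileobj.close()
        else:
            chunks[key.data].append(data)
proc.wait()

stdout = b"".join(chunks["stdout"])
stderr = b"".join(chunks["stderr"])
```

Since both pipes are multiplexed through one selector, stdout and stderr are drained as each becomes readable; the chunk size only affects how many syscalls and loop iterations the drain takes.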
|msg205541 - (view)||Author: STINNER Victor (haypo) *||Date: 2013-12-08 11:03|
Oh, Charles-François Natali replied to my review by email, and it's not archived on Rietveld. Copy of his message:

> http://bugs.python.org/review/18923/diff/9757/Lib/subprocess.py#newcode420
> Lib/subprocess.py:420: _PopenSelector = selectors.SelectSelector
> This code should be factorized, but I opened a new issue for that:
> #19465.
> http://bugs.python.org/issue19465

OK, so we'll see in the other issue :-)

> http://bugs.python.org/review/18923/diff/9757/Lib/subprocess.py#newcode1583
> Lib/subprocess.py:1583: self._input_offset + _PIPE_BUF]
> (Unrelated to the selector issue) It may be more efficient to use a
> memoryview instead of a slice here.

I also noticed, but that's also a different issue.

> http://bugs.python.org/review/18923/diff/9757/Lib/subprocess.py#newcode1597
> Lib/subprocess.py:1597: data = os.read(key.fd, 4096)
> In the old code, poll() uses 4096 whereas select() uses 1024 bytes... Do
> you know the reason? Why 4096 and not 1 byte or 1 MB? I would prefer a
> global constant rather than a hardcoded limit.

When writing to the pipe, we should definitely write less than PIPE_BUF, to avoid blocking after select()/poll() returns write-ready. When reading, it's not so important, in the sense that any value will work; the only difference will be a difference in speed. Logically, I would say that the ideal size is the pipe buffer size, which varies between 4K and 64K: and indeed, I wrote a quick benchmark, and a 64K value yields the best result.

> Use maybe os.stat(fd).st_blksize?

IMO that's not worth it; also, IIRC, some OSes don't set it for pipes.

> Modules/_io/fileio.c uses SMALLCHUNK to read the whole content of a
> file:
>
> #if BUFSIZ < (8*1024)
> #define SMALLCHUNK (8*1024)
> #elif (BUFSIZ >= (2 << 25))
> #error "unreasonable BUFSIZ > 64MB defined"
> #else
> #define SMALLCHUNK BUFSIZ
> #endif
>
> The io module uses a default buffer size of 8 KB to read.

Well, that's just a heuristic: the ideal size depends on the OS, the filesystem, the backing device, etc. (for an on-disk file). Anyway, this too belongs to another issue. Please try to only make comments relevant to the issue at hand!

> http://bugs.python.org/review/18923/diff/9757/Lib/test/test_subprocess.py
> File Lib/test/test_subprocess.py (left):
>
> http://bugs.python.org/review/18923/diff/9757/Lib/test/test_subprocess.py#oldcode2191
> Lib/test/test_subprocess.py:2191: class
> ProcessTestCaseNoPoll(ProcessTestCase):
> Oh oh, removing a test is not a good idea. You should test select and
> poll selectors when poll is used by default.

Well, those distinct test cases made sense before, because there were two different code paths depending on whether we were using select() or poll(). Now, the code path is exactly the same, and the different selector implementations have their own tests, so IMO that's not necessary anymore. But I re-enabled them anyway...
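The two write-side points from the exchange above (stay under PIPE_BUF after select() reports write-ready, and slice input with a memoryview rather than a bytes slice) can be sketched together. This is an illustrative loop, not subprocess's actual stdin-feeding code:

```python
import os
import select

payload = b"x" * (4 * select.PIPE_BUF)  # more than one write's worth
r, w = os.pipe()
os.set_blocking(w, False)

view = memoryview(payload)  # slicing a memoryview copies no data
offset = 0
received = []
while offset < len(payload):
    # After select() reports the pipe write-ready, a write of at most
    # PIPE_BUF bytes is expected not to block.
    select.select([], [w], [])
    chunk = view[offset:offset + select.PIPE_BUF]
    offset += os.write(w, chunk)
    # Drain the read end so the pipe buffer never fills up.
    received.append(os.read(r, 65536))

os.close(w)
os.close(r)
result = b"".join(received)
```

With a plain bytes object, each `payload[offset:offset + PIPE_BUF]` slice would allocate and copy a fresh chunk; the memoryview slice just references the original buffer, which is the efficiency point raised in the review.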
|msg205552 - (view)||Author: Charles-François Natali (neologix) *||Date: 2013-12-08 12:22|
> STINNER Victor added the comment:
>
> Since Popen.communicate() returns the whole content of the buffer, would it be safe to increase the buffer size? For example, use 4 GB as the buffer size?

Sure, if you want to pay the CPU and memory overhead of allocating a 4 GB buffer :-)

> Should communicate() be "fair" between stdout and stderr?

There's no reason to make the buffer depend on the FD.

> To choose a new value, we need benchmark results on different OSes, at least Windows, Linux, FreeBSD (and maybe also Solaris) since these 3 OSes have a different kernel.

Windows isn't concerned by this, since it doesn't use a selector, but threads. For the other OSes, that's why I opened this issue (you forgot AIX in your list :-).
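The allocation overhead Charles-François mentions is concrete: CPython's os.read() allocates a buffer of the *requested* size up front and shrinks it afterwards, so a huge read size costs memory even when only a few bytes arrive. A small demonstration using tracemalloc (requesting 1 MiB here rather than 4 GB, for obvious reasons):

```python
import os
import tracemalloc

r, w = os.pipe()
os.write(w, b"x")  # the pipe holds a single byte

tracemalloc.start()
# os.read allocates a buffer for the full requested size before
# reading, even though only 1 byte is available.
data = os.read(r, 1 << 20)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

os.close(r)
os.close(w)
```

`data` is just `b"x"`, but the traced peak reflects the ~1 MiB temporary buffer, which is why "read size" and "result size" have different costs.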
|msg205573 - (view)||Author: Antoine Pitrou (pitrou) *||Date: 2013-12-08 15:57|
Linux, 64-bit quad core:

With 4K buffer:
$ time ./python test_sub_read.py
0.25217683400114765

real 0m0.296s
user 0m0.172s
sys  0m0.183s

With 64K buffer:
$ time ./python test_sub_read.py
0.09257541999977548

real 0m0.132s
user 0m0.051s
sys  0m0.096s
|msg205592 - (view)||Author: Roundup Robot (python-dev)||Date: 2013-12-08 18:58|
New changeset 03a056c3b88e by Gregory P. Smith in branch '3.3':
Fixes issue #19929: Call os.read with 32768 within subprocess.Popen
http://hg.python.org/cpython/rev/03a056c3b88e

New changeset 4de4b5a4e405 by Gregory P. Smith in branch 'default':
Fixes issue #19929: Call os.read with 32768 within subprocess.Popen
http://hg.python.org/cpython/rev/4de4b5a4e405
|msg205593 - (view)||Author: Charles-François Natali (neologix) *||Date: 2013-12-08 18:59|
> Roundup Robot added the comment:
>
> New changeset 03a056c3b88e by Gregory P. Smith in branch '3.3':
> Fixes issue #19929: Call os.read with 32768 within subprocess.Popen
> http://hg.python.org/cpython/rev/03a056c3b88e

Not that it bothers me, but AFAICT this isn't a bugfix, and as such shouldn't be backported to 3.3, no?
|msg205594 - (view)||Author: Gregory P. Smith (gregory.p.smith) *||Date: 2013-12-08 19:05|
I saw a small regression over 4K when using a 64K buffer on one of my machines (dual-core amd64 Linux). With 32K, everything (amd64 Linux, armv7l 32-bit Linux, 64-bit OS X 10.6) showed a dramatic improvement on the microbenchmark, approaching 50% less CPU use in many cases. I doubt applications will notice as much, as they're likely to be dominated by their own application code rather than the subprocess internals.

Re: 3.3 or not, true, but since it doesn't change any APIs and is minor I did it anyway. If you think it doesn't belong there, leave it to the release manager to back it out. This and the #19506 change should be invisible to users.
|2013-12-08 19:05:42||gregory.p.smith||set||status: open -> closed messages: + msg205594 stage: needs patch -> commit review|
|2013-12-08 18:59:50||neologix||set||messages: + msg205593|
|2013-12-08 18:58||python-dev||set||messages: + msg205592|
|2013-12-08 15:57:49||pitrou||set||messages: + msg205573|
|2013-12-08 12:22:58||neologix||set||messages: + msg205552|
|2013-12-08 11:03:56||haypo||set||messages: + msg205541|
|2013-12-08 11:02:32||haypo||set||messages: + msg205540|