classification
Title: subprocess: increase read buffer size
Type: performance Stage: commit review
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: gregory.p.smith, haypo, neologix, pitrou, python-dev, sbt, serhiy.storchaka
Priority: normal Keywords:

Created on 2013-12-08 09:38 by neologix, last changed 2013-12-08 19:05 by gregory.p.smith. This issue is now closed.

Files
File name Uploaded Description Edit
test_sub_read.py neologix, 2013-12-08 09:38
Messages (8)
msg205534 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-12-08 09:38
This is a spinoff of issue #19506: currently, subprocess.communicate() uses a 4K buffer when reading data from pipes.
This was probably optimal a couple years ago, but nowadays most operating systems have larger pipes (e.g. Linux has 64K), so we might be able to gain some performance by increasing this buffer size.

For example, here's a benchmark reading from a subprocess spawning "dd if=/dev/zero bs=1M count=100":

# before, 4K buffer
$ ./python ~/test_sub_read.py 
2.72450800300021

# after, 64K buffer
$ ./python ~/test_sub_read.py 
1.2509000449999803

The difference is impressive.

I'm attaching the benchmark script so that others can experiment a bit (on multi-core machines and also different OSes).
msg205540 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-12-08 11:02
Where is the buffer size? The hardcoded 4096 value in Popen._communicate()?
   data = os.read(key.fd, 4096)

I remember that I asked you where does 4096 come from when you patched subprocess to use selectors (#18923):
http://bugs.python.org/review/18923/#ps9827

Python 3.3 uses 1024 for its select.select() implementation, 4096 for its select.poll() implementation.

Since Popen.communicate() returns the whole content of the buffer, would it be safe to increase the buffer size? For example, use 4 GB as the buffer size? Should communicate() be "fair" between stdout and stderr?

To choose a new value, we need benchmark results on different OSes, at least Windows, Linux, FreeBSD (and maybe also Solaris) since these 3 OSes have a different kernel.
msg205541 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-12-08 11:03
Oh, Charles-François Natali replied to my review by email, and it's not archived on Rietveld. Copy of him message:

> http://bugs.python.org/review/18923/diff/9757/Lib/subprocess.py#newcode420
> Lib/subprocess.py:420: _PopenSelector = selectors.SelectSelector
> This code should be factorized, but I opened a new issue for that:
> #19465.
> http://bugs.python.org/issue19465

OK, so we'll see in the other issue :-)

> http://bugs.python.org/review/18923/diff/9757/Lib/subprocess.py#newcode1583
> Lib/subprocess.py:1583: self._input_offset + _PIPE_BUF]
> (Unrelated to the selector issue) It may be more efficient to use a
> memory view instead of a slice here.

I also noticed, but that's also a different issue.

> http://bugs.python.org/review/18923/diff/9757/Lib/subprocess.py#newcode1597
> Lib/subprocess.py:1597: data = os.read(key.fd, 4096)
> In the old code, poll() uses 4096 whereas select() uses 1024 bytes... Do
> you know the reason? Why 4096 and not 1 byte or 1 MB? I would prefer a
> global constant rather than an harded limit.

When writing to the pipe, we should definitely write less than
PIPE_BUF, to avoid blocking after select()/poll() returns write-ready.
When reading, it's not so important, in the sense that any value will work.
The only difference will be a difference speed.

Logically, I would say that the ideal size is the pipe buffer size,
which varies between 4K and 64K: and indeed, I wrote a quick
benchmark, and a 64K value yields the best result.

> Use maybe os.stat(fd).st_blksize?

IMO that's not worth it, also, IIRC, some OS don't set it for pipes.

> Modules/_io/fileio.c uses SMALLCHUNK to read the whole content of a
> file:
>
> #if BUFSIZ < (8*1024)
> #define SMALLCHUNK (8*1024)
> #elif (BUFSIZ >= (2 << 25))
> #error "unreasonable BUFSIZ > 64MB defined"
> #else
> #define SMALLCHUNK BUFSIZ
> #endif
>
> The io module uses a default buffer size of 8 KB to read.

Well, that's just a heuristic: the ideal size depends on the OS, the
filesystem, backing device, etc (for an on-disk file)?

Anyway, this too belongs to another issue.

Please try to only make comments relevant to the issue at hand!

> http://bugs.python.org/review/18923/diff/9757/Lib/test/test_subprocess.py
> File Lib/test/test_subprocess.py (left):
>
> http://bugs.python.org/review/18923/diff/9757/Lib/test/test_subprocess.py#oldcode2191
> Lib/test/test_subprocess.py:2191: class
> ProcessTestCaseNoPoll(ProcessTestCase):
> Oh oh, removing a test is not a good idea. You should test select and
> poll selectors when poll is used by default.

Well, those distinct test cases made sense before, because there were
two different code paths depending on whether we were using select()
or poll(). Now, the code path is exactly the same, and the different
selectors implementations have their own tests, so IMO that's not
necessary anymore.

But I re-enabled them anyway...
msg205552 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-12-08 12:22
> STINNER Victor added the comment:
>
> Since Popen.communicate() returns the whole content of the buffer, would it be safe to increase the buffer size? For example, use 4 GB as the buffer size?

Sure, if you want to pay the CPU and memory overhead of allocating a
4GB buffer :-)

> Should communicate() be "fair" between stdout and stderr?

There's no reason to make the buffer depend on the FD.

> To choose a new value, we need benchmark results on different OSes, at least Windows, Linux, FreeBSD (and maybe also Solaris) since these 3 OSes have a different kernel.

Windows isn't concerned by this, since it doesn't use a selector, but threads.
For the other OSes, that's why I opened this issue (you forgot AIX in
your list :-).
msg205573 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-12-08 15:57
Linux, 64-bit quad core:

With 4K buffer:

$ time ./python test_sub_read.py 
0.25217683400114765

real	0m0.296s
user	0m0.172s
sys	0m0.183s

With 64K buffer:

$ time ./python test_sub_read.py 
0.09257541999977548

real	0m0.132s
user	0m0.051s
sys	0m0.096s
msg205592 - (view) Author: Roundup Robot (python-dev) Date: 2013-12-08 18:58
New changeset 03a056c3b88e by Gregory P. Smith in branch '3.3':
Fixes issue #19929: Call os.read with 32768 within subprocess.Popen
http://hg.python.org/cpython/rev/03a056c3b88e

New changeset 4de4b5a4e405 by Gregory P. Smith in branch 'default':
Fixes issue #19929: Call os.read with 32768 within subprocess.Popen
http://hg.python.org/cpython/rev/4de4b5a4e405
msg205593 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-12-08 18:59
> Roundup Robot added the comment:
>
> New changeset 03a056c3b88e by Gregory P. Smith in branch '3.3':
> Fixes issue #19929: Call os.read with 32768 within subprocess.Popen
> http://hg.python.org/cpython/rev/03a056c3b88e

Not that it bothers me, but AFAICT this isn't a bugfix, and as such
shouldn't be backported to 3.3, no?
msg205594 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2013-12-08 19:05
I saw a small regression over 4k when using a 64k buffer on one of my machines (dual core amd64 linux).  With 32k everything (amd64 linux, armv7l 32-bit linux, 64-bit os x 10.6) showed a dramatic improvement on the microbenchmark.  approaching 50% less cpu use in many cases.

i doubt applications will notice as much as they're likely to be dominated by their own application code rather than the subprocess internals.

re: 3.3 or not, true, but since it doesn't change any APIs and is minor I did it anyways.  If you think it doesn't belong there, leave it to the release manager to back out.  This and the #19506 change should be invisible to users.
History
Date User Action Args
2013-12-08 19:05:42gregory.p.smithsetstatus: open -> closed
resolution: fixed
messages: + msg205594

stage: needs patch -> commit review
2013-12-08 18:59:50neologixsetmessages: + msg205593
2013-12-08 18:58:39python-devsetnosy: + python-dev
messages: + msg205592
2013-12-08 15:57:49pitrousetmessages: + msg205573
2013-12-08 12:22:58neologixsetmessages: + msg205552
2013-12-08 11:03:56hayposetmessages: + msg205541
2013-12-08 11:02:32hayposetmessages: + msg205540
2013-12-08 09:38:10neologixcreate