classification
Title: Allow to set pipe size on subprocess.Popen.
Type: enhancement Stage: commit review
Components: Library (Lib) Versions: Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: gregory.p.smith, rhpvorderman
Priority: normal Keywords: patch

Created on 2020-08-19 05:46 by rhpvorderman, last changed 2020-10-21 06:04 by gregory.p.smith. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 21921 merged rhpvorderman, 2020-08-19 07:34
PR 22839 merged gregory.p.smith, 2020-10-20 23:18
Messages (5)
msg375636 - (view) Author: Ruben Vorderman (rhpvorderman) * Date: 2020-08-19 05:46
Pipes block when a process reads from an empty pipe or writes to a full one. While this happens, the program waiting on the pipe still uses a lot of CPU cycles until the pipe stops blocking.

I found this while working with xopen, a library that pipes data into an external gzip process. (This is more efficient than using Python's gzip module, because the subprocess escapes the GIL, so your main algorithm can fully utilize one CPU core while the compression is offloaded to another.)

It turns out that increasing the pipe size on Linux from the default of 64 KB to the maximum allowed pipe size in /proc/sys/fs/pipe-max-size (1024 KB) drastically improves performance: https://github.com/marcelm/xopen/issues/35. TL;DR: full utilization of CPU cores, a 40%+ decrease in wall-clock time and a 20% decrease in the number of compute seconds (indicating that 20% was wasted waiting on blocking pipes).

However, doing this with subprocess is quite involved as it is now.

1. You have to find out which constants to use in fcntl for setting the pipe size (these constants are not exposed in Python).
2. You have to start the Popen process with stdout routed to subprocess.PIPE.
3. You have to get my_popen_process.stdout.fileno().
4. You have to use fcntl.fcntl to modify the pipe size.
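The four steps above can be sketched roughly as follows. This is a minimal, Linux-only illustration and not code from xopen or from the eventual patch: the helper name `read_child_with_resized_pipe` is hypothetical, and the numeric fallbacks 1031 and 1032 are the values of F_SETPIPE_SZ and F_GETPIPE_SZ in the Linux kernel headers, used only when the fcntl module does not expose them.

```python
import fcntl
import subprocess
import sys

# Step 1: Linux-specific fcntl commands. The numeric fallbacks come from
# <linux/fcntl.h> and are only used when the fcntl module lacks the names.
F_SETPIPE_SZ = getattr(fcntl, "F_SETPIPE_SZ", 1031)
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def read_child_with_resized_pipe(args, pipe_size):
    """Hypothetical helper: run args, resize its stdout pipe, return (output, size)."""
    # Step 2: start the child with stdout routed to a pipe.
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    # Step 3: get the file descriptor of our end of the pipe.
    fd = proc.stdout.fileno()
    # Step 4: ask the kernel to resize the pipe. The kernel may round the
    # size up, and raises OSError if the request exceeds the allowed maximum.
    fcntl.fcntl(fd, F_SETPIPE_SZ, pipe_size)
    actual_size = fcntl.fcntl(fd, F_GETPIPE_SZ)
    output, _ = proc.communicate()
    return output, actual_size


if __name__ == "__main__":
    child = [sys.executable, "-c", "print('hello')"]
    out, size = read_child_with_resized_pipe(child, 128 * 1024)
    print(out, size)
```

The size reported by F_GETPIPE_SZ can be larger than the requested one, so code that depends on the exact capacity should read it back rather than assume the request was honored as given.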

It would be much easier to do `subprocess.Popen(args, pipesize=1024 * 1024)`, for example.

I am currently working on a PR implementing this. It will also make F_GETPIPE_SZ and F_SETPIPE_SZ available to the fcntl module.
msg379059 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-10-19 23:30
New changeset 23c0fb8edd16fe6d796df2853a5369fd783e05b7 by Ruben Vorderman in branch 'master':
bpo-41586: Add pipesize parameter to subprocess & F_GETPIPE_SZ and F_SETPIPE_SZ to fcntl. (GH-21921)
https://github.com/python/cpython/commit/23c0fb8edd16fe6d796df2853a5369fd783e05b7
msg379060 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-10-19 23:30
Thanks Ruben!
msg379148 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-10-20 17:40
This caused a variety of buildbot failures. Investigating.
msg379176 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-10-21 00:37
New changeset 786addd9d07b6c712b8ea9ee06e1f9f41c1b67a1 by Gregory P. Smith in branch 'master':
bpo-41586: Attempt to make the pipesize tests more robust. (GH-22839)
https://github.com/python/cpython/commit/786addd9d07b6c712b8ea9ee06e1f9f41c1b67a1
History
Date User Action Args
2020-10-21 06:04:04 gregory.p.smith set status: open -> closed
resolution: fixed
stage: patch review -> commit review
2020-10-21 00:37:33 gregory.p.smith set messages: + msg379176
2020-10-20 23:18:26 gregory.p.smith set stage: commit review -> patch review
pull_requests: + pull_request21793
2020-10-20 17:40:39 gregory.p.smith set status: closed -> open
resolution: fixed -> (no value)
messages: + msg379148
2020-10-19 23:30:40 gregory.p.smith set status: open -> closed
type: enhancement
messages: + msg379060
resolution: fixed
stage: patch review -> commit review
2020-10-19 23:30:12 gregory.p.smith set messages: + msg379059
2020-08-21 06:25:39 gregory.p.smith set assignee: gregory.p.smith
nosy: + gregory.p.smith
2020-08-19 07:34:18 rhpvorderman set keywords: + patch
stage: patch review
pull_requests: + pull_request21035
2020-08-19 05:46:00 rhpvorderman create