classification
Title: Popen uses 333 times as much CPU as a shell pipe on Mac OS X
Type: behavior Stage:
Components: Documentation Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, gregory.p.smith, hughsw, pitrou, skip.montanaro
Priority: normal Keywords:

Created on 2010-06-01 22:18 by hughsw, last changed 2010-08-04 21:58 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description Edit
bytestream hughsw, 2010-06-01 22:18 script that writes bytes to stdout at a controlled rate
pull0 hughsw, 2010-06-01 22:19 script that reads its stdin and logs some stats
pull1 hughsw, 2010-06-01 22:20 script that uses Popen and reads from proc.stdout and logs some stats
Messages (6)
msg106862 - (view) Author: Hugh Secker-Walker (hughsw) Date: 2010-06-01 22:18
Popen seems to be *very* costly, CPU wise, on Mac OS X.

I need to read and process bytes from the stdout of another process that outputs bytes in a rate-limited fashion, e.g. 80000 bytes per second.  If I use a shell pipeline and a simple Python script to read the bytes and log some details then top and time both agree that the Python script is using roughly 1 second of CPU time for every 2000 seconds of elapsed time.

If I change the script to use Popen to start the first process and to read from proc.stdout, then top and time both agree that the Python script uses roughly 1 second of CPU time for every 6 seconds of elapsed time.  That is, using Popen uses 333 times as much CPU as using a shell pipeline.

In practice, this means that using a shell pipeline has no measureable impact on system performance, and I can run 300 concurrent pipeline jobs, whereas using Popen limits me to running 6 such concurrent jobs.

This is on an up-to-date Mac OS X 10.5.8 system.  The behavior happens with both the stock Python 2.5 from Apple and with Python 2.6 from Macports.

On Ubuntu Linux the Popen solution is perhaps 2% slower than using a shell pipeline.

I'll attach three scripts:
bytestream -- simple program that sends bytes to its stdout
pull0 -- Python script that reads its stdin and logs a couple of stats
pull1 -- Python script that uses Popen to start bytestream and pulls from proc.stdout and runs abysmally slowly.

Use this with a concurrent top to see the shell pipeline behavior:
$ ./bytestream 80000 4 | time ./pull0 11000

Use this with a concurrent top to see the Popen behavior:
$ time ./pull1 ./bytestream 80000 4 11000

In both cases you'll see that system and user time are approximately equal, but that they're roughly 333 times higher for pull1.  Is there a clue in the fact that both times go up by about the same huge factor?

-Hugh
msg106879 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-02 06:38
Have you tried specifying a value for the bufsize argument to Popen? Either 1 for line buffering or e.g. 4096 for a decent size block buffering.
msg106892 - (view) Author: Hugh Secker-Walker (hughsw) Date: 2010-06-02 15:32
OK.  Using bufsize=4096 or bufsize=-1 eliminates the CPU-hogging problem on Mac OS X.  Thank you.

I have to say that the Popen() doc could be better.  It says: "bufsize, if given, has the same meaning as the corresponding argument to the built-in open() function!" which implies behavior like open(), which is to use the system's default buffering, generally a good thing.

I think the default of 0, meaning unbuffered, is a poor choice.  I'm guessing many many Mac users are getting a big slowdown from this default behavior (as I've now discovered elsewhere in our project!).  I request that you change the default to be the same as open(), i.e. use the system default.

Barring that semantic change, please add a note to the doc:  "Most users will want to set bufsize=4096 for pipe-like buffering, or bufsize=1 for line-based buffering."  

Thanks again for the quick response and resolution.

-Hugh
msg106894 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-02 15:56
> I think the default of 0, meaning unbuffered, is a poor choice.

So do I, but I'm not sure whether it's acceptable to change behaviour in
a potentially incompatible way.

> Barring that semantic change, please add a note to the doc:  "Most
> users will want to set bufsize=4096 for pipe-like buffering, or
> bufsize=1 for line-based buffering."  

Yes, it deserves a documentation note indeed.
msg106896 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2010-06-02 16:51
This problem has come up before.  It was a bug introduced I think when
the subprocess module was first incorporated into Python.  I don't
recall if the default was changed in 2.7 or 3.x.  Can you demonstrate
the problem with Python 2.7 or 3.1?  If not, I'd recommend this 
ticket just be closed.
msg106898 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-06-02 17:09
> This problem has come up before.  It was a bug introduced I think when
> the subprocess module was first incorporated into Python.  I don't
> recall if the default was changed in 2.7 or 3.x.

According to the docs it is still unbuffered by default. I've added a
documentation note about performance issues in r81652.
History
Date User Action Args
2010-08-04 21:58:37terry.reedysetstatus: open -> closed

nosy: + docs@python
versions: + Python 2.7, - Python 2.6, Python 2.5
assignee: docs@python
components: + Documentation, - Library (Lib)
resolution: fixed
2010-06-02 17:09:18pitrousetmessages: + msg106898
2010-06-02 16:51:20skip.montanarosetnosy: + skip.montanaro
messages: + msg106896
2010-06-02 15:56:41pitrousetmessages: + msg106894
2010-06-02 15:32:28hughswsetmessages: + msg106892
2010-06-02 06:38:49pitrousetnosy: + gregory.p.smith, pitrou
messages: + msg106879
2010-06-01 22:20:11hughswsetfiles: + pull1
2010-06-01 22:19:32hughswsetfiles: + pull0
2010-06-01 22:18:45hughswcreate