Message 106862 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	hughsw
Recipients	hughsw
Date	2010-06-01.22:18:44
SpamBayes Score	3.7779655e-06
Marked as misclassified	No
Message-id	<1275430726.49.0.216093303643.issue8873@psf.upfronthosting.co.za>
In-reply-to

Content
Popen seems to be very costly, CPU wise, on Mac OS X. I need to read and process bytes from the stdout of another process that outputs bytes in a rate-limited fashion, e.g. 80000 bytes per second. If I use a shell pipeline and a simple Python script to read the bytes and log some details then top and time both agree that the Python script is using roughly 1 second of CPU time for every 2000 seconds of elapsed time. If I change the script to use Popen to start the first process and to read from proc.stdout, then top and time both agree that the Python script uses roughly 1 second of CPU time for every 6 seconds of elapsed time. That is, using Popen uses 333 times as much CPU as using a shell pipeline. In practice, this means that using a shell pipeline has no measureable impact on system performance, and I can run 300 concurrent pipeline jobs, whereas using Popen limits me to running 6 such concurrent jobs. This is on an up-to-date Mac OS X 10.5.8 system. The behavior happens with both the stock Python 2.5 from Apple and with Python 2.6 from Macports. On Ubuntu Linux the Popen solution is perhaps 2% slower than using a shell pipeline. I'll attach three scripts: bytestream -- simple program that sends bytes to its stdout pull0 -- Python script that reads its stdin and logs a couple of stats pull1 -- Python script that uses Popen to start bytestream and pulls from proc.stdout and runs abysmally slowly. Use this with a concurrent top to see the shell pipeline behavior: $ ./bytestream 80000 4 \| time ./pull0 11000 Use this with a concurrent top to see the Popen behavior: $ time ./pull1 ./bytestream 80000 4 11000 In both cases you'll see that system and user time are approximately equal, but that they're roughly 333 times higher for pull1. Is there a clue in the fact that both times go up by about the same huge factor? -Hugh

Popen seems to be *very* costly, CPU wise, on Mac OS X.

I need to read and process bytes from the stdout of another process that outputs bytes in a rate-limited fashion, e.g. 80000 bytes per second. If I use a shell pipeline and a simple Python script to read the bytes and log some details then top and time both agree that the Python script is using roughly 1 second of CPU time for every 2000 seconds of elapsed time.

If I change the script to use Popen to start the first process and to read from proc.stdout, then top and time both agree that the Python script uses roughly 1 second of CPU time for every 6 seconds of elapsed time. That is, using Popen uses 333 times as much CPU as using a shell pipeline.

In practice, this means that using a shell pipeline has no measureable impact on system performance, and I can run 300 concurrent pipeline jobs, whereas using Popen limits me to running 6 such concurrent jobs.

This is on an up-to-date Mac OS X 10.5.8 system. The behavior happens with both the stock Python 2.5 from Apple and with Python 2.6 from Macports.

On Ubuntu Linux the Popen solution is perhaps 2% slower than using a shell pipeline.

I'll attach three scripts:
bytestream -- simple program that sends bytes to its stdout
pull0 -- Python script that reads its stdin and logs a couple of stats
pull1 -- Python script that uses Popen to start bytestream and pulls from proc.stdout and runs abysmally slowly.

Use this with a concurrent top to see the shell pipeline behavior:
$ ./bytestream 80000 4 | time ./pull0 11000

Use this with a concurrent top to see the Popen behavior:
$ time ./pull1 ./bytestream 80000 4 11000

In both cases you'll see that system and user time are approximately equal, but that they're roughly 333 times higher for pull1. Is there a clue in the fact that both times go up by about the same huge factor?

-Hugh

History
Date	User	Action	Args
2010-06-01 22:18:46	hughsw	set	recipients: + hughsw
2010-06-01 22:18:46	hughsw	set	messageid: <1275430726.49.0.216093303643.issue8873@psf.upfronthosting.co.za>
2010-06-01 22:18:44	hughsw	link	issue8873 messages
2010-06-01 22:18:44	hughsw	create