classification
Title: subprocess.Popen reads errpipe_read incorrectly, can result in short read
Type: crash Stage: resolved
Components: Library (Lib) Versions: Python 2.7, Python 2.6
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: gregory.p.smith, pitrou, vitaly
Priority: normal Keywords:

Created on 2012-09-11 09:06 by vitaly, last changed 2020-05-31 12:24 by serhiy.storchaka. This issue is now closed.

Messages (9)
msg170279 - (view) Author: Vitaly (vitaly) Date: 2012-09-11 09:06
subprocess.Popen (at least on 2.6.7) reads the pipe incorrectly: It doesn't loop to read all the data until EOF -- it only loops over EINTR until it gets a single successful os.read() call.  However, since this is a pipe read (not a real file read), the system doesn't guarantee that the blocking read will read everything up to requested read size or EOF, whichever comes first.  So, the single os.read call could return a partial read, and the subsequent un-pickling of the exception would fail/crash.

Sorry, I can't submit a patch as I am merely a Python user, not a Python developer, and it would take me too long to set up and figure out Python build just for this one issue.
msg170317 - (view) Author: Vitaly (vitaly) Date: 2012-09-11 15:56
Also, attempting to read a 1MB chunk of data in a single os.read call forces os.read() to unnecessarily allocate a huge !MB buffer (even if only for a short lifetime).  Using something like 4KB read calls in a loop is more practical (and looping is necessary anyway).
msg170318 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-11 15:57
2.6 doesn't receive bug fixes anymore. Can you at least test with a recent 2.7?
msg170328 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-09-11 17:11
fyi - i suspect Python 3.2 and the backport of that to 2.x http://code.google.com/p/python-subprocess32/ do not have this issue.

but you didn't give enough information in the bug report for me to know which pipe and which read call you're talking about to really be sure.  (show tracebacks, mention specific version source lines, etc..).  If the bug is present in 2.7, we'll consider a fix for it.

regardless, I recommend *everyone* using Python 2.x use the subprocess32 module rather than the built in subprocess module.
msg170333 - (view) Author: Vitaly (vitaly) Date: 2012-09-11 17:52
My bug report refers to the following code block in subprocess.py. The problem is the same in 2.6.7 and 2.7.3:

=== From subprocess.py in Python 2.7.3 Source Distribution:
                # Wait for exec to fail or succeed; possibly raising exception
                # Exception limited to 1M
                data = _eintr_retry_call(os.read, errpipe_read, 1048576)
            finally:
                # be sure the FD is closed no matter what
                os.close(errpipe_read)
===


However, Python 3.2.3 appears to be doing this correctly; it loops, gathers the data from multiple os.read calls, and doesn't make huge (1048576) os.read requests:

=== from subprocess.py in 3.2.3 source distribution
                # Wait for exec to fail or succeed; possibly raising an
                # exception (limited in size)
                data = bytearray()
                while True:
                    part = _eintr_retry_call(os.read, errpipe_read, 50000)
                    data += part
                    if not part or len(data) > 50000:
                        break

===
msg170335 - (view) Author: Vitaly (vitaly) Date: 2012-09-11 17:58
Sorry, there is no traceback.  The issue was identified in code review.

'man 2 read' states:

===
The system guaran-
     tees to read the number of bytes requested if the descriptor references a
     normal file that has that many bytes left before the end-of-file, but in
     no other case.
===

Since a pipe is not a "normal file", the guarantee to "read the number of bytes requested" does not apply.
msg170337 - (view) Author: Vitaly (vitaly) Date: 2012-09-11 18:05
The prior 'man 2 read' quote was from Mac OS X;

On amazon (centos) Linux, 'man 2 read' makes the same claim, albeit in different verbiage:

===
It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to  end-of-file,  or  because  we  are  reading  from a pipe, or from a terminal), or because read() was interrupted by a signal.
===
msg170338 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-09-11 18:14
Yeah, sounds like _eintr_retry_call alone isn't appropriate here in 2.7.
I'll fix it.

In practice I doubt this matters much as this error string is likely to be
less than one page (depends on pathnames involved) but it is still
technically incorrect as written.
msg170341 - (view) Author: Vitaly (vitaly) Date: 2012-09-11 18:55
> In practice I doubt this matters much as this error string is likely to be less than one page (depends on pathnames involved) but it is still
technically incorrect as written.

Agreed.  Thank you.
History
Date User Action Args
2020-05-31 12:24:15serhiy.storchakasetstatus: open -> closed
resolution: out of date
stage: resolved
2012-09-11 18:55:26vitalysetmessages: + msg170341
2012-09-11 18:14:23gregory.p.smithsetmessages: + msg170338
2012-09-11 18:05:10vitalysetmessages: + msg170337
2012-09-11 17:58:49vitalysetmessages: + msg170335
2012-09-11 17:52:48vitalysetmessages: + msg170333
versions: + Python 2.7
2012-09-11 17:11:27gregory.p.smithsetmessages: + msg170328
2012-09-11 15:57:26pitrousetnosy: + gregory.p.smith, pitrou
messages: + msg170318
2012-09-11 15:56:08vitalysetmessages: + msg170317
2012-09-11 09:06:17vitalycreate