Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subprocess is not EINTR-safe #41185

Closed
astrand mannequin opened this issue Nov 17, 2004 · 29 comments
Closed

subprocess is not EINTR-safe #41185

astrand mannequin opened this issue Nov 17, 2004 · 29 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@astrand
Copy link
Mannequin

astrand mannequin commented Nov 17, 2004

BPO 1068268
Nosy @gpshead, @amauryfa, @nirs, @naufraghi, @bitdancer, @rnk, @davidmalcolm
Files
  • nointr.patch: Patch for subprocess.py and test_subprocess.py
  • no-EINTR-subprocess.py-25-maint-r65475.patch: no EINTR patch upgraded to 25-maint r65475
  • trunk-diff-unified.txt: patch against trunk
  • bugtest.py: Reproduces fault
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2010-03-01.00:44:25.834>
    created_at = <Date 2004-11-17.21:07:57.000>
    labels = ['type-bug', 'library']
    title = 'subprocess is not EINTR-safe'
    updated_at = <Date 2010-03-01.00:44:37.753>
    user = 'https://bugs.python.org/astrand'

    bugs.python.org fields:

    activity = <Date 2010-03-01.00:44:37.753>
    actor = 'gregory.p.smith'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2010-03-01.00:44:25.834>
    closer = 'gregory.p.smith'
    components = ['Library (Lib)']
    creation = <Date 2004-11-17.21:07:57.000>
    creator = 'astrand'
    dependencies = []
    files = ['1489', '12438', '14700', '15869']
    hgrepos = []
    issue_num = 1068268
    keywords = ['patch']
    message_count = 29.0
    messages = ['23177', '23178', '23179', '23180', '23181', '23182', '23183', '57055', '64821', '64832', '69397', '70433', '72593', '72596', '72709', '73336', '74916', '74924', '78033', '78103', '78246', '78253', '78254', '78259', '79391', '91502', '97756', '100228', '100230']
    nosy_count = 16.0
    nosy_names = ['gregory.p.smith', 'astrand', 'amaury.forgeotdarc', 'nirs', 'mattjohnston', 'mpitt', 'timjr', 'jyasskin', 'schmir', 'jnoller', 'naufraghi', 'r.david.murray', 'rnk', 'cmiller', 'dmalcolm', 'mathmodave']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1068268'
    versions = ['Python 2.6', 'Python 3.1', 'Python 2.7', 'Python 3.2']

    @astrand
    Copy link
    Mannequin Author

    astrand mannequin commented Nov 17, 2004

    The subprocess module is not safe for use with signals,
    because it doesn't retry the system calls upon EINTR.
    However, as far as I understand it, this is true for
    most other Python modules as well, so it isn't obvious
    that the subprocess needs to be fixed.

    The problem was first noticed by John P Speno.

    @astrand astrand mannequin self-assigned this Nov 17, 2004
    @astrand astrand mannequin added the stdlib Python modules in the Lib dir label Nov 17, 2004
    @astrand astrand mannequin self-assigned this Nov 17, 2004
    @astrand astrand mannequin added the stdlib Python modules in the Lib dir label Nov 17, 2004
    @astrand
    Copy link
    Mannequin Author

    astrand mannequin commented Nov 17, 2004

    Logged In: YES
    user_id=344921

    One way of testing subprocess for signal-safeness is to
    insert these lines just after _cleanup():

    import signal
    signal.signal(signal.SIGALRM, lambda x,y: 1)
    signal.alarm(1)
    import time
    time.sleep(0.99)

    Then run test_subprocess.py.

    @mattjohnston
    Copy link
    Mannequin

    mattjohnston mannequin commented Dec 22, 2004

    Logged In: YES
    user_id=785805

    I've hit this on a Solaris 9 box without explicitly using
    signals. Using the DCOracle module, a seperate Oracle
    process is executed. When this terminates, a SIGCHLD is sent
    to the calling python process, which may be in the middle of
    a select() in the communicate() call, causing EINTR. From
    the output of truss (like strace), a sigchld handler doesn't
    appear to be getting explicitly installed by the Oracle module.

    SunOS 5.9 Generic_112233-01 sun4u sparc SUNW,Sun-Fire-280R

    @mpitt
    Copy link
    Mannequin

    mpitt mannequin commented Feb 26, 2007

    I just got two different Ubuntu bug reports about this problem as well, and I'm unsure how to circumvent this at the application level.

    http://librarian.launchpad.net/6514580/Traceback.txt
    http://librarian.launchpad.net/6527195/Traceback.txt

    (from https://launchpad.net/bugs/87292 and its duplicate)

    @mpitt
    Copy link
    Mannequin

    mpitt mannequin commented Mar 14, 2007

    I updated Peter's original patch to 2.5+svn fixes and added proper tests to test_subprocess.py. It works great now.

    What do you think about this approach? Fixing it only in submodule feels a bit strange, but then again, this is meant to be an easy to use abstraction, and most of the people that were hit by this (according to Google) encountered the problem in subprocess.

    I don't see how to attach something here, so I attached the updated patch to the Ubuntu bug (https://launchpad.net/bugs/87292):

    http://librarian.launchpad.net/6807594/subprocess-eintr-safety.patch

    Thanks,

    Martin

    @mpitt
    Copy link
    Mannequin

    mpitt mannequin commented Mar 15, 2007

    Updated patch: http://librarian.launchpad.net/6820347/subprocess-eintr-safety.patch

    I removed the _read_all() function, since that broke the 1 MB exception limit and does not seem to be necessary in the first place.

    @timjr
    Copy link
    Mannequin

    timjr mannequin commented Jul 27, 2007

    I hit this in Python 2.5.1 on an Intel Mac in a PyQt application. mpitt's patch at http://librarian.launchpad.net/6820347/subprocess-eintr-safety.patch fixed it for me.

    @schmir
    Copy link
    Mannequin

    schmir mannequin commented Nov 2, 2007

    In normal application code that signal.alarm is called for a reason. And
    the caller most probably expects it to interrupt the read calls. So I
    think the proposed patch is wrong.

    What was the signal interrupting the read calls? maybe SIGPIPE?

    @schmir
    Copy link
    Mannequin

    schmir mannequin commented Apr 1, 2008

    Of course the signal handler may raise an exception, so my last argument
    isn't that good.

    @jyasskin
    Copy link
    Mannequin

    jyasskin mannequin commented Apr 2, 2008

    I think the proper behavior on EINTR may depend on which subprocess call
    we're in. For example, the user can easily loop on .wait() herself if
    she wants to ignore EINTR. But it's a lot harder to loop on Popen() if
    the read() in _execute_child is interrupted. So my inclination would be
    to let EINTR be raised in the first case, and use a loop to handle it in
    the second.

    Regarding the patch, a wrapper function called as
    "retry_on_eintr(obj.write, data)" might be a cleaner way to do it.

    @gpshead
    Copy link
    Member

    gpshead commented Jul 7, 2008

    fyi - To fix issue bpo-2113 I added handling of a select.error errno.EINTR
    being raised during the select.select call in r64756.

    @jnoller
    Copy link
    Mannequin

    jnoller mannequin commented Jul 30, 2008

    I think this should be resolved in 2.6/3.0 if possible. Especially if
    distributions like Ubuntu are self-patching the fix into place. For
    reference, see: http://mg.pov.lt/blog/subprocess-in-2.4

    @naufraghi
    Copy link
    Mannequin

    naufraghi mannequin commented Sep 5, 2008

    I'd like to suggest to rise the priority of this bug.
    Till this bus is around, no way using any module using subprocess.Popen
    form a PyQt app (and I suppose PyGtk and wxPython too).

    @amauryfa
    Copy link
    Member

    amauryfa commented Sep 5, 2008

    Two remarks:
    1 - The part of the patch around the call to select.select() is already
    in trunk since r64756, almost in the same form. good.

    2 - the patch seems to replace all calls to os.write, os.read and
    os.waipid. But it is based on a very old version of subprocess, and
    r38169 added a new call to os.waitpid. I don't know if it should be
    replaced as well.

    @gpshead
    Copy link
    Member

    gpshead commented Sep 6, 2008

    its too late in the release process for subprocess internals being EINTR
    safe to make it into 2.6 but it is reasonable for 2.6.1.

    @gpshead gpshead added type-bug An unexpected behavior, bug, or error labels Sep 6, 2008
    @naufraghi
    Copy link
    Mannequin

    naufraghi mannequin commented Sep 17, 2008

    Upgrade subprocess.py patch to 25-maint r65475
    (apply cleanly with http://bugs.python.org/issue2113 fixed)

    @naufraghi
    Copy link
    Mannequin

    naufraghi mannequin commented Oct 17, 2008

    Factorized try-except code, merged r65475 from 25-maint.
    Protetect std[in|out|err] read and write too.

    @naufraghi
    Copy link
    Mannequin

    naufraghi mannequin commented Oct 17, 2008

    Ups, forgot a _no_intr around select.select

    Index: subprocess.py
    ===================================================================

    --- subprocess.py	(revisione 19645)
    +++ subprocess.py	(copia locale)
    @@ -1178,7 +1178,7 @@
     
                 input_offset = 0
                 while read_set or write_set:
    -                rlist, wlist, xlist = select.select(read_set, 
    write_set, [])
    +                rlist, wlist, xlist = _no_intr(select.select)(read_set, 
    write_set, [])
     
                     if self.stdin in wlist:
                         # When select has indicated that the file is 
    writable,

    @naufraghi
    Copy link
    Mannequin

    naufraghi mannequin commented Dec 18, 2008

    Python 2.5.3 is near but the I think the fix in
    http://svn.python.org/view?rev=65475&view=rev
    is not enough, there are a lot of other places where EINTR can cause and
    error.

    @vstinner
    Copy link
    Member

    naufraghi> there are a lot of other places where EINTR
    naufraghi> can cause and error.

    Can you write a list of these places?

    @naufraghi
    Copy link
    Mannequin

    naufraghi mannequin commented Dec 23, 2008

    Please have a look at the proposed patch:

    http://bugs.python.org/file11511/subprocess-eintr-safety-25maint-
    r65475.patch

    the list is more or less the patch itself.

    @vstinner
    Copy link
    Member

    Instead of define a method for each "syscall", you can write a generic
    function that retry at OSError(EINTR):

    def no_intr(self, func, *args, **kw):
      while True:
        try:
          return func(*args, **kw)
        except OSError, e:
          if e.errno == errno.EINTR:
            continue
          else:
            raise
    
    x=_waitpid_no_intr(pid, options) becomes x=no_intr(os.waitpid, pid, 
    options).

    @vstinner
    Copy link
    Member

    Oh, the new patch (subprocess-retry-on-EINTR-std-in-out-err.diff) has
    already a generic function (_no_intr) which is a little bit different
    than mine (looks like a decorator). Can you delete your old patch?

    @naufraghi
    Copy link
    Mannequin

    naufraghi mannequin commented Dec 24, 2008

    no EINTR patch upgraded to 25-maint r65475 that protects:

    *) all direct calls
    *) all returned fd

    I hope :P

    @vstinner
    Copy link
    Member

    vstinner commented Jan 8, 2009

    Since Python 2.5 only accept security fixes, you should update your
    patch to Python trunk.

    @cmiller
    Copy link
    Mannequin

    cmiller mannequin commented Aug 12, 2009

    File
    "/home/cmiller/work/cabzr/desktopcouch/getport-at-call-time/desktopcouch/start_local_couchdb.py",
    line 93, in run_couchdb
    retcode = subprocess.call(local_exec)
    File "/usr/lib/python2.6/subprocess.py", line 444, in call
    return Popen(*popenargs, **kwargs).wait()
    File "/usr/lib/python2.6/subprocess.py", line 1123, in wait
    pid, sts = os.waitpid(self.pid, 0)
    exceptions.OSError: [Errno 4] Interrupted system call

    Now what? The process started, but I have no way of knowing when it
    finishes or the exit value when it does, because I don't have access to
    the Popen object. Nor can I even kill it and try again, because I can't
    get he process id.

    try/except in my code can never help. It must be put in the stdlib.

    Or, if this is too egregious, then the docs should scream that
    subprocess.call can never safely be used, and users should avoid it.

    File
    "/home/cmiller/work/cabzr/desktopcouch/getport-at-call-time/desktopcouch/start_local_couchdb.py",
    line 93, in run_couchdb
    retcode = subprocess.call(local_exec)
    File "/usr/lib/python2.6/subprocess.py", line 444, in call
    return Popen(*popenargs, **kwargs).wait()
    File "/usr/lib/python2.6/subprocess.py", line 595, in __init__
    errread, errwrite)
    File "/usr/lib/python2.6/subprocess.py", line 1084, in _execute_child
    data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
    exceptions.OSError: [Errno 4] Interrupted system call

    This os.read is a byproduct of something the Popen.__init__
    implementation must do, and it is safe for it to continue to get the
    information it needs, without the user's knowledge.

    The process is started, then this is aborted before the Popen.stdout and
    .stderr are set up, leaving the object in a weird state.

    @mathmodave
    Copy link
    Mannequin

    mathmodave mannequin commented Jan 14, 2010

    Another instance of a blocking function within subprocess not being protected against EINTR

    Python 2.6.4, subprocess.py, Popen function, line 1115:
    data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB

    If a signal arrives while blocked in this read, the EINTR/OSError is passed up to whatever called subprocess.Popen. Retrying the Popen doesn't help because the child process may already have started but the caller has no way to know this nor does the caller have any control over the child process.

    ===

    In the example code, the first subprocess.Popen starts without issue but while in the second Popen call, p1's SIGCHLD is received by the parent.
    p2 is never set, but the second copy of /bin/date starts running anyway.

    The "preexec_fn = lambda : time.sleep(2)" in the second Popen is a little contrived but serves to guarantee that the SIGCHLD will break the Popen for the purposes of the demonstration. I have seen this failure mode when using vanilla Popen calls although you have to be lucky/unlucky to see it.

    ====

    This is in python 2.6.4:

    md5sum subprocess.py
    2ac8cefe8301eadce87630b230d6fff2 subprocess.py

    ====

    I expect the fix is equivalent to cmiller's trunk-diff-unified.txt

    @gpshead gpshead assigned gpshead and unassigned astrand Feb 28, 2010
    @gpshead
    Copy link
    Member

    gpshead commented Mar 1, 2010

    fixed in trunk r78523. backporting to 2.6 and 3.1.

    @gpshead
    Copy link
    Member

    gpshead commented Mar 1, 2010

    merged into 2.6 and 3.1 release branches.

    @gpshead gpshead closed this as completed Mar 1, 2010
    @gpshead gpshead closed this as completed Mar 1, 2010
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 9, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants