Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subprocess.Popen leaks file descriptors on os.fork() failure #60531

Closed
MarkGius mannequin opened this issue Oct 25, 2012 · 26 comments
Closed

subprocess.Popen leaks file descriptors on os.fork() failure #60531

MarkGius mannequin opened this issue Oct 25, 2012 · 26 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@MarkGius
Copy link
Mannequin

MarkGius mannequin commented Oct 25, 2012

BPO 16327
Nosy @gpshead, @jcea
Files
  • fd_leak_fix.diff: Naive fix for bug
  • fd_leak_fix_v2.diff: Less buggy naive fix
  • fix_16327_on_3.3.diff: Fix 16327 on 3.3
  • fix_16327_on_2.7.diff: Fix 16327 on 2.7
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gpshead'
    closed_at = <Date 2012-11-11.06:50:51.168>
    created_at = <Date 2012-10-25.22:31:03.245>
    labels = ['library', 'performance']
    title = 'subprocess.Popen leaks file descriptors on os.fork() failure'
    updated_at = <Date 2012-11-11.10:02:09.989>
    user = 'https://bugs.python.org/MarkGius'

    bugs.python.org fields:

    activity = <Date 2012-11-11.10:02:09.989>
    actor = 'python-dev'
    assignee = 'gregory.p.smith'
    closed = True
    closed_date = <Date 2012-11-11.06:50:51.168>
    closer = 'gregory.p.smith'
    components = ['Library (Lib)']
    creation = <Date 2012-10-25.22:31:03.245>
    creator = 'Mark.Gius'
    dependencies = []
    files = ['27719', '27720', '27780', '27781']
    hgrepos = []
    issue_num = 16327
    keywords = ['patch']
    message_count = 26.0
    messages = ['173804', '173807', '173808', '173810', '173811', '173813', '173814', '173815', '173816', '173817', '173818', '173819', '173821', '173825', '173832', '173837', '174085', '174098', '174137', '174138', '174139', '174940', '175324', '175326', '175327', '175339']
    nosy_count = 5.0
    nosy_names = ['gregory.p.smith', 'jcea', 'neologix', 'python-dev', 'Mark.Gius']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue16327'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3', 'Python 3.4']

    @MarkGius
    Copy link
    Mannequin Author

    MarkGius mannequin commented Oct 25, 2012

    When subprocess.Popen is called with subprocess.PIPE and os.fork() fails (due to insufficient memory for example), the pipes created by _get_handles() are not closed.

    Steps to reproduce:

    1. Launch a python shell, import subprocess
    2. Create a situation where os.fork() will fail (I ran a program to eat up nearly all of the memory on a system)
    3. subprocess.Popen('/bin/echo', stdin=subprocess.PIPE)
    4. OSError(12, 'Cannot allocate memory')
    5. ls /proc/$pid/fd , There are now extra pipes.

    I tested on Ubuntu 11.10 python (2.7.2-5ubuntu1). My reading of the 2.7 and 3.3 development trees suggest that this is an issue with both of those branches, but I don't have a 3.3 installation to confirm with.

    I've attached a snippet that fixes it for my version of Python on Ubuntu. No idea what ramifications it will have for other versions/OS/etc. No automated testing included because I'm not entirely sure how to replicate this without eating up a ton of ram or doing something naughty with ulimit.

    @MarkGius MarkGius mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Oct 25, 2012
    @MarkGius
    Copy link
    Mannequin Author

    MarkGius mannequin commented Oct 25, 2012

    Just read the docs for stdin and stdout. This patch will indtroduce a bug where a provided file descriptor can be closed. This definitely shouldn't close a file descriptor that isn't created by _get_handles(). I'll update.

    @MarkGius
    Copy link
    Mannequin Author

    MarkGius mannequin commented Oct 25, 2012

    Patch now only closes pipe fds created by Popen

    @gpshead gpshead self-assigned this Oct 26, 2012
    @jcea
    Copy link
    Member

    jcea commented Oct 26, 2012

    I would catch ALL exceptions, not only "OSError".

    An easy way to test this would be to test a subclass of Popen with "_execute_child()" method overrided for always raising an exception.

    On Unix the test could just open six fds, close them taking note of the values, call this code forcing an exception, catch it, open six new fds and verify that the numbers are the same. So we verify that neither of the six fds created "inside" are leaked.

    What should we do for Windows?

    Maybe the easier and more portable approach for exception cleanup would be to do "_execute_child()" AFTER the "fdopen()" dance, so we can just do "close()" if any exception is raised.

    Also, the cleanup MUST be done ONLY if the fds were created inside the function (PIPE), not if the fd came from the caller.

    @gpshead
    Copy link
    Member

    gpshead commented Oct 26, 2012

    python 3 already catches all exceptions and handles closing of p2cwrite, c2pread and errread here. i don't know which branch this patch is against. Regardless, it makes sense that the other fd's, if created by us, also need to be cleaned up. The code also needs to ignore exceptions from the close() call.

    http://hg.python.org/cpython/file/cbdd6852a274/Lib/subprocess.py#l811

    @MarkGius
    Copy link
    Mannequin Author

    MarkGius mannequin commented Oct 26, 2012

    attachment against 2.7

    if I understand this code correctly, the fix shouldn't need to be fixed separately on windows and Linux, since the thing handled by init is just a file descriptor.

    Good idea on the testing. I'll give that a shot tomorrow. I think 3.3 will need some extra cleanup too.

    @jcea
    Copy link
    Member

    jcea commented Oct 26, 2012

    The cleanup code in python 3 validates my idea of simplifying cleanup moving "_execute_child()" after the platform specific code.

    I wonder what "raise" will actually raise if this cleanup code catches & ignores "close()" exception :-).

    @jcea
    Copy link
    Member

    jcea commented Oct 26, 2012

    Mark, could you consider to fill&send a contributor form agreement? http://www.python.org/psf/contrib/

    @jcea
    Copy link
    Member

    jcea commented Oct 26, 2012

    In fact, nested exception management in python 2 and python 3 actually diverges. BEWARE: (Python 3 does the right thing, once again :-)

    """
    Python 2.7.3 (default, Apr 12 2012, 13:11:53) 
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> try :
    ...   1/0
    ... except :
    ...   try :
    ...     raise RuntimeError("TEST")
    ...   except :
    ...     pass
    ...   raise
    ... 
    Traceback (most recent call last):
      File "<stdin>", line 5, in <module>
    RuntimeError: TEST
    """
    
    """
    Python 3.3.0 (default, Oct  2 2012, 02:07:16) 
    [GCC 4.4.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> try :
    ...   1/0
    ... except :
    ...   try :
    ...     raise RuntimeError("TEST")
    ...   except :
    ...     pass
    ...   raise
    ... 
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    ZeroDivisionError: division by zero
    """

    @jcea
    Copy link
    Member

    jcea commented Oct 26, 2012

    http://stackoverflow.com/questions/8997431/is-there-any-way-to-access-nested-or-re-raised-exceptions-in-python

    """
    This is known as Exception Chaining and is suported in Python 3.

    PEP-3134: http://www.python.org/dev/peps/pep-3134/

    In Python 2, the old exception is lost when you raise a new one, unless you save it in the except block.
    """

    @jcea
    Copy link
    Member

    jcea commented Oct 26, 2012

    Python2 management should be something like:

    """
    Python 2.7.3 (default, Apr 12 2012, 13:11:53) 
    [GCC 4.4.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> try :
    ...   1/0
    ... except BaseException as e :
    ...   try :
    ...     raise
    ...   finally :
    ...     try :
    ...       raise RuntimeError("TEST")
    ...     except :
    ...       pass
    ... 
    Traceback (most recent call last):
      File "<stdin>", line 2, in <module>
    ZeroDivisionError: integer division or modulo by zero
    """

    Sorry, UGLY.

    Idea from http://www.doughellmann.com/articles/how-tos/python-exception-handling/index.html

    @jcea
    Copy link
    Member

    jcea commented Oct 26, 2012

    Replace "except BaseException as e :" with just "except:". It was a remnant of my tests.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Oct 26, 2012

    No automated testing included because I'm not entirely sure how to replicate this without eating up a ton of ram or doing something naughty with ulimit.

    Simply use RLIMIT_NPROC, from a subprocess:
    """
    $ cat /tmp/test.py
    import subprocess

    ps = [ ]
    
    for i in range(1024):
        p = subprocess.Popen(['sleep', '10'])
        ps.append(p)
    $ python /tmp/test.py
    Traceback (most recent call last):
      File "/tmp/test.py", line 7, in ?
        p = subprocess.Popen(['sleep', '10'])
      File "/usr/lib64/python2.4/subprocess.py", line 550, in __init__
        errread, errwrite)
      File "/usr/lib64/python2.4/subprocess.py", line 919, in _execute_child
        self.pid = os.fork()
    OSError: [Errno 11] Resource temporarily unavailable
    $ ulimit -u 1024
    """

    Not POSIX, but supported by Linux and BSD, which should be enough.

    The problem with monkey-ptching is that you don't test the real
    codepath, and it may break in a future version (especially since here
    it would be using a preivate API).

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Oct 26, 2012

    Also, I didn't check, but if the problems also occurs on execve()
    failure, then it's much simpler: simply call Popen() with an
    invalid/nonexisting executable.

    @jcea
    Copy link
    Member

    jcea commented Oct 26, 2012

    The problem with using RLIMIT is that the testsuite could be executing several tests in parallel using independent threads, for instance. You don't want to influence unrelated tests.

    Overiding private methods is ugly, but if the code evolves the test would break, and the programmer just have to update it. I think that "sometimes" we have to be "practical". There are plenty of examples of this in the testsuite, using implementation details, etc.

    @neologix
    Copy link
    Mannequin

    neologix mannequin commented Oct 26, 2012

    The problem with using RLIMIT is that the testsuite could be executing several tests in parallel using independent threads, for instance. You don't want to influence unrelated tests.

    That's why I suggested to run it in a subprocess: this is used
    frequently, e.g. for signal or deadlock tests.

    @MarkGius
    Copy link
    Mannequin Author

    MarkGius mannequin commented Oct 28, 2012

    Doesn't exhibit when execve fails, because by the time execve has been reached we've closed the pipes that we're supposed to close on the parent process, and the pipes that are meant to remain open on the parent process get caught by existing cleanup code. It's unfortunately got to fail somewhere in the vicinity of the fork to fake it using the actual _execute_child.

    @gpshead
    Copy link
    Member

    gpshead commented Oct 29, 2012

    Stubbing _execute_child out for a test is easiest. No need to craft ways to
    cause an actual fork failure.

    @MarkGius
    Copy link
    Mannequin Author

    MarkGius mannequin commented Oct 29, 2012

    Patch fixes and tests fd leak on Python 3.3. Test fails without fix, passes with fix.

    I found an existing test looking for fd leaks for another bug. Borrowed the verification bits from it.

    There were some other test failures when I ran the subprocess suite on my laptop, but it more like I had some environmental issue rather than having genuinely broken anything. If somebody else (or the test bots?) could run the tests I would appreciate it.

    @MarkGius
    Copy link
    Mannequin Author

    MarkGius mannequin commented Oct 29, 2012

    Here's more or less the same fix and test on 2.7. I jumped through the hoop to preserve the original exception and traceback even if os.close() raises an exception.

    This follows the 3.3 branch's cleanup behavior of silently suppressing errors in the cleanup code.

    @MarkGius
    Copy link
    Mannequin Author

    MarkGius mannequin commented Oct 29, 2012

    I've also submitted the contributor form requested.

    @MarkGius
    Copy link
    Mannequin Author

    MarkGius mannequin commented Nov 5, 2012

    My contributor form has been accepted. Anything else I should be doing to work towards getting a fix applied?

    @gpshead
    Copy link
    Member

    gpshead commented Nov 11, 2012

    Thanks! I'm looking into applying these tonight (including 3.2) with a couple minor edits.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 11, 2012

    New changeset 63ff4c9a2ed2 by Gregory P. Smith in branch '3.2':
    Fixes issue bpo-16327: The subprocess module no longer leaks file descriptors
    http://hg.python.org/cpython/rev/63ff4c9a2ed2

    New changeset a6a6c349af7e by Gregory P. Smith in branch '3.3':
    Fixes issue bpo-16327: The subprocess module no longer leaks file descriptors
    http://hg.python.org/cpython/rev/a6a6c349af7e

    New changeset a9e238168588 by Gregory P. Smith in branch 'default':
    Fixes issue bpo-16327: The subprocess module no longer leaks file descriptors
    http://hg.python.org/cpython/rev/a9e238168588

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 11, 2012

    New changeset e67620048d2f by Gregory P. Smith in branch '2.7':
    Fixes issue bpo-16327: The subprocess module no longer leaks file descriptors
    http://hg.python.org/cpython/rev/e67620048d2f

    @gpshead gpshead closed this as completed Nov 11, 2012
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 11, 2012

    New changeset 2bdd984a55ac by Gregory P. Smith in branch '2.7':
    Fix issue bpo-16140 bug that the fix to issue bpo-16327 added - don't double
    http://hg.python.org/cpython/rev/2bdd984a55ac

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants