
subprocess.Popen._cleanup() "The handle is invalid" error when some old process is gone #81561

Closed

efiop mannequin opened this issue Jun 23, 2019 · 36 comments

Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes OS-windows type-crash A hard crash of the interpreter, possibly with a core dump

Comments


efiop mannequin commented Jun 23, 2019

BPO 37380
Nosy @gpshead, @pfmoore, @vstinner, @tjguk, @zware, @eryksun, @zooba, @efiop
PRs
  • bpo-37380: Popen: ignore dead procs on _cleanup #14327
  • bpo-37380: subprocess: skip _cleanup and __del__ on win #14360
  • [3.7] bpo-37380: subprocess: don't use _active on win (GH-14360) #15706
  • [3.8] bpo-37380: subprocess: don't use _active on win (GH-14360) #15707
Files
  • invalid_handle.py: A small reproduction that triggers the 'invalid handle' error

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-09-06.09:20:35.256>
    created_at = <Date 2019-06-23.18:42:24.606>
    labels = ['3.7', '3.8', 'OS-windows', 'type-crash', '3.9']
    title = 'subprocess.Popen._cleanup() "The handle is invalid" error when some old process is gone'
    updated_at = <Date 2020-05-29.19:00:45.238>
    user = 'https://github.com/efiop'

    bugs.python.org fields:

    activity = <Date 2020-05-29.19:00:45.238>
    actor = 'steve.dower'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-09-06.09:20:35.256>
    closer = 'vstinner'
    components = ['Windows']
    creation = <Date 2019-06-23.18:42:24.606>
    creator = 'efiop'
    dependencies = []
    files = ['48493']
    hgrepos = []
    issue_num = 37380
    keywords = ['patch']
    message_count = 36.0
    messages = ['346332', '346333', '346338', '346343', '346461', '346470', '346568', '346578', '346581', '346606', '346635', '346638', '346695', '346827', '346828', '346889', '346973', '346985', '346994', '347006', '347016', '347019', '347024', '348231', '351205', '351209', '351235', '351236', '351237', '351238', '351240', '351261', '370325', '370326', '370328', '370334']
    nosy_count = 11.0
    nosy_names = ['gregory.p.smith', 'paul.moore', 'vstinner', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'sylvain.corlay', 'efiop', 'emanuel', 'Chip Lynch']
    pr_nums = ['14327', '14360', '15706', '15707']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue37380'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']


    efiop mannequin commented Jun 23, 2019

    subprocess keeps the list of active processes in its _active list. Whenever you try to Popen a new process, it immediately (right in the first line of Popen.__init__ [1]) tries to clean up the old ones. If one of the processes in that list is gone, then

        _WaitForSingleObject(self._handle, 0)

    [2] will raise OSError: [WinError 6] The handle is invalid and will prevent you from creating any new processes with Popen after that, because the line where that handle is removed from the _active list is never reached. On *nix [3], _internal_poll wraps waitpid in a try/except that catches OSError without re-raising, so if the process is gone, it is simply reported as dead and successfully removed from the _active list. It seems like we should do the same thing on Windows: if we catch the "The handle is invalid" error, just report that process as dead and remove it from the _active list.

    [1] https://github.com/python/cpython/blob/v3.8.0b1/Lib/subprocess.py#L715

    [2] https://github.com/python/cpython/blob/v3.8.0b1/Lib/subprocess.py#L1310

    [3] https://github.com/python/cpython/blob/v3.8.0b1/Lib/subprocess.py#L1710
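    The POSIX behavior referenced in [3] can be illustrated with a small standalone helper. This is a sketch of the pattern only, not the actual Lib/subprocess.py code; the function name poll_child is made up for illustration:

```python
import os
import sys

def poll_child(pid, deadstate=sys.maxsize):
    """Poll a child pid the way the POSIX Popen._internal_poll does:
    a vanished process is reported as dead instead of raising."""
    try:
        wpid, status = os.waitpid(pid, os.WNOHANG)
    except OSError:
        # The child is gone (or was already reaped elsewhere): report
        # it as dead so the caller can drop it from its bookkeeping.
        return deadstate
    if wpid == pid:
        return status  # raw wait status; Popen would decode it further
    return None  # still running
```

    On Windows there is no equivalent try/except around _WaitForSingleObject, which is exactly why one stale handle in _active poisons every later Popen call.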

    efiop mannequin added the 3.7 (EOL) end of life, OS-windows, and type-crash labels Jun 23, 2019

    eryksun commented Jun 23, 2019

    See bpo-36067 for a related discussion. The _active list and _cleanup function are not required in Windows. When a process exits, it gets rundown to free its handle table and virtual memory. Only the kernel object remains, which is kept alive by pointer and handle references to it that can query information such as the exit status code. As soon as the last reference is closed, the Process object is automatically reaped. It doesn't have to be waited on.


    efiop mannequin commented Jun 23, 2019

    Hi Eryk!

    Thanks for a swift reply! So the current plan for fixing it is:

    1. disable _cleanup, _active, and remove _internal_poll() for windows
    2. ignore EBADF(OSError: [WinError 6] The handle is invalid) in terminate() and probably some other methods

    Please let me know if I'm missing something. Also, is anyone working on this, or should I prepare a PR?

    Thanks,
    Ruslan


    eryksun commented Jun 23, 2019

    1. disable _cleanup, _active, and remove _internal_poll() for windows

    _active only gets appended to in __del__. We can skip the entire body of __del__. Also, calling _cleanup can be skipped in __init__.

    _internal_poll is required for poll().

    2. ignore EBADF (OSError: [WinError 6] The handle is invalid) in terminate() and probably some other methods

    ERROR_INVALID_HANDLE should not be ignored.

    terminate() is doing the right thing by not masking ERROR_INVALID_HANDLE. The only concern is the case where our handle has been closed elsewhere (not by subprocess) and the handle value was subsequently reused as a handle for another process to which we have terminate access. This is a bad handle at a logical, application level, but it's valid and accessible for TerminateProcess. It was suggested to initially get the pid and process startup time in order to validate the call, but Alexey Izbyshev is right that it's not worth complicating the code in subprocess to address an uncommon bug in particular application or library code.

    vstinner commented:

    1. ignore EBADF(OSError: [WinError 6] The handle is invalid) in terminate() and probably some other methods

    That sounds like a bad idea. We should avoid using a handle if it can become invalid out of our control. A handle might be recycled, no?


    efiop mannequin commented Jun 25, 2019

    Thanks for pointing that out! I've submitted a tiny patch to skip _cleanup() and __del__. Please find it in the GitHub PR attached. Looking forward to your feedback!

    vstinner commented:

    See bpo-36067 for a related discussion. The _active list and _cleanup function are not required in Windows. When a process exits, it gets rundown to free its handle table and virtual memory. Only the kernel object remains, which is kept alive by pointer and handle references to it that can query information such as the exit status code. As soon as the last reference is closed, the Process object is automatically reaped. It doesn't have to be waited on.

    Linux (for example) has the same design: the kernel doesn't keep a "full process" alive, but a lightweight structure just for its parent process which gets the exit status. That's the concept of "zombie process".

    One issue on Linux is that the zombie process keeps its pid in use until the parent reads the child's exit status, and Linux pids are limited to 32768 by default.

    Now I'm not sure that I understand what it means on Windows. The subprocess module uses a magic Handle object which calls CloseHandle(handle) in its __del__() method. I dislike relying on destructors. If an object is kept alive by a reference cycle, it's never released: CloseHandle() isn't called.

    So code spawning process should wait until subprocesses complete to ensure that CloseHandle() is called, no?

    Except that Popen._internal_poll() doesn't clear the handle even after the process completes.

    If Popen.__del__() doesn't add active processes to the _active list, it should at least explicitly call CloseHandle(), no?


    zooba commented Jun 26, 2019

    The handle can deliberately live beyond the process, but it does not have to. If it is closed early then the kernel object will be freed when no handles remain, which will be at process exit.

    So it's a classic __exit__/del case, where both are needed if you want deterministic resource disposal. But it is in no way tied to the life of the child process, so waiting first is optional.


    eryksun commented Jun 26, 2019

    One issue on Linux is that the zombie process keeps the pid used until
    the parent reads the child exit status, and Linux pids are limited to
    32768 by default.

    Windows allocates Process and Thread IDs out of a kernel handle table, which can grow to about 2**24 entries (more than 16 million). So the practical resource limit for inactive Process and Thread objects is available memory, not running out of PID/TID values.

    Linux (for example) has the same design: the kernel doesn't keep a
    "full process" alive, but a lightweight structure just for its parent
    process which gets the exit status. That's the concept of "zombie
    process".

    In Unix, the zombie remains visible in the task list (marked as <defunct> in Linux), but in Windows an exited process is removed from the Process Manager's active list, so it's no longer visible to users. Also, a Process object is reaped as soon as the last reference to it is closed, since clearly no one needs it anymore.

    The subprocess module uses a magic Handle object which calls
    CloseHandle(handle) in its __del__() method. I dislike relying on
    destructors. If an object is kept alive by a reference cycle, it's
    never released: CloseHandle() isn't called.

    We could call self._handle.Close() in _wait(), right after calling GetExitCodeProcess(self._handle). With this change, __exit__ will ensure that _handle gets closed in a deterministic context. Code that needs the handle indefinitely can call _handle.Detach() before exiting the with-statement context, but that should rarely be necessary.

    I don't understand emitting a resource warning in Popen.__del__ if a process hasn't been waited on until completion beforehand (i.e. self.returncode is None). If a script wants to be strict about this, it can use a with statement, which is documented to wait on the process.

    I do understand emitting a resource warning in Popen.__del__ if self._internal_poll() is None. In this case, in Unix only now, the process gets added to the _active list to try to avoid leaking a zombie. The list gets polled in subprocess._cleanup, which is called in Popen.__init__. Shouldn't _cleanup also be set as an atexit function?

    There should be a way to indicate a Popen instance is intended to continue running detached from our process, so scripts don't have to ignore an irrelevant resource warning.
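    The with-statement behavior mentioned above is easy to demonstrate: __exit__ waits on the child and closes its pipes deterministically, so no finalizer (and no _active bookkeeping) is needed for cleanup:

```python
import subprocess
import sys

# Popen.__exit__ waits on the child and closes its pipes, giving the
# deterministic cleanup discussed above without relying on __del__.
with subprocess.Popen([sys.executable, "-c", "print('done')"],
                      stdout=subprocess.PIPE) as proc:
    output = proc.stdout.read()

print(proc.returncode)  # the child has already been reaped here
```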

    vstinner commented:

    In Unix, the zombie remains visible in the task list (marked as <defunct> in Linux), but in Windows an exited process is removed from the Process Manager's active list, so it's no longer visible to users. Also, a Process object is reaped as soon as the last reference to it is closed, since clearly no one needs it anymore.
    (...)
    We could call self._handle.Close() in _wait(), right after calling GetExitCodeProcess(self._handle). With this change, __exit__ will ensure that _handle gets closed in a deterministic context.

    I created bpo-37410 and PR 14391 to close the handle when the process completes. IMHO this change is easy and safe, it can be backported to Python 3.7 and 3.8.

    Code that needs the handle indefinitely can call _handle.Detach() before exiting the with-statement context, but that should rarely be necessary.
    (...)
    There should be a way to indicate a Popen instance is intended to continue running detached from our process, so scripts don't have to ignore an irrelevant resource warning.

    My point is that if you want to "detach" a process, it must be *explicit*. In my experience, processes are "leaked" implicitly: by mistake. In multiprocessing, it can be subtle, you don't manipulate subprocess.Popen directly.

    Users must not access private attributes (Popen._handle). After I added the ResourceWarning, I created bpo-27068 "Add a detach() method to subprocess.Popen". But I only created it to reply to a request from Martin Panter; I didn't use the feature myself. At some point, I decided to just close the issue. Maybe it's time to reopen it.

    I don't understand emitting a resource warning in Popen.__del__ if a process hasn't been waited on until completion beforehand (i.e. self.returncode is None). If a script wants to be strict about this, it can use a with statement, which is documented to wait on the process.

    The purpose is to help developers find bugs in their code. Many are not aware that programs continue to run in the background and that leaking zombie processes is an issue on Unix.

    The process is deliberately not polled before emitting the warning, so that the warning fires in more cases; again, this is to ease debugging.

    The warning is ignored by default.


    eryksun commented Jun 26, 2019

    The process is deliberately not polled before emitting the warning,
    so that the warning fires in more cases; again, this is to ease
    debugging.

    It emits an incorrect warning if the process has already exited: "subprocess %s is still running". This can be rectified. Here's my current understanding:

        def __del__(self, _maxsize=sys.maxsize, _warn=warnings.warn):
            if not self._child_created or self.returncode is not None:
                return
            # In Unix, not reading the subprocess exit status creates a zombie
            # process, which is only destroyed at the parent Python process exit.
            # In Windows, no one else should have a reference to our _handle, so
            # it should get finalized and thus closed, but we use the same warning
            # in order to consistently educate developers.
            if self._internal_poll(_deadstate=_maxsize) is not None:
                _warn("subprocess %s was implicitly finalized" % self.pid,
                      ResourceWarning, source=self)
            else:
                _warn("subprocess %s is still running" % self.pid,
                      ResourceWarning, source=self)
                # Keep this instance alive until we can wait on it, if needed.
                if _active is not None:
                    _active.append(self)

    vstinner commented:

    Sorry, I'm lost :-( More and more different issues are being discussed here. Would it be possible, please, to keep this issue focused on the _active list and _cleanup() on Windows? From what I read, removing/disabling _active and _cleanup() makes sense on Windows.

    I already created bpo-37410 to call CloseHandle() earlier.

    Eryk: Please open a separate issue if you want to change the ResourceWarning message.

    vstinner commented:

    Here is a concrete example of the ResourceWarning when debugging/stressing multiprocessing code; I interrupt (CTRL+C) test_resource_tracker while it's running:

    vstinner@apu$ ./python -m test test_multiprocessing_spawn --fail-env-changed -v -m test_resource_tracker
    == CPython 3.9.0a0 (heads/master:689830ee62, Jun 26 2019, 23:07:31) [GCC 9.1.1 20190503 (Red Hat 9.1.1-1)]
    == Linux-5.1.11-300.fc30.x86_64-x86_64-with-glibc2.29 little-endian
    == cwd: /home/vstinner/prog/python/master/build/test_python_18573
    == CPU count: 8
    == encodings: locale=UTF-8, FS=utf-8
    Run tests sequentially
    0:00:00 load avg: 0.32 [1/1] test_multiprocessing_spawn
    test_resource_tracker (test.test_multiprocessing_spawn.TestResourceTracker) ...

    ^C

    /home/vstinner/prog/python/master/Lib/subprocess.py:917: ResourceWarning: subprocess 18576 is still running
    _warn("subprocess %s is still running" % self.pid,
    ResourceWarning: Enable tracemalloc to get the object allocation traceback
    /home/vstinner/prog/python/master/Lib/test/libregrtest/runtest.py:283: ResourceWarning: unclosed file <_io.BufferedReader name=6>
    return INTERRUPTED
    ResourceWarning: Enable tracemalloc to get the object allocation traceback

    == Tests result: INTERRUPTED ==
    Test suite interrupted by signal SIGINT.

    1 test omitted:
    test_multiprocessing_spawn

    Total duration: 304 ms
    Tests result: INTERRUPTED

    vstinner commented:

    New changeset 042821a by Victor Stinner (Ruslan Kuprieiev) in branch 'master':
    bpo-37380: subprocess: don't use _active on win (GH-14360)
    042821a

    vstinner commented:

    I merged PR 14360 into master. Are Python 2.7, 3.7 and 3.8 impacted by the bug as well? Should we backport the change, or would it be too dangerous?


    efiop mannequin commented Jun 29, 2019

    The _active code is pretty old and has not changed since 2.7, so it makes sense to backport the fix for now (it should also be very trivial to do). To me, it doesn't seem dangerous to backport, but I would love to hear what Eryk and others think :)


    vstinner commented Jul 1, 2019

    To me, it doesn't seem dangerous to backport that, but I would love to hear what Eryk and others think :)

    In my experience, any change is dangerous :-)

    The initial bug report doesn't contain a reproducer. I'm talking about:

    If one of the processes in that list is gone, then (...) "The handle is invalid" (...) will prevent you from creating any new processes with Popen after that

    It's the first time I've seen such a bug report. It seems like subprocess works for most people :-) Can you elaborate on how to reproduce this issue, to justify a backport?


    eryksun commented Jul 1, 2019

    > If one of the processes in that list is gone, then (...) "The handle
    > is invalid" (...) will prevent you from creating any new processes
    > with Popen after that

    It's the first time I've seen such a bug report.

    IIRC it's the second time I've seen that issue reported. It should be rare. It should only occur if some other code in our process or another process closes our _handle value. It's not our bug. Here's an example in 3.7:

        >>> p = subprocess.Popen('python -c "input()"', stdin=subprocess.PIPE)
        >>> del p
        >>> subprocess._active
        [<subprocess.Popen object at 0x0000015E5C94E160>]
        >>> subprocess._active[0]._handle.Close()

    The process is active, but something closed our handle. The next time Popen.__init__ calls _cleanup, _internal_poll raises OSError due to the invalid handle:

        >>> subprocess.call('python -V')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "C:\Program Files\Python37\lib\subprocess.py", line 323, in call
            with Popen(*popenargs, **kwargs) as p:
          File "C:\Program Files\Python37\lib\subprocess.py", line 664, in __init__
            _cleanup()
          File "C:\Program Files\Python37\lib\subprocess.py", line 228, in _cleanup
            res = inst._internal_poll(_deadstate=sys.maxsize)
          File "C:\Program Files\Python37\lib\subprocess.py", line 1216, in _internal_poll
            if _WaitForSingleObject(self._handle, 0) == _WAIT_OBJECT_0:
        OSError: [WinError 6] The handle is invalid

    I wouldn't want _internal_poll to silence this error, but maybe it could be translated into a warning. That said, since there's no need to wait on a process in Windows, there's no need for __del__ to insert a running process in a list of active process, in which case there's nothing for _cleanup to do. Given we're in __del__, our private _handle is about to be closed, at least it will be in CPython. Other implementations should wait() or use a with statement, in order to explicitly close the process handle. The latter isn't implemented currently, but you opened bpo-37410 to address this.


    vstinner commented Jul 1, 2019

    subprocess._active[0]._handle.Close()

    Why would you do that? You should not access the private _active list, nor the private _handle attribute. I understand that it's a way to trigger such a bug, but is it possible to trigger this bug without accessing any private attribute?

    I wouldn't want _internal_poll to silence this error, but maybe it could be translated into a warning

    I disagree with that. It's very bad if the handle suddenly becomes invalid for no obvious reason. It's better to get a hard error (an exception) in such a case.


    eryksun commented Jul 1, 2019

    > subprocess._active[0]._handle.Close()

    Why would you do that? You should not access the private _active list,
    nor the private _handle attribute. I understand that it's a way to
    trigger such a bug, but is it possible to trigger this bug without
    accessing any private attribute?

    The example is just emulating a problem from someone else's code that's closing our handle. Typically this situation occurs because the code is holding onto a handle value for a kernel object (File, Section, Job, Process, Thread, Event, etc) that got closed. The handle value eventually gets reused, such as for our _handle.

    > I wouldn't want _internal_poll to silence this error, but maybe it
    > could be translated into a warning

    I disagree with that. It's very bad if the handle suddenly becomes
    invalid for no obvious reason. It's better to get a hard error
    (an exception) in such a case.

    I agree, but the exception breaks Popen.__init__. It would have to be ignored or translated to a warning somewhere if we continue to poll a list of active processes every time __init__() is called. But since the latter is unnecessary on Windows, I suggested just skipping it.


    vstinner commented Jul 1, 2019

    The example is just emulating a problem from someone else's code that's closing our handle. Typically this situation occurs because the code is holding onto a handle value for a kernel object (File, Section, Job, Process, Thread, Event, etc) that got closed.

    Without accessing private attributes, I don't see how someone could discover the private handle. So for me, it's more of a serious bug in the application, no? Blindly closing random handles doesn't sound like a good idea to me.

    The handle value eventually gets reused, such as for our _handle.

    That's a side effect of blindly closing random handles.


    eryksun commented Jul 1, 2019

    Without accessing private attributes, I don't see how someone could
    discover the private handle. So for me, it's more of a serious bug in
    the application, no? Blindly closing random handles doesn't sound
    like a good idea to me.

    Say a library calls CreateEventW and gets handle 32. It passes this handle to some other library, which uses the event and closes the handle when it no longer needs it. But due to a miscommunication in the documentation, the first library thinks the handle remains open. Now handle 32 is free for reuse, but the library doesn't know this. subprocess.Popen subsequently calls CreateProcessW and gets handle 32. Later on, the library closes handle 32, making it invalid, at least until it gets assigned to some other kernel object.


    vstinner commented Jul 1, 2019

    Say a library calls CreateEventW and gets handle 32. It passes this handle to some other library, which uses the event and closes the handle when it no longer needs it. But due to a miscommunication in the documentation, the first library thinks the handle remains open. Now handle 32 is free for reuse, but the library doesn't know this. subprocess.Popen subsequently calls CreateProcessW and gets handle 32. Later on, the library closes handle 32, making it invalid, at least until it gets assigned to some other kernel object.

    So yeah, it's a severe bug in an application. An application should not close the wrong handle by mistake :-)

    Thanks for the explanation. So it would be "nice" to backport the change to reduce the impact of such an application bug, but it's not really a bug in Python itself. Python cannot fully protect developers from this class of bugs.

    On Unix, there is a similar bug with applications trying to close a file descriptor which is already closed. It can lead to very severe bugs (crashes):
    https://bugs.python.org/issue18748
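    The Unix file-descriptor analogue is easy to reproduce: descriptor numbers are recycled, so a stale close() can destroy an unrelated file, just as a stale CloseHandle can destroy an unrelated kernel object on Windows. A self-contained demonstration, not tied to subprocess:

```python
import os

r, w = os.pipe()
os.close(r)
stale = r                    # buggy code keeps the old fd number around

fd = os.open(os.devnull, os.O_RDONLY)
assert fd == stale           # POSIX reuses the lowest free descriptor

os.close(stale)              # the "old" close actually closes /dev/null!
try:
    os.read(fd, 1)
except OSError:
    print("fd %d is invalid: the stale close hit the new file" % fd)

os.close(w)
```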


    emanuel mannequin commented Jul 21, 2019

    I believe this to be a bug in the standard library instead of solely being the result of an application error.

    When a Popen instance is finalized by the garbage collector, the internal handle is also finalized and closed despite the instance being put on the active list. This results in _cleanup throwing because the handle can no longer be used.

    I've attached a small reproduction.


    ChipLynch mannequin commented Sep 5, 2019

    I see there is a merged patch for this (#14360); is it possible to get it tagged for backport to 3.7? I haven't seen the 3.7.5 tag drop yet, so I wanted to recommend it be included; this bug is giving us fits. On our Windows 10 dev boxes and Windows Server 2016 servers, emanuel's repro fails on 3.5.4, 3.6.8, 3.7.4, and 3.8.0b4, but apparently not on 2.7 (I think we tested 2.7.12). Per vstinner's request, I agree and would like to see it get a 3.7 and 3.8 backport.

    Many thanks!


    zooba commented Sep 5, 2019

    I triggered automatic backports to see how they go. Doesn't necessarily mean we'll merge them without looking more closely, but may as well check out what our automated tools suggest.

    zooba added the 3.8 only security fixes label Sep 5, 2019
    zooba added the 3.9 only security fixes label Sep 5, 2019

    vstinner commented Sep 6, 2019

    When a Popen instance is finalized by the garbage collector, the internal handle is also finalized and closed despite the instance being put on the active list. This results in _cleanup throwing because the handle can no longer be used.

    Right, that's a more practical case where this bug still occurs on Python 3.8 and older. I would suggest fixing the applications, which should get a ResourceWarning in this case.

    Chip Lynch wrote me an email saying that his team is impacted by the bug on Windows with Python 3.7.

    I tried to write a simpler patch ignoring ERROR_INVALID_HANDLE, but then I read this issue's discussion again: Eryk Sun and I agree that ignoring ERROR_INVALID_HANDLE is a bad idea; it could make things worse by silencing a real bug.

    So I now agree to backport the fix from master to 3.7 and 3.8 branches.

    I prefer to leave 2.7 unchanged even if it's affected. I don't want to take the risk of introducing another regression in 2.7. Moreover, subprocess.Popen has a __del__() method. Python 2.7 handles reference cycles differently than Python 3: it doesn't implement finalizers (PEP-442). While Python 2.7 might be affected, it should be affected differently.


    vstinner commented Sep 6, 2019

    New changeset 1e2707d by Victor Stinner (Miss Islington (bot)) in branch '3.7':
    bpo-37380: subprocess: don't use _active on win (GH-14360) (GH-15706)
    1e2707d


    vstinner commented Sep 6, 2019

    New changeset 4d1abed by Victor Stinner (Miss Islington (bot)) in branch '3.8':
    bpo-37380: subprocess: don't use _active on win (GH-14360) (GH-15707)
    4d1abed


    vstinner commented Sep 6, 2019

    Thanks Ruslan Kuprieiev for the bug report and the fix. Thanks Eryk Sun (as usual!) for the great analysis of this issue.

    Ok, the bug should now be fixed in 3.7, 3.8 and master branches.

    I chose to not fix Python 2.7: see my previous comment.

    See bpo-37410 for the follow-up: "[Windows] subprocess: close the handle when the process completes".

    vstinner closed this as completed Sep 6, 2019

    vstinner commented Sep 6, 2019

    Steve Dower: "I triggered automatic backports to see how they go. Doesn't necessarily mean we'll merge them without looking more closely, but may as well check out what our automated tools suggest."

    Thanks, I merged the backports.


    eryksun commented Sep 6, 2019

    When a Popen instance is finalized by the garbage collector, the
    internal handle is also finalized and closed despite the instance
    being put on the active list. This results in _cleanup throwing
    because the handle can no longer be used.

    Thanks. I hadn't considered that. It can lead to unusual behavior like this when __del__ keeps an object alive. The object can't assume that referenced objects haven't already been finalized. All of the subsequently required instance methods should take this into account. For example, Popen._internal_poll could set the return code to -1 if self._handle.closed is true. That's at a higher level, and much more reasonable, than trying to handle ERROR_INVALID_HANDLE, which could mask bugs.


    sylvaincorlay mannequin commented May 29, 2020

    Hello,

    Is there a way to work around this bug on Python 3.6? It seems that the fix was only backported to 3.7, and we were not planning on dropping Python 3.6 support quite yet in our project.

    vstinner commented May 29, 2020:

    Sadly, 3.6 no longer gets bug fixes, only security fixes:
    https://devguide.python.org/#status-of-python-branches


    sylvaincorlay mannequin commented May 29, 2020

    Yes, I understand that.

    I was wondering if there was a workaround that we could use to not drop 3.6
    support right away!




    zooba commented May 29, 2020

        import subprocess
        subprocess._cleanup = lambda: None
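    For projects stuck on 3.6, the monkeypatch above can be guarded so it only applies where the bug exists. A hedged sketch; the exact first fixed micro-versions come from the 3.7/3.8 backport timing and should be double-checked against the release notes:

```python
import subprocess
import sys

# Disable the module-level cleanup pass so a stale handle left in
# subprocess._active can no longer break every subsequent Popen call.
# Only needed on Windows with interpreters that predate GH-14360.
if sys.platform == "win32" and sys.version_info < (3, 7, 5):
    subprocess._cleanup = lambda: None
```

    Applying this once at program startup is enough; on fixed interpreters (and on POSIX) the guard makes it a no-op.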

    ezio-melotti transferred this issue from another repository Apr 10, 2022