classification
Title: subprocess Popen deadlock
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.1, Python 2.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: Christoph.Mathys, gregory.p.smith, neologix, orsenthil, pitrou
Priority: normal
Keywords:

Created on 2010-11-12 14:05 by Christoph.Mathys, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
deadlock.py Christoph.Mathys, 2010-11-12 14:05
Messages (5)
msg121036 - (view) Author: Christoph Mathys (Christoph.Mathys) Date: 2010-11-12 14:05
The ctor of subprocess.Popen has a race condition, which the attached program should demonstrate (on my computer, running it for a few seconds is enough). Program One sleeps for 2 seconds; Program Two exits right after execve. I would expect Program Two to take a very short time between Popen and the completion of wait(), but it regularly takes about 2 seconds.

The problem is this: Popen._execute_child opens a pipe and then sets the FD_CLOEXEC flag on it in a separate step. If thread_1 has just finished creating the pipe but has not yet set FD_CLOEXEC when thread_2 fork()s, the child forked by thread_2 inherits the pipe without the flag, and thread_1 will lock up when it reads on the pipe (errpipe_read). The process forked by thread_1 will close the pipe, but the process forked by thread_2 will only close its copy when it exits, blocking thread_1 inside the read function until then.
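
The two-step sequence described above can be sketched roughly as follows. This is only an illustration of the pattern, not the actual code of Popen._execute_child; the helper names set_cloexec and make_errpipe_racy are made up for this sketch. Note that on Python 3.4 and later os.pipe() already returns non-inheritable descriptors, so the code merely mimics the old sequence there.

```python
import fcntl
import os


def set_cloexec(fd):
    """Set FD_CLOEXEC on fd (hypothetical helper, POSIX only)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def make_errpipe_racy():
    """Create a pipe, then mark both ends close-on-exec in a second step."""
    errpipe_read, errpipe_write = os.pipe()
    # Race window: if another thread fork()s and exec()s right here, its
    # child can inherit errpipe_write without FD_CLOEXEC and keep it open
    # until that child exits, so a read on errpipe_read sees no EOF.
    set_cloexec(errpipe_read)
    set_cloexec(errpipe_write)
    return errpipe_read, errpipe_write
```

The point of the sketch is the gap between os.pipe() and the two set_cloexec() calls: nothing prevents another thread from forking inside it.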

I see different options:
Linux has the platform-specific flag O_CLOEXEC, which sets close-on-exec during open() itself (the manpage of open says it is available since 2.6.23, so this is highly platform dependent).

To just solve the problem for Popen's ctor, it is enough to serialize all code from before pipe() until after fork(). This can still lead to problems if fork is called in contexts other than Popen's ctor.

A general solution would be to use a socket, which can be shutdown().

If close_fds is set for Popen's ctor, the problem does not occur, because the extra pipe of the forked process will be closed.
msg121273 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-11-16 08:47
Did you attach the correct files? You mention two programs in the
description, but you have attached only one file, 'deadlock.py'.
Also, it does not fail on Python 2.7.1.  Please try it on the latest
codeline from release27-maint too.
msg121310 - (view) Author: Christoph Mathys (Christoph.Mathys) Date: 2010-11-16 19:27
Yes, it's the correct file. Sorry, my description of the programs was quite a mess: the "attached program" is deadlock.py. Program One and Two are Python scripts executed using "python -c"; their code is inside deadlock.py.

I installed Python 2.7 (2.7.0+) and 3.1 (3.1.2, had to fix a print statement) and could reproduce the error on both versions. Checking the code in subprocess.py confirmed that the bug is still there. However, I had to increase the number of threads (deadlock.py, line 38) to provoke the error, but I used different hardware and a different OS release than in the first test (still multi-core on Linux).

What do you expect on failure? I'm a noob when it comes to Python; the script just prints "command took too long: <time>", nothing else...
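
For readers without the attachment, the timing check described above might look roughly like this. It is not the attached deadlock.py: the function names and the THRESHOLD value are assumptions for illustration, and the sketch leaves out the concurrent threads (and the sleeping Program One) that are needed to actually provoke the race.

```python
import subprocess
import sys
import time

THRESHOLD = 1.0  # seconds; arbitrary value chosen for this sketch


def timed_run(code):
    """Run `python -c code` and return the seconds from Popen to wait()."""
    start = time.monotonic()
    proc = subprocess.Popen([sys.executable, "-c", code])
    proc.wait()
    return time.monotonic() - start


def check_quick_child():
    """Time a child that exits immediately and report a slow run."""
    elapsed = timed_run("pass")
    if elapsed > THRESHOLD:
        print("command took too long: %.2f" % elapsed)
    return elapsed
```

In the scenario from this issue, a "failure" is simply check_quick_child() reporting a time close to the 2 seconds that the other program sleeps.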
msg125934 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-01-10 21:48
It's now fixed in py3k: FD_CLOEXEC is set atomically (using pipe2 if available; otherwise it still has the GIL protection). See http://svn.python.org/view?view=rev&revision=87207
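
A minimal Python-level sketch of the atomic approach, assuming os.pipe2 and os.O_CLOEXEC are available (Python 3.3+, and not on all platforms). This is not the code from the linked revision; make_errpipe_atomic is a made-up name.

```python
import os


def make_errpipe_atomic():
    """Return (read_fd, write_fd) with close-on-exec set at creation."""
    if hasattr(os, "pipe2"):
        # pipe2() applies O_CLOEXEC in the same system call that creates
        # the descriptors, so no other thread can fork() in between.
        return os.pipe2(os.O_CLOEXEC)
    # Fallback for platforms without pipe2(); on Python 3.4+ os.pipe()
    # already returns non-inheritable descriptors.
    return os.pipe()
```

Compared with creating the pipe and setting the flag afterwards, there is no moment at which the descriptors exist without close-on-exec.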
msg125937 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-01-10 22:12
Ok, so let's close the issue. The fix can't reasonably be backported since it involves a whole new C extension and a rather large chunk of new code.
History
Date User Action Args
2022-04-11 14:57:08  admin  set  github: 54603
2011-01-10 22:12:11  pitrou  set  status: open -> closed; versions: - Python 2.6; nosy: + gregory.p.smith; messages: + msg125937; resolution: out of date
2011-01-10 21:48:30  neologix  set  nosy: + pitrou, neologix; messages: + msg125934
2010-11-16 19:27:22  Christoph.Mathys  set  messages: + msg121310; versions: + Python 3.1, Python 2.7
2010-11-16 08:47:24  orsenthil  set  nosy: + orsenthil; messages: + msg121273
2010-11-12 14:05:02  Christoph.Mathys  create