Issue 27448: Race condition in subprocess.Popen which causes a huge memory leak

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/71635

classification

Title:	Race condition in subprocess.Popen which causes a huge memory leak
Type:		Stage:	resolved
Components:		Versions:	Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	gregory.p.smith	Nosy List:	aonishuk, gps, gregory.p.smith, gwalker, ppperry, r.david.murray
Priority:	normal	Keywords:	patch

Created on 2016-07-04 12:52 by aonishuk, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
issue27448-gps01.patch	gregory.p.smith, 2016-07-05 20:02		review

Pull Requests
URL	Status	Linked	Edit
PR 1932	merged	gregory.p.smith, 2017-06-03 18:12

Messages (10)
msg269783 - (view)	Author: Andrew (aonishuk)	Date: 2016-07-04 12:52
We had problem where at some point python start consuming RAM. Until it ends. The reason was race condition in subprocess.Popen. Which triggered gc.disable() and never gc.enable(). The workaround we use is: subprocess.gc.isenabled = lambda: True The scenario for race condition is pretty obvious looking into the code below: gc_was_enabled = gc.isenabled() <- T1 gets false here gc.disable() try: self.pid = os.fork() <- while T2 is here except: if gc_was_enabled: gc.enable() raise ... CODE FRAGMENT 1 ... if gc_was_enabled: gc.enable() Also I wonder if exception fails in "... CODE FRAGMENT 1 ..." why don't we re-enable gc (finally block)
msg269796 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2016-07-04 16:43
I presume we don't re-enable gc in the child because if there's an exception it is exiting anyway. FYI subprocess32, which fixes some other race conditions, shouldn't have this problem, since it is a backport of subprocess from python3, which no longer manipulates the GC. Unlike the other race that subprocess32 fixes, this one could be fixed in python2, though, if you want to propose a patch.
msg269802 - (view)	Author: (ppperry)	Date: 2016-07-04 17:44
What are you doing that creates so many circular references that the not collecting them causes a massive memory leak? Or are you using an alternative implementation of Python? In addition, your monkeypatching is incorrect. `subprocess.gc` is the same object as the module you get by typing `import gc`, sou you are completely breaking the `gc.isenabled` functionality.
msg269804 - (view)	Author: Andrew (aonishuk)	Date: 2016-07-04 19:43
pppery, I don't think I am breaking gc functionality with my code. The code line I gave just meant to give the basic idea of what was helping to workaround this. If you are actually interested in the code I use, it is below: def fix_subprocess_racecondition(): """ !!! PLEASE NOTE THIS SHOULD BE CALLED BEFORE ANY OTHER INITIALIZATION was done to avoid already created links to subprocess or subprocess.gc or gc """ # monkey patching subprocess import subprocess subprocess.gc.isenabled = lambda: True # re-importing gc to have correct isenabled for non-subprocess contexts import sys del sys.modules['gc'] import gc
msg269847 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2016-07-05 20:02
FWIW at this point i'm not willing to "fix" issues in Python 2.7's subprocess module for POSIX platforms as subprocess32 is a superior drop in alternative without the issues. inspecting the 2.7 code, if you are seeing a memory leak and the statements that fail before the gc.enable call are more than "self._child_created = True" and "if self.pid == 0:", it must be the child process. there DOES appear to be a unlikely but potential bug in the 2.7 child process code: its "os._exit(255)" call should really be within a finally: so that any exception raised during the bare "except:" phase somehow still exits. that should be rare. the result if missing that would be a runaway forked child process that does indeed have gc disabled but is in the process of propagating an exception upwards (and is likely to hit others along the way in the other finally: clauses) so it is would die anyways unless the calling code catches and dismisses that (possible). so what I _imagine_ you _might_ be seeing somehow is the pre-exec forked child exception handling+reporting code failing and leading to a runaway child who's exception you caught and let keep running. if you have this situation, something else odd is going on (is the os.write() failing?). i've attached a _possible_ patch. but i'd rather not commit this without a test that reproduces the situation (that sounds difficult). I strongly recommend you just use subprocess32.
msg269885 - (view)	Author: Andrew (aonishuk)	Date: 2016-07-06 13:56
Gregory P. Smith, "if you are seeing a memory leak and the statements that fail before the gc.enable call are more than" The problem in my case was different. It was race condition between gc.isenabled and gc.disable. Isenabled was called when gc was disabled by another subprocess. This caused that later on gc.enable was not triggered.
msg269915 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2016-07-07 00:33
ah, okay, that makes much more sense and is indeed a bug. Recommendation given that most people won't be using a python 2.7 version that has this and tons of other subprocess from a threaded process bugs fixed: Use subprocess32 where this isn't a problem.
msg301351 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2017-09-05 18:20
New changeset 5e8e371364ee58dadb9a4e4e51c7e9cf6bedbfae by Gregory P. Smith in branch '2.7': bpo-27448: Work around a gc.disable race condition in subprocess. (#1932) https://github.com/python/cpython/commit/5e8e371364ee58dadb9a4e4e51c7e9cf6bedbfae
msg301352 - (view)	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2017-09-05 18:21
the recommendation to use subprocess32 still stands, but this bug has been "fixed" in as much as it can be without a complete rewrite within 2.7.
msg316107 - (view)	Author: Glen Walker (gwalker)	Date: 2018-05-03 03:59
Correct me if I'm wrong, the change released in Python 2.7.15 doesn't actually fix this race condition. The race is: * T1: gc_was_enabled = gc.isenabled() # True * T1: gc.disable() * T2: gc_was_enabled = gc.isenabled() # False * T1: gc.enable() * T2: gc.disable() To fix the race condition _disabling_gc_lock must also be held for gc.enable() in both call locations so it cannot happen between gc.isenabled() and gc.disable(). TBH, I'm not really expecting this to be fixed, this is more a note for any future travelers who come across this issue.

History
Date	User	Action	Args
2022-04-11 14:58:33	admin	set	github: 71635
2018-05-03 03:59:16	gwalker	set	nosy: + gwalker messages: + msg316107
2017-09-05 18:21:44	gregory.p.smith	set	status: open -> closed resolution: fixed messages: + msg301352 stage: resolved
2017-09-05 18:20:05	gregory.p.smith	set	messages: + msg301351
2017-06-03 18:12:08	gregory.p.smith	set	pull_requests: + pull_request2009
2016-07-07 00:33:24	gregory.p.smith	set	assignee: gregory.p.smith messages: + msg269915
2016-07-06 13:56:19	aonishuk	set	messages: + msg269885
2016-07-05 20:02:38	gregory.p.smith	set	files: + issue27448-gps01.patch nosy: + gregory.p.smith messages: + msg269847 keywords: + patch
2016-07-04 19:43:23	aonishuk	set	messages: + msg269804
2016-07-04 17:44:22	ppperry	set	nosy: + ppperry messages: + msg269802
2016-07-04 16:43:21	r.david.murray	set	nosy: + r.david.murray, gps messages: + msg269796
2016-07-04 12:52:32	aonishuk	create