This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Race condition in subprocess.Popen which causes a huge memory leak
Type: Stage: resolved
Components: Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: aonishuk, gps, gregory.p.smith, gwalker, ppperry, r.david.murray
Priority: normal Keywords: patch

Created on 2016-07-04 12:52 by aonishuk, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue27448-gps01.patch gregory.p.smith, 2016-07-05 20:02 review
Pull Requests
URL Status Linked Edit
PR 1932 merged gregory.p.smith, 2017-06-03 18:12
Messages (10)
msg269783 - (view) Author: Andrew (aonishuk) Date: 2016-07-04 12:52
We had problem where at some point python start consuming RAM. Until it ends.

The reason was race condition in subprocess.Popen. Which triggered gc.disable() and never gc.enable().

The workaround we use is:
subprocess.gc.isenabled = lambda: True


The scenario for race condition is pretty obvious looking into the code below:

          gc_was_enabled = gc.isenabled() <- T1 gets false here
          gc.disable()
          try:
              self.pid = os.fork() <- while T2 is here
          except:
              if gc_was_enabled:
                  gc.enable()
              raise
          ... CODE FRAGMENT 1 ...
          if gc_was_enabled:
              gc.enable()

Also I wonder if exception fails in "... CODE FRAGMENT 1 ..." why don't we re-enable gc (finally block)
msg269796 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-04 16:43
I presume we don't re-enable gc in the child because if there's an exception it is exiting anyway.

FYI subprocess32, which fixes some other race conditions, shouldn't have this problem, since it is a backport of subprocess from python3, which no longer manipulates the GC.  Unlike the other race that subprocess32 fixes, this one could be fixed in python2, though, if you want to propose a patch.
msg269802 - (view) Author: (ppperry) Date: 2016-07-04 17:44
What are you doing that creates so many circular references that the not collecting them causes a massive memory leak? Or are you using an alternative implementation of Python?

In addition, your monkeypatching is incorrect. `subprocess.gc` is the same object as the module you get by typing `import gc`, sou you are completely breaking the `gc.isenabled` functionality.
msg269804 - (view) Author: Andrew (aonishuk) Date: 2016-07-04 19:43
pppery, I don't think I am breaking gc functionality with my code. The code line I gave just meant to give the basic idea of what was helping to workaround this. If you are actually interested in the code I use, it is below:

def fix_subprocess_racecondition():
  """
  !!! PLEASE NOTE THIS SHOULD BE CALLED BEFORE ANY OTHER INITIALIZATION was done to avoid already created links to subprocess or subprocess.gc or gc
  """
  # monkey patching subprocess
  import subprocess
  subprocess.gc.isenabled = lambda: True

  # re-importing gc to have correct isenabled for non-subprocess contexts
  import sys
  del sys.modules['gc']
  import gc
msg269847 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2016-07-05 20:02
FWIW at this point i'm not willing to "fix" issues in Python 2.7's subprocess module for POSIX platforms as subprocess32 is a superior drop in alternative without the issues.

inspecting the 2.7 code, if you are seeing a memory leak and the statements that fail before the gc.enable call are more than "self._child_created = True" and "if self.pid == 0:", it must be the child process.

there DOES appear to be a unlikely but potential bug in the 2.7 child process code: its "os._exit(255)" call should really be within a finally: so that any exception raised during the bare "except:" phase somehow still exits.  that should be rare.  the result if missing that would be a runaway forked child process that does indeed have gc disabled but is in the process of propagating an exception upwards (and is likely to hit others along the way in the other finally: clauses) so it is would die anyways unless the calling code catches and dismisses that (possible).

so what I _imagine_ you _might_ be seeing somehow is the pre-exec forked child exception handling+reporting code failing and leading to a runaway child who's exception you caught and let keep running.  if you have this situation, something else odd is going on (is the os.write() failing?).

i've attached a _possible_ patch.  but i'd rather not commit this without a test that reproduces the situation (that sounds difficult).

I strongly recommend you just use subprocess32.
msg269885 - (view) Author: Andrew (aonishuk) Date: 2016-07-06 13:56
Gregory P. Smith, 

"if you are seeing a memory leak and the statements that fail before the gc.enable call are more than"

The problem in my case was different. It was race condition between gc.isenabled and gc.disable. 

Isenabled was called when gc was disabled by another subprocess. This caused that later on gc.enable was not triggered.
msg269915 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2016-07-07 00:33
ah, okay, that makes much more sense and is indeed a bug.

Recommendation given that most people won't be using a python 2.7 version that has this and tons of other subprocess from a threaded process bugs fixed: Use subprocess32 where this isn't a problem.
msg301351 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-09-05 18:20
New changeset 5e8e371364ee58dadb9a4e4e51c7e9cf6bedbfae by Gregory P. Smith in branch '2.7':
bpo-27448: Work around a gc.disable race condition in subprocess. (#1932)
https://github.com/python/cpython/commit/5e8e371364ee58dadb9a4e4e51c7e9cf6bedbfae
msg301352 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-09-05 18:21
the recommendation to use subprocess32 still stands, but this bug has been "fixed" in as much as it can be without a complete rewrite within 2.7.
msg316107 - (view) Author: Glen Walker (gwalker) Date: 2018-05-03 03:59
Correct me if I'm wrong, the change released in Python 2.7.15 doesn't actually fix this race condition.

The race is:
* T1: gc_was_enabled = gc.isenabled() # True 
* T1: gc.disable()
* T2: gc_was_enabled = gc.isenabled() # False
* T1: gc.enable()
* T2: gc.disable()

To fix the race condition _disabling_gc_lock must also be held for gc.enable() in both call locations so it cannot happen between gc.isenabled() and gc.disable().

TBH, I'm not really expecting this to be fixed, this is more a note for any future travelers who come across this issue.
History
Date User Action Args
2022-04-11 14:58:33adminsetgithub: 71635
2018-05-03 03:59:16gwalkersetnosy: + gwalker
messages: + msg316107
2017-09-05 18:21:44gregory.p.smithsetstatus: open -> closed
resolution: fixed
messages: + msg301352

stage: resolved
2017-09-05 18:20:05gregory.p.smithsetmessages: + msg301351
2017-06-03 18:12:08gregory.p.smithsetpull_requests: + pull_request2009
2016-07-07 00:33:24gregory.p.smithsetassignee: gregory.p.smith
messages: + msg269915
2016-07-06 13:56:19aonishuksetmessages: + msg269885
2016-07-05 20:02:38gregory.p.smithsetfiles: + issue27448-gps01.patch

nosy: + gregory.p.smith
messages: + msg269847

keywords: + patch
2016-07-04 19:43:23aonishuksetmessages: + msg269804
2016-07-04 17:44:22ppperrysetnosy: + ppperry
messages: + msg269802
2016-07-04 16:43:21r.david.murraysetnosy: + r.david.murray, gps
messages: + msg269796
2016-07-04 12:52:32aonishukcreate