classification
Title: Potential memory leak in multiprocessing
Type: resource usage Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: jnoller Nosy List: asksol, jnoller, pitrou, sbt, schlesin
Priority: normal Keywords:

Created on 2009-08-05 20:23 by jnoller, last changed 2014-10-04 23:46 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
test.zip jnoller, 2009-08-05 20:23
Messages (7)
msg91332 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-08-05 20:23
I have example code to show this. It creates a system-wide memory leak 
on Linux/Unix (present until the next reboot), unless the last statement 
in the target of mp.Process ensures a manual clean up of the globals.

The problem is line 353 in multiprocessing/forking.py. The function 
exit() is defined as os._exit on Linux and ExitProcess on Windows, none 
of which allows normal clean up.

>>> help(os._exit)
Help on built-in function _exit in module nt:

_exit(...)
   _exit(status)

   Exit to the system with specified status, without normal exit 
processing.


The problem is fixed if line 353 in forking.py is changed from

exit(exitcode)

to

sys.exit(exitcode)




Test run without bugfix:

G:\DEVELO~1\SHARED~2>python test.py
open handle to 569f439b24e24fc8a547b81932616066
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
open handle to 0582d4b161c546f582c1c96e7bd0c39d
open handle to 569f439b24e24fc8a547b81932616066
modified array
closed handle to 569f439b24e24fc8a547b81932616066
[[ 1.  1.  1.  0.]
 [ 1.  1.  1.  0.]]
closed handle to 569f439b24e24fc8a547b81932616066


You can see here that opening and closing of handles are unmatched. This 
is on Windows, where the kernel ensures the clean-up, so it may not 
matter. But on Unix this would have created a permament (system wide) 
memory leak! What is happening here is globals not being cleaned up due 
to the use of os._exit instead of sys.exit.


Test run with bugfix:

G:\DEVELO~1\SHARED~2>python test.py
open handle to 930778d27b414253bc329f2b70adaa05
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
open handle to 3f6cebf8c5de413685bb770d02ae9666
open handle to 930778d27b414253bc329f2b70adaa05
modified array
closed handle to 930778d27b414253bc329f2b70adaa05
closed handle to 3f6cebf8c5de413685bb770d02ae9666
[[ 1.  1.  1.  0.]
 [ 1.  1.  1.  0.]]
closed handle to 930778d27b414253bc329f2b70adaa05



Now all allocations and deallocations are matched.


Regards,
Sturla Molden
msg91333 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-08-05 20:25
Additional comments from Sturla:

Hello Jesse,

Yes there is a bug in multiprocessing.

Diagnosis:

- Processes created by multiprocessing (mp.Process or mp.Pool) exit in a 
way that prevents the Python interpreter from running deallocation code 
for all extension objects (only the locals are cleaned up). Resources 
allocated by extension objects referenced in globals may leak 
permanently.

Sub-processes seem to commit an ungraceful suicide on exit. If the 
kernel cleans up after a non-graceful exit this is ok. But if the kernel 
do not, as in the case of System V IPC objects, we have a permanent 
resource leak. This is very similar to the reason why manually killing 
threads is prohibited in Python.

I have example code to show this. It creates a system-wide memory leak 
on Linux/Unix (present until the next reboot), unless the last statement 
in the target of mp.Process ensures a manual clean up of the globals.
msg91334 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-08-05 20:39
> Calling os.exit in a child process may be dangerous. It can cause 
> unflushed buffers to be flushed twice: once in the parent and once in 
> the child. 

I assume you mean sys.exit. If this is the case, multiprocessing needs 
a mechanism to chose between os._exit and sys.exit for child 
processes. Calling os._exit might also be dangerous because it could 
prevent necessary clean-up code from executing  (e.g. in C 
extensions). I had a case where shared memory on Linux (System V IPC) 
leaked due to os._exit. The deallocator for my extension type never 
got to execute in child processes. The deallocator was needed to 
release the shared segment when its reference count dropped to 0. 
Changing to sys.exit solved the problem. On Windows there was no leak, 
because the kernel did the reference counting.
msg91335 - (view) Author: Jesse Noller (jnoller) * (Python committer) Date: 2009-08-05 20:48
> In the future please use the bug tracker to file and track bugs with, 
> so things are not as lossy. 


Ok, sorry :) 
Also see Piet's comment here. He has a valid case against sys.exit in 
some cases. Thus it appears that both ways of shutting down child 
processes might be dangerous: If we don't want buffers to flush we 
have to use os._exit. If we want clean-up code to execute we have to 
use sys.exit. If we want both we are screwed. :(
msg185597 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-03-30 22:49
Richard, do you think this is an actual concern?
msg185602 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2013-03-31 00:12
I don't think this is a bug -- processes started with fork() should nearly always be exited with _exit().  And anyway, using sys.exit() does *not* guarantee that all deallocators will be called.  To be sure of cleanup at exit you could use (the undocumented) multiprocessing.util.Finalize().

Note that Python 3.4 on Unix will probably offer the choice of using os.fork()/os._exit() or _posixsubprocess.fork_exec()/sys.exit() for starting/exiting processes on Unix.

Sturla's scheme for doing reference counting of shared memory is also flawed because reference counts can fall to zero while a shared memory object is in a pipe/queue, causing the memory to be prematurely deallocated.

I think a more reliable scheme would be to use fds created using shm_open(), immediately unlinking the name with shm_unlink().  Then one could use the existing infrastructure for fd passing and let the operating system handle the reference counting.  This would prevent leaked shared memory (unless the process is killed in between shm_open() and shm_unlink()).  I would like to add something like this to multiprocessing.
msg228511 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-04 23:46
Closing as won't fix.
History
Date User Action Args
2014-10-04 23:46:07pitrousetstatus: open -> closed
resolution: wont fix
messages: + msg228511

stage: needs patch -> resolved
2013-03-31 00:12:37sbtsetmessages: + msg185602
2013-03-30 22:49:38pitrousetnosy: + pitrou, sbt

messages: + msg185597
versions: + Python 3.3, Python 3.4, - Python 3.1, Python 3.2
2010-08-27 16:22:15BreamoreBoysetnosy: jnoller, asksol, schlesin
versions: + Python 3.1, Python 2.7, Python 3.2
type: resource usage
components: + Library (Lib)
stage: needs patch
2010-08-27 14:08:42asksolsetnosy: + asksol
2009-10-10 04:05:38schlesinsetnosy: + schlesin
2009-08-05 20:48:57jnollersetmessages: + msg91335
2009-08-05 20:39:52jnollersetmessages: + msg91334
2009-08-05 20:27:19jnollersetassignee: jnoller
2009-08-05 20:25:01jnollersetmessages: + msg91333
2009-08-05 20:23:50jnollercreate