This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: The 'subprocess' module leaks memory when called in certain ways
Type: resource usage Stage:
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: The Compiler, Xavion, r.david.murray, vstinner, ztane
Priority: normal Keywords:

Created on 2016-09-15 04:31 by Xavion, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description Edit
Memory-Leak-Test.zip Xavion, 2016-09-17 21:14 All test files - in correct hierarchy
Test.sh Xavion, 2016-09-17 21:16 Memory monitoring script
Test-1.py Xavion, 2016-09-17 21:16 First test case
Test-2.py Xavion, 2016-09-17 21:17 Second test case
Test-1-no-gc.log Xavion, 2016-09-17 21:18 First test results - no garbage collection
Test-1-gc.log Xavion, 2016-09-17 21:19 First test results - garbage collection enforced
Test-2-no-gc.log Xavion, 2016-09-17 21:19 Second test results - no garbage collection
Test-2-gc.log Xavion, 2016-09-17 21:20 Second test results - garbage collection enforced
Test-3.py vstinner, 2016-09-20 07:43
Test-3a.py Xavion, 2016-09-20 22:40 Third test case (revised)
Test-3a-no-gc.log Xavion, 2016-09-20 22:41 Third test results - no garbage collection
Test-3a-gc.log Xavion, 2016-09-20 22:41 Third test results - garbage collection enforced
Messages (19)
msg276514 - (view) Author: Xavion (Xavion) Date: 2016-09-15 04:31
Each time I run a shell command via the 'subprocess' module, I notice that the memory footprint of my program increases by roughly 4 KiB.

I've tested the problem with two different slices of code; the result is the same in either case (long after the function finishes).

Code slice 1:
    check_output("true")

Code slice 2:
    pTest = Popen("true", stdout=PIPE, stderr=PIPE)
    pTest.wait()
    pTest.stdout.close()
    pTest.stderr.close()
    del pTest
    gc.collect()

I'm using Python v3.5.2-1 on Arch Linux; it was installed via the [extra] repository.  Let me know if you need any further information.
msg276515 - (view) Author: Antti Haapala (ztane) * Date: 2016-09-15 05:11
3.5.1+ ubuntu; I run the Popen case in while True, and watch `top` - not a single digit changes in the memory usage (the last digit being the kilobytes). That the memory footprint increases *once* by 4KiB is nothing; please run this in a loop.
msg276517 - (view) Author: Xavion (Xavion) Date: 2016-09-15 05:54
I wouldn't have reported this if it was only happening *once*.  I already have it in a loop; a new shell command is fired every second.

The memory footprint increases by roughly 4 KiB *each* time.  I monitor it via the following Bash script:
    while true; do
        ps -C "python3 ./Program.pyw" -o pid=,%mem=,rss= >> ./Output.log
        sleep 1
    done &

I have attached the logfile for your convenience.  Let me know if you'd like me to run any other tests.
msg276523 - (view) Author: Xavion (Xavion) Date: 2016-09-15 07:33
It's easier to reproduce the issue if you use a timer (rather than a loop).  The newly attached logfile was generated with the following code fragment.

    def fTest() :
        check_output("true")
        threading.Timer(1, fTest, ()).start()
msg276539 - (view) Author: Antti Haapala (ztane) * Date: 2016-09-15 09:24
Ahhah, the title should say: subprocess module leaks 4kiB memory **per thread**.
msg276775 - (view) Author: Xavion (Xavion) Date: 2016-09-17 07:01
Okay, I've modified the title to match what I've written below.

I've just run some further tests on this problem.  The attached archive contains code samples and the output generated (both with and without garbage collection).

As you can see, the memory stays constant in the first case.  In the second, the presence of the loop is probably what throws a spanner in the works.  Garbage collection seems to make the outcome slightly worse (in the second case).

The situation isn't as bad as I first reported, but the memory does nonetheless keep increasing in the second case (which it probably shouldn't).
msg276794 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-09-17 12:56
Could you post files instead of a zip, please?  It will be easier to review.
msg276850 - (view) Author: Xavion (Xavion) Date: 2016-09-17 21:14
I put them into an archive so that the folder hierarchy would be preserved.  Doing it that way makes it faster for you guys to run the tests at your end.

Nonetheless, I will post the seven (7) files individually as well.  It doesn't look like I can upload more than one at a time, so get ready for a few emails!
msg276950 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-09-19 06:56
I can't reproduce this with either python3.4.3 or 3.5 or 3.6 tip running it on gentoo linux.  For me it bumps up initially but then remains constant even if I let it run for many more probes than in your example.

I'm not sure what to suggest to you for further debugging this.  It is surprising that Arch would have a different behavior than Gentoo in this context.  If we are lucky maybe someone else will be able to reproduce it.
msg276986 - (view) Author: Xavion (Xavion) Date: 2016-09-19 21:57
Wow, that is surprising (given how simple it is)!  Did you try both tests?  Remember that only the second one produces the bug here.

Let's leave this sit for a while.  If no-one else can reproduce it on their OSs/distributions, I'll seek advice from the Arch community.
msg276987 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-09-19 22:10
I only ran the second one.  I didn't bother with the first one :)
msg276989 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-19 22:54
I'm unable to reproduce any memory leak on subprocess itself:
---
import tracemalloc; tracemalloc.start()
import subprocess, gc

def func(loops) :
    for x in range(loops):
        proc = subprocess.Popen(['true'])
        with proc:
            proc.wait()

# warmup
func(10)

gc.collect();gc.collect();gc.collect()
print(tracemalloc.get_traced_memory()[1])

func(100)

gc.collect();gc.collect();gc.collect()
print(tracemalloc.get_traced_memory()[1])

gc.collect();gc.collect();gc.collect()
print(tracemalloc.get_traced_memory()[1])

func(100)

gc.collect();gc.collect();gc.collect()
print(tracemalloc.get_traced_memory()[1])
---

Output on Fedora 24 (Linux) and Python 3.5:
---
996450
996450
996450
996450
---
msg276990 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-19 22:58
No memory leak if subprocess is spawned in a thread neither:
---
import tracemalloc; tracemalloc.start()
import subprocess, threading, time, gc

def spawn(event) :
    subprocess.check_output("true")
    gc.collect(), gc.collect(), gc.collect()
    event.set()

def func(loops):
    event = threading.Event()
    for x in range(loops):
        event.clear()
        timer = threading.Timer(0, spawn, (event,))
        timer.start()
        event.wait()

func(100)

gc.collect();gc.collect();gc.collect();gc.collect()
a = tracemalloc.get_traced_memory()[1]
print("first", a, "B")

loops = 1000
func(loops)

gc.collect();gc.collect();gc.collect();gc.collect()
b = tracemalloc.get_traced_memory()[1]
print("after", loops, "loops, mem:", b, "B")

d = (b-a) / loops
print("diff: %.1f B/loop" % d)

loops = 1000
func(loops)

gc.collect();gc.collect();gc.collect();gc.collect()
c = tracemalloc.get_traced_memory()[1]
print("after", loops, "loops, mem:", c, "B")

d = (c-b) / loops
print("diff2: %.1f B/loop" % d)
---

Output:
---
first 1013738 B
after 1000 loops, mem: 1014266 B
diff: 0.5 B/loop
after 1000 loops, mem: 1014318 B
diff2: 0.1 B/loop
---

Sorry, 0.5 byte/loop is not a memory leak :-)
msg276994 - (view) Author: Xavion (Xavion) Date: 2016-09-20 01:06
What about when you test it using the files I provided?  I didn't want you guys to have to write your own code.

Note that I was monitoring the memory externally (via good old 'ps').  This could make a difference to the outcome.
msg277009 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-20 05:25
If tracemalloc doesn't show any leak but the RSS memory increases, it can
be memory fragmentation or memory alloctions not traced by tracemalloc.
msg277010 - (view) Author: Antti Haapala (ztane) * Date: 2016-09-20 05:48
The title of the issue is still wrong. As I noted before the problem is not with subprocess leaking 4K memory *always*. The issue comes from the fact that subprocess seems to leak 4K memory per individual thread. The test code to use is thus

    def test():
        check_output("true")
        threading.Timer(1, test, ()).start()

    test()

which will invoke subprocess always in a new thread. Using subprocess in a loop, or using the timer as above without subprocess will not increase memory usage.

I have changed the title accordingly
msg277012 - (view) Author: Xavion (Xavion) Date: 2016-09-20 06:12
haypo: So, what is the result when you run "Test-2.py" and monitor the memory usage with "Test.sh"?

ztane: The code you've provided is the same as "Test-1.py".  You need to run "Test-2.py" in order to see the bug!
msg277015 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-20 07:43
Test-2.py has issues:

* it doesn't call Timer.join()
* it uses a weak synchronization between the main thread and the Timer thread: see msg276990 for an example using Event

If you use a better synchronization code, call timer.join() and call gc.collect(), the memory usage is very stable even after creating more than 100 000 threads + subprocesses.

Xavion: "As I noted before the problem is not with subprocess leaking 4K memory *always*. The issue comes from the fact that subprocess seems to leak 4K memory per individual thread."

I'm unable to reproduce a *leak*.

Test-3.py output:
----
497 thread+subprocess
VmRSS:	    9584 kB
986 thread+subprocess
VmRSS:	    9596 kB      <== after the warmup, the usage seems stable
1490 thread+subprocess
VmRSS:	    9596 kB
(...)
10361 thread+subprocess
VmRSS:	    9596 kB
(...)
30282 thread+subprocess
VmRSS:	    9596 kB
30695 thread+subprocess
VmRSS:	    9672 kB
31160 thread+subprocess
VmRSS:	    9684 kB      <=== memory usage decreases :-)
(...)
60768 thread+subprocess
VmRSS:	    9684 kB
^C
---

If you really want to say that something is wrong: I don't understand why we must call gc.collect() to keep the memory usage stable. But I guess that the GC is not always called for performance.

Without the GC it's not that bad:
---
1083 thread+subprocess
VmRSS:	    9764 kB
2097 thread+subprocess
VmRSS:	    9888 kB
3136 thread+subprocess
VmRSS:	    9888 kB
(...)
11750 thread+subprocess
VmRSS:	    9888 kB
12668 thread+subprocess
VmRSS:	    9940 kB
13705 thread+subprocess
VmRSS:	    9940 kB
(...)
70948 thread+subprocess
VmRSS:	    9940 kB
^C
---

There is no such "4k leak per function call".

I close the issue. It's a bug in your code, not in Python.
msg277077 - (view) Author: Xavion (Xavion) Date: 2016-09-20 22:40
Firstly, you've misquoted me.  The quote you attributed to me in your latest post was actually made by 'ztane'.

Secondly, your extra thread/event code makes no difference here.  I will attach the memory usage logs in subsequent posts.

For consistency, I have removed all of the collateral stuff from your "Test-3.py" script and reattached it here as "Test-3a.py".
History
Date User Action Args
2022-04-11 14:58:36adminsetgithub: 72352
2016-09-20 22:41:54Xavionsetfiles: + Test-3a-gc.log
2016-09-20 22:41:14Xavionsetfiles: + Test-3a-no-gc.log
2016-09-20 22:40:14Xavionsetstatus: closed -> open
files: + Test-3a.py
title: The 'subprocess' module leaks 4 kiB memory for each thread -> The 'subprocess' module leaks memory when called in certain ways
messages: + msg277077

resolution: not a bug ->
2016-09-20 07:43:53vstinnersetstatus: open -> closed
files: + Test-3.py
resolution: not a bug
messages: + msg277015
2016-09-20 06:12:23Xavionsetmessages: + msg277012
2016-09-20 05:48:15ztanesetmessages: + msg277010
title: The 'subprocess' module leaks memory when called in certain ways -> The 'subprocess' module leaks 4 kiB memory for each thread
2016-09-20 05:25:39vstinnersetmessages: + msg277009
2016-09-20 01:06:56Xavionsetmessages: + msg276994
2016-09-19 22:58:35vstinnersetmessages: + msg276990
2016-09-19 22:54:14vstinnersetmessages: + msg276989
2016-09-19 22:20:37vstinnersetnosy: + vstinner
2016-09-19 22:10:04r.david.murraysetmessages: + msg276987
2016-09-19 21:57:15Xavionsetmessages: + msg276986
2016-09-19 06:56:30r.david.murraysetmessages: + msg276950
2016-09-17 21:20:00Xavionsetfiles: + Test-2-gc.log
2016-09-17 21:19:30Xavionsetfiles: + Test-2-no-gc.log
2016-09-17 21:19:10Xavionsetfiles: + Test-1-gc.log
2016-09-17 21:18:49Xavionsetfiles: + Test-1-no-gc.log
2016-09-17 21:17:19Xavionsetfiles: + Test-2.py
2016-09-17 21:16:46Xavionsetfiles: + Test-1.py
2016-09-17 21:16:12Xavionsetfiles: + Test.sh
2016-09-17 21:14:53Xavionsetfiles: + Memory-Leak-Test.zip

messages: + msg276850
2016-09-17 20:51:40Xavionsetfiles: - Memory-Leak-Test.zip
2016-09-17 12:56:44r.david.murraysetnosy: + r.david.murray
messages: + msg276794
2016-09-17 07:01:52Xavionsetfiles: + Memory-Leak-Test.zip

messages: + msg276775
title: The 'subprocess' module leaks roughly 4 KiB of memory per call -> The 'subprocess' module leaks memory when called in certain ways
2016-09-17 06:22:27Xavionsetfiles: - Output.log
2016-09-15 20:30:57ppperrysettype: resource usage
2016-09-15 09:24:16ztanesetmessages: + msg276539
2016-09-15 08:42:09The Compilersetnosy: + The Compiler
2016-09-15 07:33:41Xavionsetfiles: + Output.log

messages: + msg276523
2016-09-15 07:29:58Xavionsetfiles: - Output.log
2016-09-15 05:54:28Xavionsetfiles: + Output.log

messages: + msg276517
2016-09-15 05:11:55ztanesetnosy: + ztane
messages: + msg276515
2016-09-15 04:31:15Xavioncreate