Issue 31217: test_code leaked [1, 1, 1] memory blocks on x86 Gentoo Refleaks 3.6/3.x

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/75400

classification

Title:	test_code leaked [1, 1, 1] memory blocks on x86 Gentoo Refleaks 3.6/3.x
Type:	resource usage	Stage:	resolved
Components:	Tests	Versions:	Python 3.7, Python 3.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	brett.cannon, pitrou, vstinner, zach.ware
Priority:	normal	Keywords:

Created on 2017-08-16 09:41 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
test_leak.py	vstinner, 2017-08-31 19:02
leak.py	vstinner, 2017-08-31 22:26
test_haypoleak.py	pitrou, 2017-08-31 22:35
leak2.py	vstinner, 2017-08-31 22:55

Pull Requests
URL	Status	Linked	Edit
PR 3258	closed	vstinner, 2017-09-01 00:41
PR 3260	merged	vstinner, 2017-09-01 09:50
PR 3261	merged	vstinner, 2017-09-01 12:31

Messages (24)
msg300344 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-16 09:41
http://buildbot.python.org/all/builders/x86%20Gentoo%20Refleaks%203.6/builds/58/steps/test/logs/stdio test_code leaked [1, 1, 1] memory blocks, sum=3 (...) Re-running test 'test_code' in verbose mode (...) test_code leaked [1, 1, 1] memory blocks, sum=3 I'm unable to reproduce the bug: haypo@selma$ ./python -m test -R 3:3 test_code (...) 1 test OK.
msg300442 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-17 15:48
It's even worse, many tests fail with 1 to 3 memory blocks on x86 Gentoo Refleaks 3.x... but then pass when run again. I created bpo-31227 "regrtest: reseed random with the same seed before running a test file" which might help (or not?) this issue. http://buildbot.python.org/all/builders/x86%20Gentoo%20Refleaks%203.x/builds/62 test_sys leaked [1, 1, 1] memory blocks, sum=3 test_robotparser leaked [1, 1, 1] memory blocks, sum=3 test_nntplib leaked [1, 1, 1] memory blocks, sum=3 test_tools leaked [3, 2, 1] memory blocks, sum=6 test_fork1 leaked [1, 1, 1] memory blocks, sum=3 test_poll leaked [1, 1, 1] memory blocks, sum=3 test_asyncio leaked [2, 1, 1] memory blocks, sum=4 Re-running failed tests in verbose mode Re-running test 'test_sys' in verbose mode Re-running test 'test_robotparser' in verbose mode Re-running test 'test_nntplib' in verbose mode Re-running test 'test_tools' in verbose mode test_tools leaked [1, 3, 1] memory blocks, sum=5 Re-running test 'test_fork1' in verbose mode Re-running test 'test_poll' in verbose mode Re-running test 'test_asyncio' in verbose mode
msg300443 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-17 15:51
regrtest also has an issue: when tests are run again, they should be run in a fresh process, especially when tests are run in parallel (-jN).
msg300689 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-22 13:06
I succeeded to reproduce a leak on the x86 Gentoo Refleaks 3.x buildbot using the following command. I'm unable to reproduce the bug on my Fedora 25. pydev@stormageddon ~/cpython $ ./python -m test -R 3:3 -j1 -m test.test_code.CoExtra.test_free_different_thread test_code -F Run tests in parallel using 1 child processes 0:00:00 load avg: 2.26 [ 1/1] test_code failed beginning 6 repetitions 123456 ...... test_code leaked [1, 1, 1] memory blocks, sum=3 0:00:01 load avg: 2.26 [ 2/2] test_code failed beginning 6 repetitions 123456 ...... test_code leaked [1, 1, 1] memory blocks, sum=3 2 tests failed: test_code test_code Total duration: 2 sec Tests result: FAILURE
msg301049 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-31 14:30
Another command to reproduce a leak: ./python -m test -R 3:3 -j1 test_sys -F -m test.test_sys.SysModuleTest.test_current_frames
msg301051 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-31 16:56
Current status of my analysis. I'm able to reproduce the bug using this modified Lib/test/test_sys.py: --- import unittest, test.support from test.support.script_helper import assert_python_ok import threading def f123(): pass class SysModuleTest(unittest.TestCase): def test_current_frames(self): t = threading.Thread(target=f123) t.start() t.join() --- And this shell script: --- set -x while true; do PYTHONHASHSEED=$RANDOM ./python -m test -R 3:3 -j1 test_sys -m test_current_frames \|\| break; done --- With this command, when the test fail, it's possible to reproduce the bug 100% of times using the written PYTHONHASHSEED. Example: --- pydev@stormageddon ~/cpython $ ./bug.sh ++ true ++ PYTHONHASHSEED=23016 ++ ./python -m test -R 3:3 -j1 test_sys -m test_current_frames Run tests in parallel using 1 child processes 0:00:00 load avg: 0.15 [1/1] test_sys passed beginning 6 repetitions 123456 ...... 1 test OK. Total duration: 982 ms Tests result: SUCCESS ++ true ++ PYTHONHASHSEED=9197 ++ ./python -m test -R 3:3 -j1 test_sys -m test_current_frames Run tests in parallel using 1 child processes 0:00:00 load avg: 0.22 [1/1/1] test_sys failed beginning 6 repetitions 123456 ...... test_sys leaked [1, 1, 1] memory blocks, sum=3 1 test failed: test_sys Total duration: 1000 ms Tests result: FAILURE ++ break pydev@stormageddon ~/cpython $ PYTHONHASHSEED=9197 ./python -m test -R 3:3 -j1 test_sys -m test_current_frames Run tests in parallel using 1 child processes 0:00:00 load avg: 0.20 [1/1/1] test_sys failed beginning 6 repetitions 123456 ...... test_sys leaked [1, 1, 1] memory blocks, sum=3 1 test failed: test_sys Total duration: 987 ms Tests result: FAILURE ---
msg301054 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-31 19:01
Ok, I succeeded to reproduce the issue on my amd64 laptop using gcc -m32. I'm now sure that the issue can be reproduced on 32-bit. Maybe the issue can be reproduced on 64-bit but is more unlikely?
msg301055 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-31 19:02
Copy attached test_leak.py to Lib/test/ and run: set -x while true; do PYTHONHASHSEED=$RANDOM ./python -m test -R 3:30 -j1 test_leak \|\| break; done
msg301059 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-31 20:19
I created bpo-31317: Memory leak in dict with shared keys.
msg301069 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-31 22:26
I'm still able to reproduce the bug using attached leak.py script which is now 100 lines long, but has to import logging, socket and pickle. Without these imports, the bug hides (maybe the hardcoded hash values must be recomputed, but I failed to find hash values in that case).
msg301070 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-08-31 22:35
Attached test script still fails even though the objects are pre-allocated, and without setting a hash seed. $ ./python -m test -R 3:6 test_haypoleak Run tests sequentially 0:00:00 load avg: 0.23 [1/1] test_haypoleak beginning 9 repetitions 123456789 ......... test_haypoleak leaked [1, 1, 1, 1, 1, 1] memory blocks, sum=6 test_haypoleak failed
msg301072 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-31 22:55
leak2.py: script based on my leak.py and Antoine's test_haypoleak.py. If you replace "if 0:" with "if 1:", it seems like the test doesn't anymore. It seems like the "leak" is the call to sys.getallocatedblocks() which creates a new integer, and the integer is kept alive between two loop iterations. Maybe I missed something.
msg301073 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-08-31 23:22
> It seems like the "leak" is the call to sys.getallocatedblocks() which creates a new integer, and the integer is kept alive between two loop iterations. I doubt it. If that was the case, the reference count would increase as well.
msg301074 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-08-31 23:33
Actually, leak2.py doesn't attempt to cleanup anything between runs, so it can't be reliable for detecting leaks. You lack dash_R_cleanup() somewhere.
msg301075 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-08-31 23:48
Antoine: "I doubt it. If that was the case, the reference count would increase as well." The bug is really weird :-) Antoine: "Actually, leak2.py doesn't attempt to cleanup anything between runs, so it can't be reliable for detecting leaks. You lack dash_R_cleanup() somewhere." I simplified dash_R_cleanup() and at the end, it was empty :-) I don't see how the test_current_frames() would need to clear any kind of cache. Maybe: * sys._clear_type_cache() * gc.collect() * C function clear_freelists() of gcmodule.c But I tried to call these functions, and it doesn't change anything. I don't see how calling set.add() and set.discard() would impact any cache, except of maybe of a free list? The strange thing is that calling dangling.clear() explicitly in test_current_frames() "fixes the leak" (hides the bug?). But any tiny change on this file also hides the bug. The script is very fragile.
msg301076 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-08-31 23:58
Ahah. Actually, it's quite simple :-) On 64-bit Python: >>> id(82914 - 82913) == id(1) True On 32-bit Python: >>> id(82914 - 82913) == id(1) False So the first non-zero alloc_delta really has a snowball effect, as it creates new memory block which will produce a non-zero alloc_delta on the next run, etc.
msg301077 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-09-01 00:03
This patch does the trick: diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py index efe5210..68e490a 100644 --- a/Lib/test/libregrtest/refleak.py +++ b/Lib/test/libregrtest/refleak.py @@ -48,6 +48,11 @@ def dash_R(the_module, test, indirect_test, huntrleaks): print("beginning", repcount, "repetitions", file=sys.stderr) print(("1234567890"*(repcount//10 + 1))[:repcount], file=sys.stderr, flush=True) + + int_pool = {i: i for i in range(-1000, 1000)} + def get_pooled_int(v): + return int_pool.setdefault(v, v) + # initialize variables to make pyflakes quiet rc_before = alloc_before = fd_before = 0 for i in range(repcount): @@ -56,9 +61,9 @@ def dash_R(the_module, test, indirect_test, huntrleaks): abcs) print('.', end='', file=sys.stderr, flush=True) if i >= nwarmup: - rc_deltas[i] = rc_after - rc_before - alloc_deltas[i] = alloc_after - alloc_before - fd_deltas[i] = fd_after - fd_before + rc_deltas[i] = get_pooled_int(rc_after - rc_before) + alloc_deltas[i] = get_pooled_int(alloc_after - alloc_before) + fd_deltas[i] = get_pooled_int(fd_after - fd_before) alloc_before = alloc_after rc_before = rc_after fd_before = fd_after
msg301079 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-09-01 00:32
> So the first non-zero alloc_delta really has a snowball effect, as it creates new memory block which will produce a non-zero alloc_delta on the next run, etc. Oh, I suspected an issue around this code but I was unable to explain it. I focused strongly on test_current_frames(), whereas this function is just fine... It's really strange that the bug only triggers on very specific conditions. """ Actually, it's quite simple :-) On 64-bit Python: >>> id(82914 - 82913) == id(1) True On 32-bit Python: >>> id(82914 - 82913) == id(1) False """ That's very strange. Another workaround: diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py index efe52107cb..35d3f8e42d 100644 --- a/Lib/test/libregrtest/refleak.py +++ b/Lib/test/libregrtest/refleak.py @@ -56,9 +56,10 @@ def dash_R(the_module, test, indirect_test, huntrleaks): abcs) print('.', end='', file=sys.stderr, flush=True) if i >= nwarmup: - rc_deltas[i] = rc_after - rc_before - alloc_deltas[i] = alloc_after - alloc_before - fd_deltas[i] = fd_after - fd_before + def maybe_small_long(x): return int(str(x)) + rc_deltas[i] = maybe_small_long(rc_after - rc_before) + alloc_deltas[i] = maybe_small_long(alloc_after - alloc_before) + fd_deltas[i] = maybe_small_long(fd_after - fd_before) alloc_before = alloc_after rc_before = rc_after fd_before = fd_after
msg301081 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-09-01 00:50
Ok, I wrote a PR for my str(int(x)) workaround: https://github.com/python/cpython/pull/3258 I tested manually on the buildbot that it fixes all bugs listed on this issue.
msg301082 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-09-01 00:58
+ int_pool = {i: i for i in range(-1000, 1000)} + def get_pooled_int(v): + return int_pool.setdefault(v, v) I'm not sure that I understand this code. It makes sure that you get a single object in memory for the same integer, and not only for Python "small integer singletons"? About the -1000..1000 range: if a function leaks more than 1 memory block or more than 1 reference, there is already something wrong no? Maybe it's ok to only care of values -1, 0 and 1 :-) My code only cares of Python small integer singletons. I have no strong preference between my code or yours. I only care of fixing the buildbot :-) What do you prefer, Antoine?
msg301090 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2017-09-01 09:18
> It makes sure that you get a single object in memory for the same integer, and not only for Python "small integer singletons"? Yes. > About the -1000..1000 range: if a function leaks more than 1 memory block or more than 1 reference, there is already something wrong no? Well, the current code can still hide it, for example if a function leaks [1000, 0, 1000, 1000, 1000] references, the code says it's ok.
msg301094 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-09-01 11:05
New changeset 6c2feabc5dac2f3049b15134669e9ad5af573193 by Victor Stinner in branch 'master': bpo-31217: Fix regrtest -R for small integer (#3260) https://github.com/python/cpython/commit/6c2feabc5dac2f3049b15134669e9ad5af573193
msg301100 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-09-01 13:04
New changeset 98c849a2f32f6727239b4cce38b8f0ff8adeef22 by Victor Stinner in branch '3.6': bpo-31217: Fix regrtest -R for small integer (#3260) (#3261) https://github.com/python/cpython/commit/98c849a2f32f6727239b4cce38b8f0ff8adeef22
msg301101 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-09-01 13:13
Ok, this issue was really insane. Thank you very much Antoine Pitrou for explaning me the root bug ("int - int") and for the fix. The bug is now fixed in Python 3.6 and master (3.7). I checked manually the applied fix on the Gentoo Refleak 3.x buildbot using test_code and test_sys run in a loop: no failure after 30 runs. Hopefully, it was a bug in the test, not in the code ;-) The fix should help to detect real memory leaks, and maybe we can now replace again all() with any() in libregrtest/refleak.py to detect more leaks: see bpo-30776.

History
Date	User	Action	Args
2022-04-11 14:58:50	admin	set	github: 75400
2017-09-01 13:13:53	vstinner	set	status: open -> closed resolution: fixed messages: + msg301101 stage: resolved
2017-09-01 13:04:49	vstinner	set	messages: + msg301100
2017-09-01 12:31:33	vstinner	set	pull_requests: + pull_request3304
2017-09-01 11:05:29	vstinner	set	messages: + msg301094
2017-09-01 09:50:51	vstinner	set	pull_requests: + pull_request3303
2017-09-01 09:18:26	pitrou	set	messages: + msg301090
2017-09-01 00:58:58	vstinner	set	messages: + msg301082
2017-09-01 00:50:05	vstinner	set	messages: + msg301081
2017-09-01 00:41:47	vstinner	set	pull_requests: + pull_request3302
2017-09-01 00:32:20	vstinner	set	messages: + msg301079
2017-09-01 00:03:58	pitrou	set	messages: + msg301077
2017-08-31 23:58:13	pitrou	set	messages: + msg301076
2017-08-31 23:48:23	vstinner	set	messages: + msg301075
2017-08-31 23:33:00	pitrou	set	messages: + msg301074
2017-08-31 23:22:33	pitrou	set	messages: + msg301073
2017-08-31 22:55:11	vstinner	set	files: + leak2.py messages: + msg301072
2017-08-31 22:35:51	pitrou	set	files: + test_haypoleak.py messages: + msg301070
2017-08-31 22:26:35	vstinner	set	files: + leak.py nosy: + pitrou messages: + msg301069
2017-08-31 22:23:00	vstinner	link	issue31317 superseder
2017-08-31 20:19:48	vstinner	set	messages: + msg301059
2017-08-31 19:03:21	zach.ware	set	nosy: + zach.ware
2017-08-31 19:02:02	vstinner	set	files: + test_leak.py messages: + msg301055
2017-08-31 19:01:17	vstinner	set	messages: + msg301054
2017-08-31 16:56:19	vstinner	set	messages: + msg301051
2017-08-31 14:30:09	vstinner	set	messages: + msg301049
2017-08-22 13:06:50	vstinner	set	messages: + msg300689
2017-08-21 09:42:22	vstinner	set	versions: + Python 3.7
2017-08-21 09:42:17	vstinner	set	title: test_code leaked [1, 1, 1] memory blocks on x86 Gentoo Refleaks 3.6 -> test_code leaked [1, 1, 1] memory blocks on x86 Gentoo Refleaks 3.6/3.x
2017-08-17 15:51:25	vstinner	set	messages: + msg300443
2017-08-17 15:48:08	vstinner	set	messages: + msg300442
2017-08-17 12:31:12	pitrou	set	nosy: + brett.cannon
2017-08-16 09:41:40	vstinner	create