Issue44895
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2021-08-12 00:13 by iritkatriel, last changed 2022-04-11 14:59 by admin.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
cpython-1628938551.dump.gz | corona10, 2021-08-14 10:59 | |||
refleak.py | iritkatriel, 2021-08-16 20:13 | |||
strict_refleak_check.patch | vstinner, 2021-09-01 16:43 |
Pull Requests | |||
---|---|---|---|
URL | Status | Linked | Edit |
PR 27746 | merged | iritkatriel, 2021-08-12 16:59 | |
PR 27753 | merged | miss-islington, 2021-08-13 09:41 | |
PR 27761 | merged | iritkatriel, 2021-08-13 15:24 | |
PR 27767 | merged | corona10, 2021-08-14 11:01 | |
PR 27778 | merged | miss-islington, 2021-08-16 08:36 | |
PR 28113 | merged | vstinner, 2021-09-01 15:26 |
Messages (71) | |||
---|---|---|---|
msg399424 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-12 00:13 | |
For background see https://bugs.python.org/issue33930#msg399403 iritkatriel@Irits-MBP cpython % repeat 10 ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2 -m test_recursion_normalizing_infinite_exception -m test_recursion_in_except_handler -m test_recursion_normalizing_with_no_memory 0:00:00 load avg: 2.32 Run tests sequentially 0:00:00 load avg: 2.32 [1/1] test_exceptions beginning 6 repetitions 123456 ...... == Tests result: SUCCESS == 1 test OK. Total duration: 5.9 sec Tests result: SUCCESS 0:00:00 load avg: 2.22 Run tests sequentially 0:00:00 load avg: 2.22 [1/1] test_exceptions beginning 6 repetitions 123456 ...... == Tests result: SUCCESS == 1 test OK. Total duration: 5.8 sec Tests result: SUCCESS 0:00:00 load avg: 2.20 Run tests sequentially 0:00:00 load avg: 2.20 [1/1] test_exceptions beginning 6 repetitions 123456 ...... == Tests result: SUCCESS == 1 test OK. Total duration: 5.8 sec Tests result: SUCCESS 0:00:00 load avg: 2.17 Run tests sequentially 0:00:00 load avg: 2.17 [1/1] test_exceptions beginning 6 repetitions 123456 ...... test_exceptions leaked [6, 6, 4] references, sum=16 test_exceptions leaked [6, 6, 4] memory blocks, sum=16 test_exceptions failed (reference leak) == Tests result: FAILURE == 1 test failed: test_exceptions 1 re-run test: test_exceptions Total duration: 5.9 sec Tests result: FAILURE 0:00:00 load avg: 2.08 Run tests sequentially 0:00:00 load avg: 2.08 [1/1] test_exceptions beginning 6 repetitions 123456 ...... test_exceptions leaked [6, 6, 6] references, sum=18 test_exceptions leaked [6, 6, 6] memory blocks, sum=18 test_exceptions failed (reference leak) == Tests result: FAILURE == 1 test failed: test_exceptions 1 re-run test: test_exceptions Total duration: 5.8 sec Tests result: FAILURE 0:00:00 load avg: 2.39 Run tests sequentially 0:00:00 load avg: 2.39 [1/1] test_exceptions beginning 6 repetitions 123456 ...... test_exceptions leaked [6, 6, 6] references, sum=18 test_exceptions leaked [6, 6, 6] memory blocks, sum=18 test_exceptions failed (reference leak) == Tests result: FAILURE == 1 test failed: test_exceptions 1 re-run test: test_exceptions Total duration: 6.0 sec Tests result: FAILURE 0:00:00 load avg: 2.36 Run tests sequentially 0:00:00 load avg: 2.36 [1/1] test_exceptions beginning 6 repetitions 123456 ...... test_exceptions leaked [6, 6, 6] references, sum=18 test_exceptions leaked [6, 6, 6] memory blocks, sum=18 test_exceptions failed (reference leak) == Tests result: FAILURE == 1 test failed: test_exceptions 1 re-run test: test_exceptions Total duration: 6.0 sec Tests result: FAILURE 0:00:00 load avg: 2.31 Run tests sequentially 0:00:00 load avg: 2.31 [1/1] test_exceptions beginning 6 repetitions 123456 ...... == Tests result: SUCCESS == 1 test OK. Total duration: 6.3 sec Tests result: SUCCESS 0:00:00 load avg: 2.20 Run tests sequentially 0:00:00 load avg: 2.20 [1/1] test_exceptions beginning 6 repetitions 123456 ...... test_exceptions leaked [6, 6, 6] references, sum=18 test_exceptions leaked [6, 6, 6] memory blocks, sum=18 test_exceptions failed (reference leak) == Tests result: FAILURE == 1 test failed: test_exceptions 1 re-run test: test_exceptions Total duration: 6.1 sec Tests result: FAILURE 0:00:00 load avg: 2.35 Run tests sequentially 0:00:00 load avg: 2.35 [1/1] test_exceptions |
|||
msg399449 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-12 13:44 | |
I fail to reproduce the issue on Linux on the 3.10 branch. I interrupted the job after 20 minutes: $ ./python -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2 -m test_recursion_normalizing_infinite_exception -m test_recursion_in_except_handler -m test_recursion_normalizing_with_no_memory -F -j1 (...) 0:21:38 load avg: 2.01 [134] test_exceptions passed beginning 6 repetitions 123456 ...... I tested the commit 5d444434ad4e1943a88c9d3aadd300fd0f05dab7. |
|||
msg399450 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-12 13:59 | |
I see it on Mac but not windows. Running the test with -v or adding prints makes the leak go away. I’m suspecting that one of the recursion tests fails to clear an exception or something like that. Removing any one of the tests makes it stop failing. |
|||
msg399451 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-12 14:02 | |
Victor, if the …cycle2 test is hanging then you need the change in PR27626. |
|||
msg399452 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-12 14:17 | |
I am not more lucky in the main branch. I ran it for 30 minutes and it didn't fail. $ ./python -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2 -m test_recursion_normalizing_infinite_exception -m test_recursion_in_except_handler -m test_recursion_normalizing_with_no_memory -F -j1 (...) 0:31:15 load avg: 1.88 [184] test_exceptions passed beginning 6 repetitions 123456 ...... ^C Kill <TestWorkerProcess #1 running test=test_exceptions pid=173796 time=4.2 sec> process group == Tests result: INTERRUPTED == Test suite interrupted by signal SIGINT. 184 tests OK. Total duration: 31 min 19 sec Tests result: INTERRUPTED I tested commit 8ac0886091c27bf4b6bb0a9b571e174b554d31a4. |
|||
msg399453 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-12 14:18 | |
> Victor, if the …cycle2 test is hanging then you need the change in PR27626. That's the commit d5c217475c4957a8084ac3f92ae012ece5edc7cb. My main branch is up to date (commit 8ac0886091c27bf4b6bb0a9b571e174b554d31a4), it includes the commit d5c217475c4957a8084ac3f92ae012ece5edc7cb. |
|||
msg399456 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-12 14:21 | |
I don't know why you need to kill it. It's not a long test. |
|||
msg399457 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-12 14:23 | |
> I see it on Mac but not windows. Aha, maybe there is something specific on Mac which triggers an exact code path to trigger to bug. I already saw bugs like that. You can try to play with gc.set_threshold() to change how garbage collection is triggered. |
|||
msg399459 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-12 14:39 | |
> I don't know why you need to kill it. It's not a long test. Ah I see, -F. |
|||
msg399463 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-12 14:52 | |
The problem disappears if I add a gc.collect() loop at the beginning of the new test: diff --git a/Lib/test/test_exceptions.py b/Lib/test/test_exceptions.py index 79798ecf05..e0aeac9d10 100644 --- a/Lib/test/test_exceptions.py +++ b/Lib/test/test_exceptions.py @@ -1014,6 +1014,7 @@ def cycle(): def test_no_hang_on_context_chain_cycle2(self): # See issue 25782. Cycle at head of context chain. + while gc.collect(): pass class A(Exception): pass |
|||
msg399472 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-12 17:37 | |
This fails (test_recursion_normalizing_with_no_memory is not needed): ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2 -m test_recursion_normalizing_infinite_exception -m test_recursion_in_except_handler |
|||
msg399485 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-12 22:41 | |
My smallest failing example so far: def test_recursion_in_except_handler(self): self.test_no_hang_on_context_chain_cycle2() def set_relative_recursion_limit(n): depth = 1 while True: try: sys.setrecursionlimit(depth) except RecursionError: depth += 1 else: break sys.setrecursionlimit(depth+n) def recurse_in_body_and_except(): try: recurse_in_body_and_except() except: recurse_in_body_and_except() recursionlimit = sys.getrecursionlimit() try: set_relative_recursion_limit(10) try: recurse_in_body_and_except() except RecursionError: pass else: self.fail("Should have raised a RecursionError") finally: sys.setrecursionlimit(recursionlimit) |
|||
msg399492 - (view) | Author: Chris Jerdonek (chris.jerdonek) * | Date: 2021-08-13 05:50 | |
Out of curiosity, is the failure deterministic in environments where it fails? If not, what is the source of the indeterminism -- some kind of race condition or something else? |
|||
msg399507 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-13 08:48 | |
It’s not deterministic. |
|||
msg399510 - (view) | Author: Łukasz Langa (lukasz.langa) * | Date: 2021-08-13 09:41 | |
New changeset 7bf28cbb4bf37fa6bdfc2d3f8a3939066b3f8f22 by Irit Katriel in branch 'main': bpo-44895: Temporarily add an extra gc.collect() call (GH-27746) https://github.com/python/cpython/commit/7bf28cbb4bf37fa6bdfc2d3f8a3939066b3f8f22 |
|||
msg399513 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-13 09:56 | |
> what is the source of the indeterminism -- some kind of race condition or something else? My hunch is it could be timing of GC. But it's hard to say at this point. |
|||
msg399516 - (view) | Author: miss-islington (miss-islington) | Date: 2021-08-13 10:21 | |
New changeset ebc59262349d6901b825ed9101d604e826757262 by Miss Islington (bot) in branch '3.10': bpo-44895: Temporarily add an extra gc.collect() call (GH-27746) https://github.com/python/cpython/commit/ebc59262349d6901b825ed9101d604e826757262 |
|||
msg399517 - (view) | Author: Łukasz Langa (lukasz.langa) * | Date: 2021-08-13 10:22 | |
I merged the workaround to get refleak buildbots unstuck for the weekend. I bumped the issue stage back to "needs patch" as the available one isn't a solution. |
|||
msg399533 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-13 12:59 | |
Chris: > Out of curiosity, is the failure deterministic in environments where it fails? If not, what is the source of the indeterminism -- some kind of race condition or something else? The GC uses counters and thresholds to decide which collection and when they should be collected. I tried to play with gc.set_threshold(), but I failed to reproduce the issue on Linux. Maybe it's a very precise threshold which triggers the issue. Between Linux and macOS, the site module executes different code paths which produce different GC counters. Sometimes, the GC must happen in a very precise line, one line later is too late. See bpo-44422 for a case of threading.enumerate() (different kind of bug, but it's related to the GC). See bpo-44184 for another example of GC bug which only occurred on Windows and only if you type an exact command line. |
|||
msg399537 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2021-08-13 13:59 | |
I have tried to reproduce this on my Linux and Mac machines and I have been unable to do it for the time being in either of them. |
|||
msg399539 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-13 15:10 | |
The script below hangs on my mac (it's an extract from test_recursion_in_except_handler). ----------- import sys count = 0 def main(): def f(): global count count += 1 try: f() except RecursionError: f() sys.setrecursionlimit(30) try: f() except RecursionError: pass main() print(count) ----------- When I kill it the traceback shows it alternating between the two recursive calls, but not in a regular pattern: ... [snipped a lot] File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ [Previous line repeated 2 more times] RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ [Previous line repeated 1 more time] RecursionError: maximum recursion depth exceeded During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/Users/iritkatriel/src/cpython/tt.py", line 22, in <module> main() ^^^^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 18, in main f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ [Previous line repeated 10 more times] File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 11, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 13, in f f() ^^^ File "/Users/iritkatriel/src/cpython/tt.py", line 7, in f def f(): KeyboardInterrupt |
|||
msg399540 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-13 15:11 | |
Adding Mark since he worked on recursion recently. |
|||
msg399541 - (view) | Author: Dong-hee Na (corona10) * | Date: 2021-08-13 15:21 | |
From my macOS, I was not able to reproduce the issue also as like as Victor and Pablo. repeat 10 ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2 -m test_recursion_normalizing_infinite_exception -m test_recursion_in_except_handler -m test_recursion_normalizing_with_no_memory However hang was able to reproduce. |
|||
msg399542 - (view) | Author: Dong-hee Na (corona10) * | Date: 2021-08-13 15:21 | |
> However hang was able to reproduce. I meant msg399539 |
|||
msg399543 - (view) | Author: Dong-hee Na (corona10) * | Date: 2021-08-13 15:30 | |
Oops sorry. Without PR 27746, I was able to reproduce on my macOS environment. test_exceptions leaked [6, 6, 6] references, sum=18 test_exceptions leaked [6, 6, 6] memory blocks, sum=18 test_exceptions failed (reference leak) |
|||
msg399546 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-13 17:16 | |
Regarding the hang in msg399539, do we need some special handling of the case were a RecursionError is being raised and its context is another RecursionError? Like a FatalError? |
|||
msg399554 - (view) | Author: Chris Jerdonek (chris.jerdonek) * | Date: 2021-08-13 19:39 | |
> Maybe it's a very precise threshold which triggers the issue. Between Linux and macOS, the site module executes different code paths which produce different GC counters. > Sometimes, the GC must happen in a very precise line, one line later is too late. How does this explain it not being non-deterministic on, say, macOS since the same lines of code will be executing each time? Is there a way to force things to happen in a deterministic fashion? (I interpreted the responses to mean that it's non-deterministic even when run on the same machine, so I'm not talking about, say, the differences from running macOS on different machines.) |
|||
msg399555 - (view) | Author: Chris Jerdonek (chris.jerdonek) * | Date: 2021-08-13 19:43 | |
"How does this explain it not being non-deterministic on" -> "How does this explain it not being deterministic on" |
|||
msg399556 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-13 19:53 | |
For instance, if memory addresses of objects are different, so objects sort differently in different runs on the same system. |
|||
msg399557 - (view) | Author: Chris Jerdonek (chris.jerdonek) * | Date: 2021-08-13 20:26 | |
Last question: might trying different values of PYTHONHASHSEED help? |
|||
msg399560 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2021-08-13 20:48 | |
If someone can reproduce more or less reliably, it would be great to know where are the leaked objects. For that, compile CPython with "-with-trace-refs", surround the suspicious test in a loop (test_no_hang_on_context_chain_cycle2 for example) of a 1000 or more iterations and run the test with the PYTHONDUMPREFS=1 environment variable. You will see the leaked objects at shutdown hopefully. Hopefully the leaked objects would be a gigantic set by then so we the number of references will be noticeable. |
|||
msg399578 - (view) | Author: Chris Jerdonek (chris.jerdonek) * | Date: 2021-08-14 00:57 | |
FYI, I tried myself, and setting PYTHONHASHSEED didn't make the failures deterministic. |
|||
msg399588 - (view) | Author: Dong-hee Na (corona10) * | Date: 2021-08-14 10:59 | |
Here is the dump I extracted. 0:00:00 load avg: 2.89 Run tests sequentially 0:00:00 load avg: 2.89 [1/1] test_exceptions beginning 6 repetitions 123456 ...... test_exceptions leaked [6, 6, 6] references, sum=18 test_exceptions leaked [6, 6, 6] memory blocks, sum=18 test_exceptions failed (reference leak) == Tests result: FAILURE == 1 test failed: test_exceptions 1 re-run test: test_exceptions Total duration: 5.7 sec Tests result: FAILURE |
|||
msg399589 - (view) | Author: Dong-hee Na (corona10) * | Date: 2021-08-14 11:03 | |
GH-27767 is a tooling patch for extract dump easily. See cpython-1628938551.dump.gz for msg399588. |
|||
msg399601 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-14 22:20 | |
I created issue44917 for the recursion hang because it seems like it's really another bug. |
|||
msg399636 - (view) | Author: Łukasz Langa (lukasz.langa) * | Date: 2021-08-16 08:36 | |
New changeset 62bc716fde20fcb7b47416c7959be9e66df93212 by Irit Katriel in branch 'main': bpo-44895: skip test_no_hang_on_context_chain_cycle2 until the refleak is fixed (GH-27761) https://github.com/python/cpython/commit/62bc716fde20fcb7b47416c7959be9e66df93212 |
|||
msg399639 - (view) | Author: miss-islington (miss-islington) | Date: 2021-08-16 09:01 | |
New changeset 1960409a6dcbd1e3527f02b523bd27df9086dd77 by Miss Islington (bot) in branch '3.10': bpo-44895: skip test_no_hang_on_context_chain_cycle2 until the refleak is fixed (GH-27761) https://github.com/python/cpython/commit/1960409a6dcbd1e3527f02b523bd27df9086dd77 |
|||
msg399676 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-16 20:13 | |
I've attached an even shorter script (refleak.py) that reproduces the leak. |
|||
msg399681 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-16 21:05 | |
The leak seems to be due to the nested definitions of the Exception types A and B. It goes away if I move them out of the function so that it's like: class A(Exception): pass class B(Exception): pass def test_no_hang_on_context_chain_cycle2(): try: try: raise A() except A as _a: a = _a try: raise B() except B as _b: b = _b raise b except B: pass |
|||
msg399682 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-16 21:11 | |
(Pdb) test_no_hang_on_context_chain_cycle2.__code__.co_consts[1] <code object A at 0x10eecc840, file "/Users/iritkatriel/src/cpython-1/tt.py", line 37> (Pdb) sys.getrefcount(test_no_hang_on_context_chain_cycle2.__code__.co_consts[1]) 2 (Pdb) gc.get_referrers(test_no_hang_on_context_chain_cycle2.__code__.co_consts[1]) [] |
|||
msg399696 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-16 22:37 | |
I've reproduced the leak in 3.10 (with the reflinks.py script). In 3.9 the double-recursion segfaults: Fatal Python error: _Py_CheckRecursiveCall: Cannot recover from stack overflow. Python runtime state: initialized |
|||
msg399697 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-16 22:42 | |
I meant refleaks.py, not reflinks.py. |
|||
msg399700 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-16 23:03 | |
It's midnight, so just one last weirdness before I turn into a pumpkin: If I add this assignment to x the issue seems to go away (or become less frequent so I don't see it?) But if I assign x to a small constant it still leaks. def test_no_hang_on_context_chain_cycle2(): x = list(range(100)) class A(Exception): pass class B(Exception): pass |
|||
msg399703 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2021-08-16 23:09 | |
>> x = list(range(100)) Is creating a lot of objects, and the gc has a chance to run on every object creation so maybe what's happening there is that you are inadvertently triggering the gc |
|||
msg399739 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-17 10:50 | |
However, if I replace the assignment with a gc.collect() call, it still leaks: def test_no_hang_on_context_chain_cycle2(): while gc.collect(): pass class A(Exception): pass class B(Exception): pass ... I also changed gc_collect in Lib/test/support/__init__.py (which is called by the refleak check) to loop until gc.collect() returns 0 (currently it calls gc.collect() 3 times). Still leaks. |
|||
msg399740 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-17 10:58 | |
But if I also add a gc loop at the beginning of test_recursion_in_except_handler, the leak goes away. def test_recursion_in_except_handler(): while gc.collect(): pass def set_relative_recursion_limit(n): ... So maybe test_recursion_in_except_handler is resurrecting something that test_no_hang_on_context_chain_cycle2 has let go of? |
|||
msg399762 - (view) | Author: Dong-hee Na (corona10) * | Date: 2021-08-17 15:53 | |
New changeset c2c857b40f226575d64e0d56a759cbd799f51e62 by Dong-hee Na in branch 'main': bpo-44895: Introduce PYTHONDUMPREFSFILE variable for refcount dumping (GH-27767) https://github.com/python/cpython/commit/c2c857b40f226575d64e0d56a759cbd799f51e62 |
|||
msg399764 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-17 16:07 | |
I'm pretty sure the frame locals is involved. This leaks (note that a is being re-raised, not b): --------------------- def test_no_hang_on_context_chain_cycle2(): class A(Exception): pass class B(Exception): pass try: try: raise A() except A as _a: a = _a try: raise B() except B as _b: b = _b raise a except A: pass --------------------- But this doesn't leak: --------------------- def test_no_hang_on_context_chain_cycle2(): class A(Exception): pass class B(Exception): pass try: try: raise A() except A as _a: a = _a try: raise B() except B as _b: pass # b = _b <== not saving b in a local raise a except A: pass --------------------- |
|||
msg399781 - (view) | Author: Guido van Rossum (gvanrossum) * | Date: 2021-08-17 18:55 | |
So there's code generated for an except clause with a variable that clears the variable at the end, so that this works: try: ... except Exception as err: ... print(err) # Should fail, since err was cleared when the except clause exited. Maybe there's a bug here, either in the runtime or in the way that the code is generated? |
|||
msg399785 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-17 20:02 | |
It's probably not that because this leaks too, and it skips the cleanup: try: raise B() except B: b = sys.exc_info()[1] |
|||
msg399786 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-17 20:04 | |
But this makes it not leak again, so now we know it's the traceback: try: raise B() except B: b = sys.exc_info()[1] b.__traceback__ = None |
|||
msg399788 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2021-08-17 20:20 | |
I am still amazed that I cannot reproduce the leaks on my machine. Maybe I am missing some steps. Could you outline how are you reproducing the leaks? |
|||
msg399789 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2021-08-17 20:21 | |
> But this makes it not leak again, so now we know it's the traceback: The traceback is a very heavy gc object because it contains the frames, and in the locals of the frame is the traceback itself via the exception. This is a well known heavy cycle, but the gc should take care of it if the cycle becomes isolated |
|||
msg399792 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-17 20:59 | |
This doesn't leak: try: raise B() except B: tb = sys.exc_info()[1].__traceback__ So this isn't about the traceback being part of the cycle. The B() needs to be there too. |
|||
msg399794 - (view) | Author: Mark Shannon (Mark.Shannon) * | Date: 2021-08-17 21:07 | |
refleak.py seems to run forever for me. Got bored after 6000 iterations. Which O/S and configure options are needed for refleak.py to fail? |
|||
msg399795 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-17 21:11 | |
I'm using a mac. It typically fails within at most 10-20 iterations. |
|||
msg399796 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-17 21:16 | |
> Could you outline how are you reproducing the leaks? Sorry I missed this. Just run the script: iritkatriel@Irits-MBP cpython-1 % ./python.exe refleak.py -- 1 -- beginning 6 repetitions 123456 ...... -- 2 -- beginning 6 repetitions 123456 ...... -- 3 -- beginning 6 repetitions 123456 ...... test leaked [2, 4, 2] references, sum=8 test leaked [2, 4, 2] memory blocks, sum=8 |
|||
msg399800 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2021-08-17 22:34 | |
There is some timing-related race condition at hand because I tried in a Linux machine and a MacBook Pro (intel) laptop and I wasn't able to reproduce after 20 min of the script running :( |
|||
msg399855 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-18 16:43 | |
Irit: do you still reproduce the issue using gc.disable()? Or you can try different values to call gc.set_threshold(). You may also try different values in ns.huntrleaks, like: ns.huntrleaks = (3, 20, 'tt_out.py'). With 20 iterations, do you still reproduce the leak? To make the issue more likely, you can change check_rc_deltas() in Lib/test/libregrtest/refleak.py. Try: def check_rc_deltas(deltas): return any(delta >= 1 for delta in deltas) With this change, the script stops immediately: test leaked [-4, 4, -4] references, sum=-4 test leaked [-4, 4, -4] memory blocks, sum=-4 Using 20 iterations, I get: test leaked [-4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4] references, sum=0 test leaked [-4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4] memory blocks, sum=0 |
|||
msg399856 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-18 16:48 | |
> test leaked [-4, 4, -4] references, sum=-4 If I disable the GC, the script no longer detects "leaks". Same using a GC threshold of 800. Using a GC threshold between 100 and 700, I reproduce the [-4, 4, -4] references leak. Funny bug. |
|||
msg399857 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-18 16:49 | |
You can also play with the clear() method of frame objects and traceback.clear_frames() (which expects a traceback object). |
|||
msg400645 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-30 18:14 | |
I think the leak happens when we invoke GC while the recursion limit is exceeded. It goes way if make this change: def recurse_in_body_and_except(): try: recurse_in_body_and_except() except RecursionError as e: gc.disable() recurse_in_body_and_except() gc.enable() I also added a __del__ to B and saw it being called when recursion limit is small (though adding the __del__ makes the leak go away. Actually just adding "def f(): pass" is enough to make it go away). |
|||
msg400741 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-31 15:04 | |
A variant of this problem: bpo-36560 with a leak 1 memory block per iteration. |
|||
msg400757 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-31 16:36 | |
Victor - I am also getting a feeling that this may not be a real leak but rather a bug in the refleaks test bookkeeping. It could be a problem with the way _Py_RefTotal is incremented/decremented. |
|||
msg400779 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-08-31 21:20 | |
> It could be a problem with the way _Py_RefTotal is incremented/decremented. It can be a cache which is kept alive and then clear the following iteration. See my previous comment: test leaked [-4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4, -4, 4] references, sum=0 libregrtest/refleak.py does its best to clear *all* caches, type caches, free lists, etc. A single cache which is not cleared can confused libregrtest. |
|||
msg400794 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-08-31 23:02 | |
Oh, I see now what you meant. I think you're right that it's cleanup. I changed dash_R_cleanup to put the sys._clear_type_cache() after clear_caches() and the leak is gone: @@ -166,9 +166,6 @@ def dash_R_cleanup(fs, ps, pic, zdc, abcs): zipimport._zip_directory_cache.clear() zipimport._zip_directory_cache.update(zdc) - # clear type cache - sys._clear_type_cache() - # Clear ABC registries, restoring previously saved ABC registries. abs_classes = [getattr(collections.abc, a) for a in collections.abc.__all__] abs_classes = filter(isabstract, abs_classes) @@ -181,6 +178,12 @@ def dash_R_cleanup(fs, ps, pic, zdc, abcs): clear_caches() + # clear type cache + sys._clear_type_cache() + + support.gc_collect() + I did this because I suspected it's something related to types, because it only happen when we define the exception classes (A and B) inside the test function. So maybe the type objects of these exceptions are not cleared from the type cache because they have refcount > 0 but by the time clear_caches() and gc.collect() finish they are cleared? Note this doesn't mean the change I made is always right - maybe the whole body of dash_R_cleanup should be in a loop until nothing changes? |
|||
msg400840 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-09-01 15:06 | |
> I changed dash_R_cleanup to put the sys._clear_type_cache() after clear_caches() and the leak is gone Aha! The order of function calls in dash_R_cleanup() matters. Calling sys._clear_type_cache() at the end of dash_R_cleanup() perfectly makes sense. If you do that, please add a comment to explain that the order matters. For example, the code about ABC classes is likely to modify classes. About the gc_collect() call, I dislike calling it in clear_caches() and then again in dash_R_cleanup(). I would even prefer to *remove* it from both functions and call it explicitly at the call site: in dash_R() and in runtest.py. |
|||
msg400844 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-09-01 15:45 | |
New changeset 679cb4781ea370c3b3ce40d3334dc404d7e9d92b by Victor Stinner in branch 'main': bpo-44895: libregrtest: refleak check clears types later (GH-28113) https://github.com/python/cpython/commit/679cb4781ea370c3b3ce40d3334dc404d7e9d92b |
|||
msg400848 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-09-01 15:56 | |
With your patch I don't see a leak for refleak.py, but I still see one for ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2 -m test_recursion_normalizing_infinite_exception -m test_recursion_in_except_handler -m test_recursion_normalizing_with_no_memory -F -j1 (once I remove the @skip on test_no_hang_on_context_chain_cycle2). However, if I make this change it looks better: --- a/Lib/test/libregrtest/refleak.py +++ b/Lib/test/libregrtest/refleak.py @@ -90,8 +90,9 @@ def get_pooled_int(value): for i in rep_range: test_func() - dash_R_cleanup(fs, ps, pic, zdc, abcs) - support.gc_collect() + for i in range(3): + dash_R_cleanup(fs, ps, pic, zdc, abcs) + support.gc_collect() |
|||
msg400853 - (view) | Author: STINNER Victor (vstinner) * | Date: 2021-09-01 16:13 | |
> (once I remove the @skip on test_no_hang_on_context_chain_cycle2). Oh. I forgot that the test is still skipped! > With your patch I don't see a leak for refleak.py, but I still see one for > ./python.exe -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2 -m test_recursion_normalizing_infinite_exception -m test_recursion_in_except_handler -m test_recursion_normalizing_with_no_memory -F -j1 If I apply attached strict_refleak_check.patch and I remove @unittest.skip("See issue 44895") of test_no_hang_on_context_chain_cycle2(), I get: $ ./python -m test -R 3:3 test_exceptions -m test_no_hang_on_context_chain_cycle2 -m test_recursion_normalizing_infinite_exception -m test_recursion_in_except_handler -m test_recursion_normalizing_with_no_memory (...) test_exceptions leaked [6, -6, 6] references, sum=6 (...) |
|||
msg400854 - (view) | Author: Irit Katriel (iritkatriel) * | Date: 2021-09-01 16:18 | |
I changed it to + for i in range(2): + dash_R_cleanup(fs, ps, pic, zdc, abcs) + support.gc_collect() (loop twice instead of 3 times) and that seems fine too. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:48 | admin | set | github: 89058 |
2021-09-01 16:43:27 | vstinner | set | files: + strict_refleak_check.patch |
2021-09-01 16:18:25 | iritkatriel | set | messages: + msg400854 |
2021-09-01 16:13:43 | vstinner | set | messages: + msg400853 |
2021-09-01 15:56:22 | iritkatriel | set | messages: + msg400848 |
2021-09-01 15:45:35 | vstinner | set | messages: + msg400844 |
2021-09-01 15:26:25 | vstinner | set | keywords:
+ patch pull_requests: + pull_request26554 |
2021-09-01 15:06:03 | vstinner | set | messages: + msg400840 |
2021-08-31 23:02:17 | iritkatriel | set | messages: + msg400794 |
2021-08-31 21:20:29 | vstinner | set | messages: + msg400779 |
2021-08-31 16:36:38 | iritkatriel | set | messages: + msg400757 |
2021-08-31 15:04:34 | vstinner | set | messages: + msg400741 |
2021-08-30 18:14:36 | iritkatriel | set | messages: + msg400645 |
2021-08-18 16:49:48 | vstinner | set | messages: + msg399857 |
2021-08-18 16:48:12 | vstinner | set | messages: + msg399856 |
2021-08-18 16:43:00 | vstinner | set | messages: + msg399855 |
2021-08-17 22:34:16 | pablogsal | set | messages: + msg399800 |
2021-08-17 21:16:31 | iritkatriel | set | messages: + msg399796 |
2021-08-17 21:11:21 | iritkatriel | set | messages: + msg399795 |
2021-08-17 21:07:14 | Mark.Shannon | set | messages: + msg399794 |
2021-08-17 20:59:10 | iritkatriel | set | messages: + msg399792 |
2021-08-17 20:21:45 | pablogsal | set | messages: + msg399789 |
2021-08-17 20:20:20 | pablogsal | set | messages: + msg399788 |
2021-08-17 20:04:47 | iritkatriel | set | messages: + msg399786 |
2021-08-17 20:02:35 | iritkatriel | set | messages: + msg399785 |
2021-08-17 18:55:09 | gvanrossum | set | nosy:
+ gvanrossum messages: + msg399781 |
2021-08-17 16:08:12 | iritkatriel | set | messages: - msg399763 |
2021-08-17 16:07:56 | iritkatriel | set | messages: + msg399764 |
2021-08-17 16:05:33 | iritkatriel | set | messages: + msg399763 |
2021-08-17 15:53:03 | corona10 | set | messages: + msg399762 |
2021-08-17 10:58:33 | iritkatriel | set | messages: + msg399740 |
2021-08-17 10:50:19 | iritkatriel | set | messages: + msg399739 |
2021-08-16 23:09:02 | pablogsal | set | messages: + msg399703 |
2021-08-16 23:03:04 | iritkatriel | set | keywords:
- patch messages: + msg399700 |
2021-08-16 22:42:27 | iritkatriel | set | messages: + msg399697 |
2021-08-16 22:37:09 | iritkatriel | set | messages:
+ msg399696 versions: + Python 3.10 |
2021-08-16 21:11:56 | iritkatriel | set | messages: + msg399682 |
2021-08-16 21:05:37 | iritkatriel | set | messages: + msg399681 |
2021-08-16 20:13:38 | iritkatriel | set | files:
+ refleak.py messages: + msg399676 |
2021-08-16 09:01:18 | miss-islington | set | messages: + msg399639 |
2021-08-16 08:36:57 | lukasz.langa | set | messages: + msg399636 |
2021-08-16 08:36:56 | miss-islington | set | pull_requests: + pull_request26250 |
2021-08-14 22:20:44 | iritkatriel | set | messages: + msg399601 |
2021-08-14 11:03:21 | corona10 | set | messages: + msg399589 |
2021-08-14 11:01:36 | corona10 | set | pull_requests: + pull_request26243 |
2021-08-14 10:59:54 | corona10 | set | files:
+ cpython-1628938551.dump.gz messages: + msg399588 |
2021-08-14 00:57:18 | chris.jerdonek | set | messages: + msg399578 |
2021-08-13 20:48:28 | pablogsal | set | messages: + msg399560 |
2021-08-13 20:26:21 | chris.jerdonek | set | messages: + msg399557 |
2021-08-13 19:53:58 | iritkatriel | set | messages: + msg399556 |
2021-08-13 19:43:42 | chris.jerdonek | set | messages: + msg399555 |
2021-08-13 19:39:52 | chris.jerdonek | set | messages: + msg399554 |
2021-08-13 17:16:59 | iritkatriel | set | messages: + msg399546 |
2021-08-13 15:30:56 | corona10 | set | messages: + msg399543 |
2021-08-13 15:24:48 | iritkatriel | set | stage: needs patch -> patch review pull_requests: + pull_request26237 |
2021-08-13 15:21:45 | corona10 | set | messages: + msg399542 |
2021-08-13 15:21:11 | corona10 | set | messages: + msg399541 |
2021-08-13 15:11:28 | iritkatriel | set | nosy:
+ Mark.Shannon messages: + msg399540 |
2021-08-13 15:10:26 | iritkatriel | set | messages: + msg399539 |
2021-08-13 13:59:30 | pablogsal | set | nosy:
+ pablogsal messages: + msg399537 |
2021-08-13 12:59:45 | vstinner | set | messages: + msg399533 |
2021-08-13 10:22:36 | lukasz.langa | set | messages:
+ msg399517 stage: patch review -> needs patch |
2021-08-13 10:21:29 | miss-islington | set | messages: + msg399516 |
2021-08-13 09:56:28 | iritkatriel | set | messages: + msg399513 |
2021-08-13 09:41:53 | miss-islington | set | nosy:
+ miss-islington pull_requests: + pull_request26229 |
2021-08-13 09:41:39 | lukasz.langa | set | nosy:
+ lukasz.langa messages: + msg399510 |
2021-08-13 08:48:16 | iritkatriel | set | messages: + msg399507 |
2021-08-13 05:50:28 | chris.jerdonek | set | nosy:
+ chris.jerdonek messages: + msg399492 |
2021-08-12 22:41:28 | iritkatriel | set | messages: + msg399485 |
2021-08-12 17:37:31 | iritkatriel | set | messages: + msg399472 |
2021-08-12 16:59:53 | iritkatriel | set | keywords:
+ patch stage: patch review pull_requests: + pull_request26224 |
2021-08-12 14:52:17 | iritkatriel | set | messages: + msg399463 |
2021-08-12 14:39:45 | iritkatriel | set | messages: + msg399459 |
2021-08-12 14:23:43 | vstinner | set | messages: + msg399457 |
2021-08-12 14:21:37 | iritkatriel | set | messages: + msg399456 |
2021-08-12 14:18:27 | vstinner | set | messages: + msg399453 |
2021-08-12 14:17:02 | vstinner | set | messages: + msg399452 |
2021-08-12 14:02:39 | iritkatriel | set | messages: + msg399451 |
2021-08-12 13:59:55 | iritkatriel | set | messages: + msg399450 |
2021-08-12 13:44:23 | vstinner | set | messages: + msg399449 |
2021-08-12 11:00:13 | vstinner | set | nosy:
+ vstinner |
2021-08-12 09:10:34 | corona10 | set | nosy:
+ corona10 |
2021-08-12 00:13:38 | iritkatriel | create |