Issue 42536: Iterating on a zip keeps objects alive longer than expected (test_itertools leaks sometimes references)

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/86702

classification

Title:	Iterating on a zip keeps objects alive longer than expected (test_itertools leaks sometimes references)
Type:		Stage:	resolved
Components:	Tests	Versions:	Python 3.10

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	brandtbucher, pablogsal, rhettinger, serhiy.storchaka, tim.peters, vstinner
Priority:	normal	Keywords:	patch

Created on 2020-12-02 11:48 by vstinner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
test_zip_leak.py	vstinner, 2020-12-02 23:43
test_zip_leak2.py	vstinner, 2020-12-03 09:16

Pull Requests
URL	Status	Linked	Edit
PR 23623	merged	brandtbucher, 2020-12-03 04:32
PR 23651	merged	brandtbucher, 2020-12-05 04:55
PR 23652	merged	brandtbucher, 2020-12-05 04:57

Messages (39)
msg382304 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-02 11:48
12:07:06 vstinner@apu$ ./python -m test test_itertools -R 3:3 -F -j4 0:00:00 load avg: 0.44 Run tests in parallel using 4 child processes (...) 0:02:03 load avg: 4.24 [ 1] test_itertools passed (2 min 2 sec) -- running: test_itertools (2 min 3 sec), test_itertools (2 min 3 sec), test_itertools (2 min 3 sec) beginning 6 repetitions 123456 ...... 0:02:03 load avg: 4.24 [ 2/1] test_itertools failed (2 min 3 sec) -- running: test_itertools (2 min 3 sec), test_itertools (2 min 3 sec) beginning 6 repetitions 123456 ...... test_itertools leaked [32, 32, 32] references, sum=96 test_itertools leaked [24, 22, 22] memory blocks, sum=68 Kill <TestWorkerProcess #1 running test=test_itertools pid=159366 time=2 min 3 sec> process group Kill <TestWorkerProcess #2 running test=test_itertools pid=159586 time=164 ms> process group Kill <TestWorkerProcess #3 running test=test_itertools pid=159371 time=2 min 3 sec> process group == Tests result: FAILURE == 1 test OK. 1 test failed: test_itertools Total duration: 2 min 3 sec Tests result: FAILURE
msg382305 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-02 11:49
Leak seen on AMD64 RHEL7 Refleaks 3.x: https://buildbot.python.org/all/#builders/562/builds/109 Recent itertools change, I have no idea if it's related: commit cc061d0e6fb2569efa91531686f75b89e94ec865 Author: Raymond Hettinger <rhettinger@users.noreply.github.com> Date: Mon Nov 30 20:42:54 2020 -0800 bpo-38200: Add itertools.pairwise() (GH-23549)
msg382340 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-02 21:06
> nosy: + rhettinger, serhiy.storchaka I wanted to bisect this test before annoying too many people with this random issue. So far, I managed to simplified test_itertools to: def test_main(verbose=None): test_classes = ( TestBasicOps, TestVariousIteratorArgs, TestGC, ) support.run_unittest(*test_classes)
msg382342 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2020-12-02 21:17
You can get a list of all tests with --list-cases and then run them in separate processes.
msg382344 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-12-02 21:32
Thanks. I'll work on it this evening. If this is new, pairwise() is the most likely cause.
msg382356 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-02 23:43
I reproduce the leak with attached test_zip.py by running: $ ./python -m test -R 3:3 test_zip_leak -j4 -m test.test_zip_leak.TestGC.test_permutations -m test.test_zip_leak.TestGC.test_zip The problem is that the builtin zip class keeps an internal tuple alive. See zip_next) and zipobject.result member (tuple).
msg382357 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-02 23:50
Raymond Hettinger: > If this is new, pairwise() is the most likely cause. I reproduced the issue without involving the itertools module. It seems to be a zip_next() micro-optimization which keeps an object alive longer than epxected. This micro-optimization is not new. It's just really hard to trigger the issue. It seems to depend if the zip object is deleted or not. The builtin zip type implements a traverse function which visits Py_VISIT(lz->result). The GC is supposed to be able to break such cycle, no?
msg382358 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-02 23:53
> The GC is supposed to be able to break such cycle, no? In zip_new(), lz->result tuple is not always tracked by the GC.
msg382360 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2020-12-03 00:13
Adding Brandt because he recently worked on the zip() code.
msg382361 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 00:59
It looks like the GC untracks the None-filled result tuple in untrack_tuples, and it's never re-tracked again. This can also happen if it's filled with atomic values on an early iteration and the GC visits it. Perhaps a simple fix is to call _PyObject_GC_TRACK on the result tuple inside of each zip_next call if it's not tracked anymore? if (!_PyObject_GC_IS_TRACKED(result)) { _PyObject_GC_TRACK(result); } _PyTuple_MaybeUntrack(result); // Worth it? return result; Although I'm not sure how much of a win we're getting from the result tuple reuse here - maybe it's easier to just not do that.
msg382362 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 01:14
Simple demo: >>> import gc >>> gc.disable() >>> z = zip([[]]) >>> gc.is_tracked(next(z)) True >>> z = zip([[]]) >>> gc.collect() 0 >>> gc.is_tracked(next(z)) False
msg382365 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 02:02
Some quick benchmarks on a normal build with CPU isolation, tuned with pyperf. No PGO/LTO. $ ./python -m pyperf timeit --rigorous --setup 'i = (None,) * 10_000_000' 'for _, _ in zip(i, i): pass' # Current master. ......................................... Mean +- std dev: 279 ms +- 11 ms $ ./python -m pyperf timeit --rigorous --setup 'i = (None,) * 10_000_000' 'for _, _ in zip(i, i): pass' # With above fix. ......................................... Mean +- std dev: 369 ms +- 20 ms $ ./python -m pyperf timeit --rigorous --setup 'i = (None,) * 10_000_000' 'for _, _ in zip(i, i): pass' # With above fix (no _PyTuple_MaybeUntrack). ......................................... Mean +- std dev: 284 ms +- 17 ms $ ./python -m pyperf timeit --rigorous --setup 'i = (None,) * 10_000_000' 'for _, _ in zip(i, i): pass' # With no tuple reuse. ......................................... Mean +- std dev: 526 ms +- 51 ms Note that this example reuses the result tuple for every iteration.
msg382374 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 05:03
It looks like Victor's original issue is unrelated to zip, though. That test run is clean after adding the same fix to: - itertools.product - itertools.combinations - itertools.combinations_with_replacement - itertools.permutations - itertools.zip_longest ...all of which use the same tuple-recycling speed trick. If my zip PR looks good, I can create another one fixing these as well.
msg382375 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 05:06
(By the way, I didn't know that -F runs the tests forever... so I was waiting almost forever for it to finish!)
msg382376 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 05:10
Also, it appears enumerate is affected as well.
msg382387 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2020-12-03 08:57
Why there are cycles at all? In normal case there should not be cycle and the result tuple should be destroyed when the iteration ends.
msg382388 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-03 08:59
Extract of Brandt's PR: // The GC may have untracked this result tuple if its elements were all // untracked. Since we're recycling it, make sure it's tracked again: if (!_PyObject_GC_IS_TRACKED(result)) { _PyObject_GC_TRACK(result); } I would like to understand why the tuple is no longer tracked, whereas PyTuple_New() creates a newly created tuple which is tracked. Using gdb, I found that gc_collect_main() calls untrack_tuples(young) which untracks all tuples of the young generation. I understand that (when the issue happens): * a zip() object is created with lz->result = (None, None) * A GC collection happens * The GC untracks (None, None) tuple * next(zip) is called: lz->result has a reference count of 1 and so can be reused. Problem: the tuple is no longer tracked, whereas its content changed and so the newly filled tuple might be part of a reference cycle. Since the tuple is not tracked, the GC can no longer break the reference cycle involving the zip object internal tuple. Example of code where the zip tuple is untracked before zip_next() is called on the zip object: def test_product(self): gc.set_threshold(5) pools = [(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), ('a', 'b', 'c'), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)] indices = [0, 2, 10, 11] print(indices) print(pools) list(None for a, b in zip(pools, indices))
msg382391 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-03 09:16
test_zip_leak2.py: simplified example to trigger the GC issue. Copy it to Lib/test/ and reproduce the leak with: ./python -m test -R 3:3 test_zip_leak2 Extract of the code: --- container = [] iterator = zip([container], [container]) # untrack the internal zip->result=(None, None) gc.collect() # create a reference cycle: container -> iterator -> container container.append(iterator) next(iterator) # zip->result=(container, container) del container, iterator # Try to break the reference cycle gc.collect() ---
msg382392 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-03 09:22
Serhiy Storchaka: > Why there are cycles at all? In normal case there should not be cycle and the result tuple should be destroyed when the iteration ends. test_itertools creates a reference cycle on purpose. See test_zip_leak2.py simplified code, it comes from test_itertools.
msg382395 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-03 09:23
I add Pablo and Tim who love GC puzzles.
msg382396 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-03 09:34
The zip bug is quite old and was already tested by test_itertools for a very long time. Suddenly, it started to fail on more and more Refleaks buildbots. It may be because the GC must be triggered at a very specific point, and the default GC threshold triggers the issue with latest test_itertools changes.
msg382418 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 15:30
> I add Pablo and Tim who love GC puzzles. Well, I don’t think the GC is really doing anything wrong here... untracking these sort of tuples likely results in a noticeable reduction in collection times. This code is definitely testing the limits of what is “okay” for us to do with a tuple, and it’s breaking the GC’s (reasonable) assumptions. I kept digging around, and it appears dict.items() is affected, too (which is a bit scarier). The same fix still works, though.
msg382424 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 16:16
functools.reduce looks affected, too.
msg382448 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2020-12-03 20:19
I added some comments in the PR regarding the possibility of forcing the tracking on the visiting function to redirect the cost to gc passes instead of when calling next() but there is another option: not untracking tuples that have refcount == 1.
msg382450 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2020-12-03 20:24
> not untracking tuples that have refcount == 1. Actually, this may not be a good idea as well: running test_list shows that there are 14786 tuples with refcount == 1 that would have been untracked. This is a non-trivial workload to the gc passes that we will lose if we do this. Performance-wise, I think the best alternative is to force the tracking before visiting in tp_traverse.
msg382451 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 20:27
> I added some comments in the PR regarding the possibility of forcing the tracking on the visiting function to redirect the cost to gc passes instead of when calling next()... Yep, I agree that this approach is better. > ...but there is another option: not untracking tuples that have refcount == 1. I do like the idea of not having to fix this in ~10 different places, but I'm not sure this is enough. The GC visit may happen when the refcount of the result tuple is > 1, but it may drop back to 1 before __next__ called again z = zip([42, []]) item = next(zip) gc.collect() # Refcount of item == 2, untracked by GC. del item # Refcount of item == 1, ready to be recycled. next(zip) # This should be tracked, but won't be.
msg382452 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2020-12-03 20:34
> The GC visit may happen when the refcount of the result tuple is > 1, but it may drop back to 1 before __next__ called again Good point. Also, is it possible that a more complex cycle ends having more than one reference to that result, so indeed that is not enough.
msg382455 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 22:13
> I added some comments in the PR regarding the possibility of forcing the tracking on the visiting function Thinking about this more, I'm a bit hesitant to put the re-tracking code in the traverse function (mostly stemming from my lack of knowledge about our GC algorithm). I have no idea how problematic tracking and untracking objects during collections can be. Consider the following reference chain: -> z -> r -> o -> r Here, z is a zip iterator, r is its result tuple, and o is some object. Let's say r was untracked in the previous collection (I believe untrack_tuples happens after all of the traverse functions are called). I'm worried that the GC may not be able to detect the cycle if it visits o, then z (which then starts tracking r), then r. Or, if this group is determined to be unreachable, untrack_tuples untracks r before the cycle is cleaned up. That seems like it could be a problem. Or, if z's refcount drops to zero and it's cleaned up, its traverse function may never be called, which leaves the untracked r -> o -> r cycle. I guess I'm just not sure how resilient the GC is to these sorts of things. Re-tracking the result tuple in __next__ sidesteps all of these potential issues, with no significant performance impact. So I think I might slightly prefer that.
msg382457 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2020-12-03 23:32
> Or, if z's refcount drops to zero and it's cleaned up, its traverse function may never be called, which leaves the untracked r -> o -> r cycle. This is a real problem, indeed. We would need to add the tracking to the tp_dealloc of the zip object as well. > I have no idea how problematic tracking and untracking objects during collections can be. It is indeed tricky: the most problematic part of the "surprise tracking" is the validity of the pointers, but the key here is that the traverse function will be called BEFORE (update_refs()) the pointers start being invalid (because of the bit reuse). > Or, if this group is determined to be unreachable, untrack_tuples untracks r before the cycle is cleaned up. That seems like it could be a problem. It is not: untrack_tuples is called over the reachable set after the gc knows what part of the linked list is isolated garbage. If the group is in the unreachable set, then untrack_tuples won't touch it. > I'm worried that the GC may not be able to detect the cycle if it visits o, then z (which then starts tracking r), then r. This is not a problem IMO: if the gc does not see the cycle in that collection, it will in the next ones. Wat will happen is that at the beginning the gc does not see r so it iterates over the linked list of the generation being collected, decrementing the gc refcounts. When it reaches z, it starts decrementing the refcounts of z and that calls the traverse function of z, that tracks r AND decrements its gc refcounts but also it adds "r" to the end of the linked list of the young generation. This means that (assuming we are collecting the middle generation) in this collection the cycle will not be seen, but in a future collection where the young generation is collected then "r" will be visited normally (notice that the gc can see "r" without needing to visit "z" before). Also, notice that untrack_tuples() is called over the reachable set, but "r" is in the young generation so it won't be untracked. Also, if we are collecting the younger generation, then the cycle will be found and it will be cleaned nicely.
msg382458 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2020-12-03 23:34
I forgot to mention the conclusion: my conclusion is that although the GC is resilient to these shenanigans (it was also extensive validation along with the algorithm), I still prefer to track it on the next() call because is technically the right thing to do: A tuple has been mutated so it must be (potentially) tracked because it needs to behave "AS IF" the tuple was being created from scratch.
msg382459 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-03 23:51
Thanks for that detailed explanation, Pablo. If nobody objects to the current zip fix, I'll merge it and move on to similar PRs for the other affected stuff: - dict.items - enumerate - functools.reduce - itertools.product - itertools.combinations - itertools.combinations_with_replacement - itertools.permutations - itertools.zip_longest
msg382460 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2020-12-03 23:54
> I'll merge it and move on to similar PRs for the other affected stuff: Not sure if is more ergonomic, but I would prefer to fix this in one PR so the buildbots don't test the partial fixes. Also is easier to revert if something goes wrong with the approach as that would affect all. But being said that, if you prefer multiple ones, I am also fine :)
msg382461 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-04 00:18
Yeah, I'm fine with that.
msg382462 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-04 00:44
Adding: - collections.OrderedDict.items
msg382476 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-04 11:13
Would it be possible to push a quick & dirty workaround for test_itertools? I'm getting more and more buildbot Refleaks failures. For example, always skip TestGC.test_zip() for now. This issue seems quite complex and I would prefer to have time to think about it to design a proper fix, rather than having to work under pressure.
msg382478 - (view)	Author: STINNER Victor (vstinner) *	Date: 2020-12-04 11:30
> Would it be possible to push a quick & dirty workaround for test_itertools? I'm getting more and more buildbot Refleaks failures. Oh, I looked at PR 23623 and it got many approvals. I'm fine with waiting for a few days until PR 23623 is merged.
msg382552 - (view)	Author: Brandt Bucher (brandtbucher) *	Date: 2020-12-05 03:46
New changeset 226a012d1cd61f42ecd3056c554922f359a1a35d by Brandt Bucher in branch 'master': bpo-42536: GC track recycled tuples (GH-23623) https://github.com/python/cpython/commit/226a012d1cd61f42ecd3056c554922f359a1a35d
msg382678 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2020-12-07 20:07
New changeset 60463e8e4f79e5b5e96dc43fb83ded373b489e33 by Brandt Bucher in branch '3.9': bpo-42536: GC track recycled tuples (GH-23623) (GH-23651) https://github.com/python/cpython/commit/60463e8e4f79e5b5e96dc43fb83ded373b489e33
msg382679 - (view)	Author: Pablo Galindo Salgado (pablogsal) *	Date: 2020-12-07 20:08
New changeset 7c797982383ebfd9cca39c480dcf6132979dd06f by Brandt Bucher in branch '3.8': bpo-42536: GC track recycled tuples (GH-23623) (GH-23652) https://github.com/python/cpython/commit/7c797982383ebfd9cca39c480dcf6132979dd06f

History
Date	User	Action	Args
2022-04-11 14:59:38	admin	set	github: 86702
2020-12-07 20:08:37	pablogsal	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2020-12-07 20:08:28	pablogsal	set	messages: + msg382679
2020-12-07 20:07:52	pablogsal	set	messages: + msg382678
2020-12-05 04:57:03	brandtbucher	set	pull_requests: + pull_request22520
2020-12-05 04:55:18	brandtbucher	set	pull_requests: + pull_request22519
2020-12-05 03:46:00	brandtbucher	set	messages: + msg382552
2020-12-04 11:30:27	vstinner	set	messages: + msg382478
2020-12-04 11:13:41	vstinner	set	messages: + msg382476
2020-12-04 00:44:15	brandtbucher	set	messages: + msg382462
2020-12-04 00:18:22	brandtbucher	set	messages: + msg382461
2020-12-03 23:54:49	pablogsal	set	messages: + msg382460
2020-12-03 23:51:36	brandtbucher	set	messages: + msg382459
2020-12-03 23:34:18	pablogsal	set	messages: + msg382458
2020-12-03 23:32:26	pablogsal	set	messages: + msg382457
2020-12-03 22:13:17	brandtbucher	set	messages: + msg382455
2020-12-03 20:34:37	pablogsal	set	messages: + msg382452
2020-12-03 20:27:56	brandtbucher	set	messages: + msg382451
2020-12-03 20:24:22	pablogsal	set	messages: + msg382450
2020-12-03 20:19:31	pablogsal	set	messages: + msg382448
2020-12-03 16:16:33	brandtbucher	set	messages: + msg382424
2020-12-03 15:30:57	brandtbucher	set	messages: + msg382418
2020-12-03 09:34:11	vstinner	set	messages: + msg382396
2020-12-03 09:23:59	vstinner	set	nosy: + tim.peters, pablogsal messages: + msg382395
2020-12-03 09:22:50	vstinner	set	messages: + msg382392
2020-12-03 09:16:40	vstinner	set	files: + test_zip_leak2.py messages: + msg382391
2020-12-03 08:59:50	vstinner	set	messages: + msg382388
2020-12-03 08:57:06	serhiy.storchaka	set	messages: + msg382387
2020-12-03 05:10:36	brandtbucher	set	messages: + msg382376
2020-12-03 05:06:38	brandtbucher	set	messages: + msg382375
2020-12-03 05:03:44	brandtbucher	set	messages: + msg382374
2020-12-03 04:32:59	brandtbucher	set	keywords: + patch stage: patch review pull_requests: + pull_request22491
2020-12-03 02:02:03	brandtbucher	set	messages: + msg382365
2020-12-03 01:14:54	brandtbucher	set	messages: + msg382362
2020-12-03 00:59:11	brandtbucher	set	messages: + msg382361
2020-12-03 00:20:29	rhettinger	set	assignee: rhettinger ->
2020-12-03 00:13:57	rhettinger	set	nosy: + brandtbucher messages: + msg382360
2020-12-02 23:53:43	vstinner	set	messages: + msg382358
2020-12-02 23:50:25	vstinner	set	messages: + msg382357
2020-12-02 23:45:17	vstinner	set	title: test_itertools leaks sometimes references -> Iterating on a zip keeps objects alive longer than expected (test_itertools leaks sometimes references)
2020-12-02 23:43:56	vstinner	set	files: + test_zip_leak.py messages: + msg382356
2020-12-02 21:32:03	rhettinger	set	assignee: rhettinger messages: + msg382344
2020-12-02 21:17:39	serhiy.storchaka	set	messages: + msg382342
2020-12-02 21:06:04	vstinner	set	messages: + msg382340
2020-12-02 12:49:40	serhiy.storchaka	set	nosy: + rhettinger, serhiy.storchaka
2020-12-02 11:49:26	vstinner	set	messages: + msg382305
2020-12-02 11:48:35	vstinner	create