New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refleak in typing.py #72835
Comments
Looks like we have a refleak somewhere in the system that typing.py is exposing. The following code exposes the refleak: def test_refleak(self):
T = typing.TypeVar('T')
class C(typing.Generic[T], typing.Mapping[int, str]): ... The change that first triggered the leak is 8f0df4db2b06. I'm not an expert in typing.py, but I suspect that the real bug must be somewhere in CPython core. Guido, do you have any idea how 8f0df4db2b06 could trigger this? |
Could it be related to leaks in objects with empty __slots__? See bpo-24379 for details. |
Not sure... Do we have a test/patch for objects-with-empty-slots bug? |
Ivan, this was an early typing.py update (merged into CPython on Sept. 27). But the refleak is still with us. Do you have any intuition on what could be going on here? My own intuition here is lacking, other than thinking there's a lot of metaclass magic going on in this code... |
>>> import gc
>>> gc.collect()
23
>>> gc.collect()
0
>>> class A:
... __slots__ = ()
...
>>> del A
>>> gc.collect()
2
^ leak |
Serhiy, why is |
Attached patch (slots_gc.patch) makes objects with empty slots tracked by GC: class A: __slots__ = ()
gc.is_tracked(A()) == False # before
gc.is_tracked(A()) == True # with the patch This doesn't fix the refleak though. |
I removed all slots from typing.py, the refleak is still there. |
Wow. In default CPython branch, with no modifications, test_typing.py simply crashes in debug mode: yury@ysmbp ~/dev/py/cpython $ ./python.exe -m test -R3:3 test_typing
Run tests sequentially
0:00:00 [1/1] test_typing
beginning 6 repetitions
123456
.test test_typing failed -- Traceback (most recent call last):
File "/Users/yury/dev/py/cpython/Lib/test/test_typing.py", line 739, in test_generic_forvard_ref
self.assertEqual(get_type_hints(foobar, globals(), locals()), {'x': List[List[T]]})
AssertionError: {'x': typing.List[typing.List[~T]]} != {'x': typing.List[typing.List[~T]]}
{'x': typing.List[typing.List[~T]]} test_typing failed 1 test failed: Total duration: 1 sec |
The problem is that functools.lru_cache is used in _tp_cache. If I remove type caching, the original "refleak" is fixed. |
Serhiy: I think you forgot to make a global instance of the type for your demonstration. You mentioned in msg253899 that the loop is: global instance -> type -> method -> module globals -> global instance The example you gave doesn't instantiate class A, and it's the instance of A that doesn't participate in GC and causes the problem, right? |
Josh, Serhiy, I've created an issue for tracking empty slots types: bpo-28651. This issue is about typing.py using functools.lru_cache for internal cache That is what causing the refleaks, and inability of test_typing.py to be run in refleak hunting mode. |
I could use a hint on a complete program that establishes whether there is |
Good sleuthing! So the bug is in _lru_cache? On Wed, Nov 9, 2016 at 10:37 AM, Guido van Rossum <report@bugs.python.org>
|
Sorry, it was bad example. I don't remember all details.
Proposed patch clears caches when search for reference leaks. This decreases the number of leaks. Unpatched: Patched: See also bpo-23839. |
I have no objections against typing-clear-caches.patch (though we'll have to make sure to apply it to the GitHub upstream as well -- I can take care of that once you merge it to CPython). Does that patch fix the refleak completely or only partially? I have no opinion on empty slots. |
I don't think it's a bug of lru_cache. It stores strong references for arguments and return value of the decorated function (which btw isn't documented). I don't think it's possible to write efficient generic lru_cache that would just use weakrefs. The problem is that because _tp_cache helper uses it, the lifetime of some typing classes becomes indefinite. Which shouldn't be a problem for a typical user of typing, since most of the time it is used on the module level. We have two potential solutions to this problem:
[1] is the ideal solution, [2] should work good enough. |
Yury, Guido, I think I understand why test_typing crashes in refleak hunting mode. The test relies on caching, namely type variables are compared by id, so that if cache is cleared between those two lines, then test fails. I think we should just update those test to use a usual class in forward reference. The patch is attached. (Patch fixes the crash in refleak hunting mode, but the refleak is still there) |
I tried to empty the cache in test_typing.BaseTestCase.setUp and tearDown; this doesn't fix all refleaks. Do we have some other caches in typing.py? |
It fixes 96% of refleaks. |
Yury, _tp_cache should be the only one (of course most generic classes have _abc_cache since GenericMeta inherits from ABCMeta) |
FYI I've opened python/typing#323 to track this upstream. |
In typing.py? Or generally? |
Then there is at least one more bug not related to lru_cache (or any kind of cache in typing). And it's not classes-with-empty-slots bug either. |
Serhiy's patch typing-clear-caches.patch looks good, we should apply it (fixing the code style a little bit). |
Oh. This is a bug in C implementation of functools.lru_cache. I'll take a look. |
My impression is like something wrong happens when TypeError is raised by a cached function. |
A very simple repro: with self.assertRaises(TypeError):
Union[[]] It looks like the problem happens when a non-hashable argument is passed to cached function |
reflesk with only functools: def test_lru_type_error(self):
@functools.lru_cache(maxsize=None)
def foo(o):
raise TypeError
with self.assertRaises(TypeError):
foo([]) |
Found it, will open a separate issue. |
See http://bugs.python.org/issue28653 for details. |
So Ivan made an interesting observation: if we use Python version of functools.lru_cache, typing tests start to leak like crazy:
I experimented a bit, and I *think* I know what's happening:
A simple fix that removes fixes the refleak with Python version of lru_cache is to add a 'tearDown' method to 'test_typing.BaseTestCase': class BaseTestCase(TestCase):
def tearDown(self):
for f in typing._cleanups:
f() Now this all of this is just my guess of what's going on here. It might be something that has to be fixed in typing, or maybe we should just add tearDown. Guido, Ivan, what do you think? |
But why is this so different from the C implementation of lru_cache? |
I don't know. Maybe C version of lru cache creates a bit less pressure on GC. BTW, if I set maxsize=100000 for typing lru_cache, test_generic_forward_ref crashes in refleak-test mode. Maybe this is another bug... |
Actually test/refleak.py now cleans up typing's lru cache & calls "gc.collect()". It seems that the typing cycles aren't collectable by the GC when Python lru cache is used. |
I also noticed this some time ago, ideally we need to make all tests more isolated between each other.
adding tearDown is a good intermediate solution, while I could go through tests and try to make them more isolated from each other where necessary.
I also noticed this today, and this is something that I don't understand yet. This test should not depend on caching, but it looks like it does. I will think more about this. |
Segfault or traceback? Anyway, it implements a doubly-linked list where each node is implemented |
Traceback. I should have said failed, not crashed :(
The problem is that the test fails depending on cache size, and caching code shouldn't really do that ever. |
It looks like the tearDown() works really well. It also somehow fixes the problem with test_generic_forward_ref |
I think Yury is right here. The good plan would be to add the tearDown(), but also fix the problem with test_generic_forward_ref. I could work on it tomorrow (really don't have time today, sorry). |
Guido, Yury, it looks like I solved the puzzle. All the remaining problems are because of forward references. In particular, _ForwardRef keeps a reference to the frame where it was defined: typing_globals = globals()
frame = sys._getframe(1)
while frame is not None and frame.f_globals is typing_globals:
frame = frame.f_back
assert frame is not None
self.__forward_frame__ = frame This is old code from 2015 introduced to support __isinstance__ for forward refs. The latter is no more supported anyway, so that I believe this code should be removed. I will make a PR upstream soon. |
This one is also related to the mentioned in my previous message. Namely, forward references are only evaluated once. However, in refleak hunting mode tests are repeated multiple times, and every time test_generic_forward_ref is called, this code: class CC: ... creates a new local class, while forward ref is still pointing to an old one. There are two options:
However I would vote for a compromise solution: if the |
OK, here are the PRs: python/typing#327 Those two seem to fix everything, I tried various cache sizes with both C and Python versions of lru_cache and found no refleaks. |
Good work, Ivan!
Speaking of lru_cache: typing currently doesn't configure maxsize, which means that the cache is limited to 128 items. Is that OK? Should maxsize be set to None to make the cache unlimited, or, maybe, a higher number like 1024 is more appropriate? |
We discussed cache size previously and 128 is fine. --Guido (mobile) |
New changeset 0da2e381ad71 by Guido van Rossum in branch '3.5': New changeset 249a1f0b2857 by Guido van Rossum in branch '3.6': New changeset 304f017462e4 by Guido van Rossum in branch 'default': |
New changeset 555f0ca31587 by Guido van Rossum in branch '3.5': New changeset 0f863906cf2e by Guido van Rossum in branch '3.6': New changeset 84b4ac243508 by Guido van Rossum in branch 'default': |
Thanks a bundle, Ivan! I've merged those two into CPython 3.5, 3.6, 3.7. We can now watch the build bots for a while and double-check the refleaks. Yury, when you're happy with the situation can you close this issue? |
Thank you Guido and Ivan!
I'll close it now, will reopen if something comes up. |
BTW, do we still want the tearDown()? Or is that no longer necessary? |
I think tearDown() is not necessary. But on the other hand it could be nice to have a method in tests to force cache clear. I would propose it not as a default, but as an opt-in to check that cache works well, or that a certain tested feature is "cache independent", etc. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: