msg280407 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 16:00 |
Looks like we have a refleak somewhere in the system that typing.py is exposing.
The following code exposes the refleak:
def test_refleak(self):
T = typing.TypeVar('T')
class C(typing.Generic[T], typing.Mapping[int, str]): ...
The change that first triggered the leak is 8f0df4db2b06.
I'm not an expert in typing.py, but I suspect that the real bug must be somewhere in CPython core.
Guido, do you have any idea how 8f0df4db2b06 could trigger this?
|
msg280411 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-09 16:08 |
Could it be related to leaks in objects with empty __slots__? See issue24379 for details.
|
msg280412 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 16:10 |
> Could it be related to leaks in objects with empty __slots__? See issue24379 for details.
Not sure... Do we have a test/patch for objects-with-empty-slots bug?
|
msg280413 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-09 16:15 |
Ivan, this was an early typing.py update (merged into CPython on Sept. 27). But the refleak is still with us. Do you have any intuition on what could be going on here?
My own intuition here is lacking, other than thinking there's a lot of metaclass magic going on in this code...
|
msg280414 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-09 16:21 |
>>> import gc
>>> gc.collect()
23
>>> gc.collect()
0
>>> class A:
... __slots__ = ()
...
>>> del A
>>> gc.collect()
2
^ leak
|
msg280415 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-09 16:24 |
See also https://mail.python.org/pipermail/python-dev/2015-October/141993.html.
|
msg280417 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-09 17:13 |
Serhiy, why is `gc.collect()` returning 2 after `del A` proof of a leak?
Yes, there's a cycle, but it's being GC'ed -- isn't that fine? Without the
slots the collect() call returns a larger number.
|
msg280420 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 17:50 |
Attached patch (slots_gc.patch) makes objects with empty slots tracked by GC:
class A: __slots__ = ()
gc.is_tracked(A()) == False # before
gc.is_tracked(A()) == True # with the patch
This doesn't fix the refleak though.
|
msg280421 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 17:58 |
I removed all slots from typing.py, the refleak is still there.
|
msg280424 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 18:11 |
Wow. In default CPython branch, with no modifications, test_typing.py simply crashes in debug mode:
yury@ysmbp ~/dev/py/cpython $ ./python.exe -m test -R3:3 test_typing
Run tests sequentially
0:00:00 [1/1] test_typing
beginning 6 repetitions
123456
.test test_typing failed -- Traceback (most recent call last):
File "/Users/yury/dev/py/cpython/Lib/test/test_typing.py", line 739, in test_generic_forvard_ref
self.assertEqual(get_type_hints(foobar, globals(), locals()), {'x': List[List[T]]})
AssertionError: {'x': typing.List[typing.List[~T]]} != {'x': typing.List[typing.List[~T]]}
{'x': typing.List[typing.List[~T]]}
test_typing failed
1 test failed:
test_typing
Total duration: 1 sec
Tests result: FAILURE
|
msg280425 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 18:13 |
The problem is that functools.lru_cache is used in _tp_cache. If I remove type caching, the original "refleak" is fixed.
|
msg280426 - (view) |
Author: Josh Rosenberg (josh.r) * |
Date: 2016-11-09 18:29 |
Serhiy: I think you forgot to make a global instance of the type for your demonstration. You mentioned in msg253899 that the loop is:
global instance -> type -> method -> module globals -> global instance
The example you gave doesn't instantiate class A, and it's the instance of A that doesn't participate in GC and causes the problem, right?
|
msg280428 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 18:33 |
Josh, Serhiy,
I've created an issue for tracking empty slots types: #28651.
This issue is about typing.py using functools.lru_cache for internal cache That is what causing the refleaks, and inability of test_typing.py to be run in refleak hunting mode.
|
msg280430 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-09 18:37 |
I could use a hint on a complete program that establishes whether there is
or is not a refleak.
|
msg280431 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-09 18:48 |
Good sleuthing! So the bug is in _lru_cache?
On Wed, Nov 9, 2016 at 10:37 AM, Guido van Rossum <report@bugs.python.org>
wrote:
>
> Guido van Rossum added the comment:
>
> I could use a hint on a complete program that establishes whether there is
> or is not a refleak.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue28649>
> _______________________________________
>
|
msg280434 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-09 18:52 |
> Serhiy, why is `gc.collect()` returning 2 after `del A` proof of a leak?
Sorry, it was bad example. I don't remember all details.
> The problem is that functools.lru_cache is used in _tp_cache. If I remove type caching, the original "refleak" is fixed.
Proposed patch clears caches when search for reference leaks. This decreases the number of leaks.
Unpatched:
test_typing leaked [3125, 3089, 2897] references, sum=9111
test_typing leaked [1189, 1179, 1103] memory blocks, sum=3471
Patched:
test_typing leaked [125, 125, 125] references, sum=375
test_typing leaked [49, 51, 51] memory blocks, sum=151
See also issue23839.
|
msg280436 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-09 19:03 |
I have no objections against typing-clear-caches.patch (though we'll have to make sure to apply it to the GitHub upstream as well -- I can take care of that once you merge it to CPython).
Does that patch fix the refleak completely or only partially?
I have no opinion on empty slots.
|
msg280437 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 19:06 |
> Good sleuthing! So the bug is in _lru_cache?
I don't think it's a bug of lru_cache. It stores strong references for arguments and return value of the decorated function (which btw isn't documented). I don't think it's possible to write efficient generic lru_cache that would just use weakrefs.
The problem is that because _tp_cache helper uses it, the lifetime of some typing classes becomes indefinite. Which shouldn't be a problem for a typical user of typing, since most of the time it is used on the module level.
We have two potential solutions to this problem:
1. Add a weak-ref type cache and refactor _tp_cache to use it;
2. Cleanup lru-cache before/adter running each typing test.
[1] is the ideal solution, [2] should work good enough.
|
msg280438 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-09 19:06 |
Yury, Guido, I think I understand why test_typing crashes in refleak hunting mode. The test relies on caching, namely type variables are compared by id, so that if cache is cleared between those two lines, then test fails. I think we should just update those test to use a usual class in forward reference. The patch is attached.
(Patch fixes the crash in refleak hunting mode, but the refleak is still there)
|
msg280439 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 19:10 |
> Does that patch fix the refleak completely or only partially?
I tried to empty the cache in test_typing.BaseTestCase.setUp and tearDown; this doesn't fix all refleaks.
Do we have some other caches in typing.py?
|
msg280440 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2016-11-09 19:14 |
> Does that patch fix the refleak completely or only partially?
It fixes 96% of refleaks.
|
msg280441 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-09 19:15 |
Yury, _tp_cache should be the only one (of course most generic classes have _abc_cache since GenericMeta inherits from ABCMeta)
|
msg280442 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-09 19:18 |
FYI I've opened https://github.com/python/typing/issues/323 to track this upstream.
|
msg280443 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-09 19:19 |
> It fixes 96% of refleaks.
In typing.py? Or generally?
|
msg280444 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 19:20 |
> Yury, _tp_cache should be the only one (of course most generic classes have _abc_cache since GenericMeta inherits from ABCMeta)
Then there is at least one more bug not related to lru_cache (or any kind of cache in typing).
And it's not classes-with-empty-slots bug either.
|
msg280446 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 19:28 |
Serhiy's patch typing-clear-caches.patch looks good, we should apply it (fixing the code style a little bit).
|
msg280447 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-09 19:29 |
Here is the corrected patch to avoid crash.
|
msg280454 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-11-09 21:19 |
New changeset d790078797bd by Guido van Rossum in branch '3.5':
Issue #28649: fix-typing-test-v2.diff
https://hg.python.org/cpython/rev/d790078797bd
New changeset 43be7891b1f5 by Guido van Rossum in branch '3.6':
Issue #28649: fix-typing-test-v2.diff (3.5->3.6)
https://hg.python.org/cpython/rev/43be7891b1f5
New changeset fd47a9d791b9 by Guido van Rossum in branch 'default':
Issue #28649: fix-typing-test-v2.diff (3.6->3.7)
https://hg.python.org/cpython/rev/fd47a9d791b9
|
msg280455 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-11-09 21:23 |
New changeset d920bfa5a71a by Guido van Rossum in branch '3.5':
Issue #28649: typing-clear-caches.patch
https://hg.python.org/cpython/rev/d920bfa5a71a
New changeset bd2ec9965f47 by Guido van Rossum in branch '3.6':
Issue #28649: typing-clear-caches.patch (3.5->3.6)
https://hg.python.org/cpython/rev/bd2ec9965f47
New changeset 08f76f89d199 by Guido van Rossum in branch 'default':
Issue #28649: typing-clear-caches.patch (3.6->3.7)
https://hg.python.org/cpython/rev/08f76f89d199
|
msg280457 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-09 21:32 |
Serhiy can you apply the non-typing.py part of typing-clear-caches.patch? The diff I applied only has the typing.py part (because I've done this upstream first).
|
msg280458 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-11-09 21:53 |
New changeset d926b484d33a by Serhiy Storchaka in branch '3.5':
Issue #28649: Clear the typing module caches when search for reference leaks.
https://hg.python.org/cpython/rev/d926b484d33a
New changeset caf3ceb93307 by Serhiy Storchaka in branch '3.6':
Issue #28649: Clear the typing module caches when search for reference leaks.
https://hg.python.org/cpython/rev/caf3ceb93307
New changeset 437564294e6c by Serhiy Storchaka in branch 'default':
Issue #28649: Clear the typing module caches when search for reference leaks.
https://hg.python.org/cpython/rev/437564294e6c
|
msg280459 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 22:55 |
With all commits pushed, test_typing still seems to leak. Ivan, do you have time to see what's going on?
test_typing leaked [125, 125, 125] references, sum=375
test_typing leaked [49, 49, 49] memory blocks, sum=147
|
msg280460 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 22:59 |
The specific test that leaks is test_extended_generic_rules_eq
|
msg280461 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-09 23:01 |
Yury, yes I will take a look at this now.
|
msg280462 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 23:01 |
Ivan, this is the part of the test that leaks:
def test_extended_generic_rules_eq(self):
T = TypeVar('T')
U = TypeVar('U')
with self.assertRaises(TypeError):
Callable[[T], U][[], int]
|
msg280463 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 23:03 |
Oh. This is a bug in C implementation of functools.lru_cache. I'll take a look.
|
msg280464 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-09 23:04 |
My impression is like something wrong happens when TypeError is raised by a cached function.
|
msg280465 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-09 23:23 |
A very simple repro:
with self.assertRaises(TypeError):
Union[[]]
It looks like the problem happens when a non-hashable argument is passed to cached function
|
msg280466 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 23:37 |
reflesk with only functools:
def test_lru_type_error(self):
@functools.lru_cache(maxsize=None)
def foo(o):
raise TypeError
with self.assertRaises(TypeError):
foo([])
|
msg280467 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 23:44 |
Found it, will open a separate issue.
|
msg280469 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-09 23:47 |
See http://bugs.python.org/issue28653 for details.
|
msg280481 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-10 01:20 |
So Ivan made an interesting observation: if we use Python version of functools.lru_cache, typing tests start to leak like crazy:
beginning 6 repetitions
123456
......
test_typing leaked [24980, 24980, 24980] references, sum=74940
I experimented a bit, and I *think* I know what's happening:
* typing uses an lru cache for types.
* It looks like that some types in the cache are being *reused* by different tests.
* Because typing doesn't set the size of the lru cache, it's set to 128 (default).
* When many typing tests execute, the cache invalidates some entries, but because types in test_typing are so complex, I believe that the resulting graph of objects it too complex for the GC to go through (10s of thousands of cross-linked objects).
A simple fix that removes fixes the refleak with Python version of lru_cache is to add a 'tearDown' method to 'test_typing.BaseTestCase':
class BaseTestCase(TestCase):
def tearDown(self):
for f in typing._cleanups:
f()
Now this all of this is just my guess of what's going on here. It might be something that has to be fixed in typing, or maybe we should just add tearDown.
Guido, Ivan, what do you think?
|
msg280482 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-10 01:24 |
But why is this so different from the C implementation of lru_cache?
|
msg280483 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-10 01:27 |
> But why is this so different from the C implementation of lru_cache?
I don't know. Maybe C version of lru cache creates a bit less pressure on GC.
BTW, if I set maxsize=100000 for typing lru_cache, test_generic_forward_ref crashes in refleak-test mode. Maybe this is another bug...
|
msg280484 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-10 01:35 |
>> But why is this so different from the C implementation of lru_cache?
> I don't know. Maybe C version of lru cache creates a bit less pressure on GC.
Actually test/refleak.py now cleans up typing's lru cache & calls "gc.collect()". It seems that the typing cycles aren't collectable by the GC when Python lru cache is used.
|
msg280485 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-10 01:38 |
> * It looks like that some types in the cache are being *reused* by different tests.
I also noticed this some time ago, ideally we need to make all tests more isolated between each other.
> Now this all of this is just my guess of what's going on here. It might be something that has to be fixed in typing, or maybe we should just add tearDown.
adding tearDown is a good intermediate solution, while I could go through tests and try to make them more isolated from each other where necessary.
> BTW, if I set maxsize=100000 for typing lru_cache, test_generic_forward_ref crashes in refleak-test mode. Maybe this is another bug...
I also noticed this today, and this is something that I don't understand yet. This test should not depend on caching, but it looks like it does. I will think more about this.
|
msg280486 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-10 01:39 |
Segfault or traceback?
Anyway, it implements a doubly-linked list where each node is implemented
as a list of 4 items... Maybe there's something in the GC code that
recurses down at the C level??? I'm not sure a crash the Python version of
lru_cache with maxsize=100000 is high on my list of worries.
|
msg280487 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-10 01:46 |
> Segfault or traceback?
Traceback. I should have said failed, not crashed :(
> I'm not sure a crash the Python version of
lru_cache with maxsize=100000 is high on my list of worries.
The problem is that the test fails depending on cache size, and caching code shouldn't really do that ever.
|
msg280488 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-10 01:51 |
It looks like the tearDown() works really well. It also somehow fixes the problem with test_generic_forward_ref
|
msg280489 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-10 01:56 |
I think Yury is right here. The good plan would be to add the tearDown(), but also fix the problem with test_generic_forward_ref. I could work on it tomorrow (really don't have time today, sorry).
|
msg280493 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-10 08:13 |
Guido, Yury, it looks like I solved the puzzle. All the remaining problems are because of forward references. In particular, _ForwardRef keeps a reference to the frame where it was defined:
typing_globals = globals()
frame = sys._getframe(1)
while frame is not None and frame.f_globals is typing_globals:
frame = frame.f_back
assert frame is not None
self.__forward_frame__ = frame
This is old code from 2015 introduced to support __isinstance__ for forward refs. The latter is no more supported anyway, so that I believe this code should be removed.
I will make a PR upstream soon.
|
msg280496 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-10 08:43 |
> BTW, if I set maxsize=100000 for typing lru_cache, test_generic_forward_ref crashes in refleak-test mode. Maybe this is another bug...
This one is also related to the mentioned in my previous message. Namely, forward references are only evaluated once. However, in refleak hunting mode tests are repeated multiple times, and every time test_generic_forward_ref is called, this code:
class CC: ...
creates a new local class, while forward ref is still pointing to an old one.
There are two options:
* keep this behaviour and update the test to use a globally defined class;
* change this and re-evaluate forward refs every times.
However I would vote for a compromise solution: if the ``locals`` argument is not given, just use the evaluated value. But if a user gives non-None ``locals`` argument then that user most probably wants to re-evaluate the forward ref locally.
|
msg280501 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-10 10:31 |
OK, here are the PRs:
https://github.com/python/typing/pull/327
https://github.com/python/typing/pull/328
Those two seem to fix everything, I tried various cache sizes with both C and Python versions of lru_cache and found no refleaks.
|
msg280513 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-10 15:59 |
Good work, Ivan!
> Those two seem to fix everything, I tried various cache sizes with both C and Python versions of lru_cache and found no refleaks.
Speaking of lru_cache: typing currently doesn't configure maxsize, which means that the cache is limited to 128 items. Is that OK? Should maxsize be set to None to make the cache unlimited, or, maybe, a higher number like 1024 is more appropriate?
|
msg280514 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-10 16:03 |
We discussed cache size previously and 128 is fine.
--Guido (mobile)
|
msg280518 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-11-10 16:28 |
New changeset 0da2e381ad71 by Guido van Rossum in branch '3.5':
Issue #28649: fix first issue with _ForwardRef (#327)
https://hg.python.org/cpython/rev/0da2e381ad71
New changeset 249a1f0b2857 by Guido van Rossum in branch '3.6':
Issue #28649: fix first issue with _ForwardRef (#327) (3.5->3.6)
https://hg.python.org/cpython/rev/249a1f0b2857
New changeset 304f017462e4 by Guido van Rossum in branch 'default':
Issue #28649: fix first issue with _ForwardRef (#327) (3.6->3.7)
https://hg.python.org/cpython/rev/304f017462e4
|
msg280519 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-11-10 16:29 |
New changeset 555f0ca31587 by Guido van Rossum in branch '3.5':
Issue #28649: fix second issue with _ForwardRef (#328)
https://hg.python.org/cpython/rev/555f0ca31587
New changeset 0f863906cf2e by Guido van Rossum in branch '3.6':
Issue #28649: fix second issue with _ForwardRef (#328) (3.5->3.6)
https://hg.python.org/cpython/rev/0f863906cf2e
New changeset 84b4ac243508 by Guido van Rossum in branch 'default':
Issue #28649: fix second issue with _ForwardRef (#328) (3.6->3.7)
https://hg.python.org/cpython/rev/84b4ac243508
|
msg280520 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-10 16:31 |
Thanks a bundle, Ivan! I've merged those two into CPython 3.5, 3.6, 3.7. We can now watch the build bots for a while and double-check the refleaks.
Yury, when you're happy with the situation can you close this issue?
|
msg280521 - (view) |
Author: Yury Selivanov (yselivanov) * |
Date: 2016-11-10 16:32 |
Thank you Guido and Ivan!
> Yury, when you're happy with the situation can you close this issue?
I'll close it now, will reopen if something comes up.
|
msg280523 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2016-11-10 16:35 |
BTW, do we still want the tearDown()? Or is that no longer necessary?
|
msg280524 - (view) |
Author: Ivan Levkivskyi (levkivskyi) * |
Date: 2016-11-10 16:45 |
I think tearDown() is not necessary. But on the other hand it could be nice to have a method in tests to force cache clear. I would propose it not as a default, but as an opt-in to check that cache works well, or that a certain tested feature is "cache independent", etc.
|