test_functools leaks randomly 1 memory block #80741

vstinner · 2019-04-08T14:22:16Z

BPO	36560
Nosy	@vstinner, @methane, @serhiy-storchaka, @ilevkivskyi, @Fidget-Spinner
PRs	[WIP] bpo-36560: _abc now uses weakref.WeakSet type #12743 bpo-36560: Fix reference leak huting in regrtest #12744 [2.7] bpo-36560: Fix reference leak hunting in regrtest (GH-12744) #12745 bpo-36560: regrtest: don't collect the GC twice #12747 [3.7] bpo-36560: regrtest: don't collect the GC twice (GH-12747) #12749
Files	test_functools.py test_functools2.py

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2021-05-26.22:40:21.672>
created_at = <Date 2019-04-08.14:22:16.492>
labels = ['3.8', 'tests']
title = 'test_functools leaks randomly 1 memory block'
updated_at = <Date 2021-08-31.16:05:56.040>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2021-08-31.16:05:56.040>
actor = 'kj'
assignee = 'none'
closed = True
closed_date = <Date 2021-05-26.22:40:21.672>
closer = 'vstinner'
components = ['Tests']
creation = <Date 2019-04-08.14:22:16.492>
creator = 'vstinner'
dependencies = []
files = ['48246', '48250']
hgrepos = []
issue_num = 36560
keywords = ['patch']
message_count = 21.0
messages = ['339643', '339651', '339662', '339663', '339731', '339732', '339733', '339735', '339738', '339768', '339772', '339777', '339781', '339965', '340386', '353071', '371286', '394485', '400739', '400742', '400754']
nosy_count = 5.0
nosy_names = ['vstinner', 'methane', 'serhiy.storchaka', 'levkivskyi', 'kj']
pr_nums = ['12743', '12744', '12745', '12747', '12749']
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue36560'
versions = ['Python 3.8']

vstinner · 2019-04-08T14:22:16Z

Sometimes, each run of test_functools leaks exactly 1 memory block, even when the whole test is "re-run in verbose mode". Sometimes, it doesn't leak.

https://buildbot.python.org/all/#/builders/80/builds/550

test_functools leaked [1, 1, 1] memory blocks, sum=3
Re-running test 'test_functools' in verbose mode
test_functools leaked [1, 1, 1] memory blocks, sum=3

Maybe the problem comes from

Example on Linux:

$ ./python -m test -F -r -j1 -R 3:3 test_functools  
Using random seed 3891892
Run tests in parallel using 1 child processes
0:00:01 load avg: 2.38 [  1] test_functools passed
beginning 6 repetitions
123456
......
(...)
0:00:06 load avg: 2.27 [  6] test_functools passed
beginning 6 repetitions
123456
......
0:00:07 load avg: 2.27 [  7/1] test_functools failed
beginning 6 repetitions
123456
......
test_functools leaked [1, 2, 1] memory blocks, sum=4
0:00:08 load avg: 2.27 [  8/1] test_functools passed
beginning 6 repetitions
123456
......

== Tests result: FAILURE ==

7 tests OK.

1 test failed:
test_functools

Total duration: 8 sec 333 ms
Tests result: FAILURE

vstinner · 2019-04-08T15:33:24Z

Current bisection progress: I'm able to reproduce a failure with attached test_functools.py (211 lines) using:

./python -m test -F -j5 -R 3:3 test_functools

It takes between 4 and 200 runs to reproduce the failure, I don't understand what triggers the bug.

I failed to identify if the bug is more likely when the system is idle or when the system is busy. It looks purely random.

vstinner · 2019-04-08T17:11:59Z

Some progress.

I'm now able to reproduce the issue with attached test_functools2.py (103 lines, a single test method):

$ cp (...)test_functools2.py Lib/test/
$ ./configure -C --with-pydebug CFLAGS=-O0
$ make
$ ./python -m test -F -j5 -R 3:3 test_functools2
(...)
0:00:02 load avg: 0.95 [ 19/1] test_functools failed
beginning 6 repetitions
123456
......
test_functools leaked [1, 1, 1] memory blocks, sum=3
(...)
Tests result: FAILURE

I disabled the C accelerator _functools.

I'm now trying to disable the C accelerator of abc, but I had a small issue: bpo-36565.

vstinner · 2019-04-08T17:14:55Z

The issue comes maybe from the complex code in Lib/test/libregrtest/refleak.py to handle the ABC cache:

    abcs = {}
    for abc in [getattr(collections.abc, a) for a in collections.abc.__all__]:
        if not isabstract(abc):
            continue
        for obj in abc.__subclasses__() + [abc]:
            abcs[obj] = _get_dump(obj)[0]

...

    # clear type cache
    sys._clear_type_cache()

    # Clear ABC registries, restoring previously saved ABC registries.
    abs_classes = [getattr(collections.abc, a) for a in collections.abc.__all__]
    abs_classes = filter(isabstract, abs_classes)
    for abc in abs_classes:
        for obj in abc.__subclasses__() + [abc]:
            for ref in abcs.get(obj, set()):
                if ref() is not None:
                    obj.register(ref())
            obj._abc_caches_clear()

    clear_caches()

vstinner · 2019-04-09T11:20:13Z

This issue isn't a real memory leak: if I use -R 3:10 instead of -R 3:3, the test doesn't fail anymore.

But the issue is still annoying since it makes Refleaks buildbot workers fail randomly :-/

This issue remembers me the unstable multiprocessing tests:

bpo-33735: test_multiprocessing_spawn leaked [1, 2, 1] memory blocks on AMD64 Windows8.1 Refleaks 3.7
bpo-33984: test_multiprocessing_forkserver leaked [1, 2, 1] memory blocks on x86 Gentoo Refleaks 3.x

Patch to always display memory allocations differences:

diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py
index d68ea63b5b..997be819fa 100644
--- a/Lib/test/libregrtest/refleak.py
+++ b/Lib/test/libregrtest/refleak.py
@@ -118,6 +118,8 @@ def dash_R(the_module, test, indirect_test, huntrleaks):
                 print(msg, file=refrep)
                 refrep.flush()
             failed = True
+    if not failed:
+        print(alloc_deltas[nwarmup:])
     return failed

Truncated output with the patch:

vstinner@apu$ ./python -m test -F -r -j1 -R 3:10 test_functools
Using random seed 4308771
Run tests in parallel using 1 child processes
0:00:04 load avg: 0.91 [ 1] test_functools passed
[0, 1, 2, 0, 0, 0, 0, 0, 0, 0]
...
0:00:13 load avg: 0.92 [ 3] test_functools passed
[2, 1, 0, 0, 0, 0, 0, 0, 0, 0]
...
0:00:17 load avg: 0.93 [ 4] test_functools passed
[0, 3, 0, 0, 0, 0, 0, 0, 0, 0]
...
0:00:21 load avg: 0.93 [ 5] test_functools passed
[0, 1, 0, 0, 2, 0, 0, 0, 0, 0]
...
0:00:26 load avg: 0.93 [ 6] test_functools passed
[0, 4, 0, 0, 0, 0, 0, 0, 0, 0]
...
0:00:34 load avg: 0.87 [ 8] test_functools passed
[0, 1, 0, 2, 0, 0, 0, 0, 0, 0]
...
0:01:06 load avg: 1.15 [ 15] test_functools passed
[0, 1, 0, 2, 0, -1, 1, 0, 0, 0]
...
0:01:10 load avg: 1.46 [ 16] test_functools passed
[0, 4, 0, 0, 0, 0, 0, 0, -1, 1]
...

The maximum sum() of these list is around 5 on 10 runs: not every run leaks a memory block. It looks more like a internal cache which is "unstable" if you look at the number of allocated memory blocks.

vstinner · 2019-04-09T11:22:54Z

I wrote PR 12743 to modify _abc: use weakref.WeakSet rather than using directly a set of weak references. With this PR, calling _get_dump() should clear all broken weak references. I'm not sure if it's really useful for this specific issue, since dash_R_cleanup() calls obj._abc_caches_clear() on all abstract base classes (ABC): all caches must be removed before reading the total number of allocated memory blocks (sys.getallocatedblocks()).

serhiy-storchaka · 2019-04-09T11:28:49Z

_abc was written for optimization. Your PR makes it using slower Python code. Could you please test how much this hits the performance of abc?

vstinner · 2019-04-09T12:05:09Z

If I modify libregrtest with the following patch:

diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py
index 0bb8a0a2bf..f0225a9768 100644
--- a/Lib/test/libregrtest/refleak.py
+++ b/Lib/test/libregrtest/refleak.py
@@ -128,7 +128,7 @@ def dash_R(ns, the_module, test_name, test_func):
     failed = False
     for deltas, item_name, checker in [
         (rc_deltas, 'references', check_rc_deltas),
-        (alloc_deltas, 'memory blocks', check_rc_deltas),
+        (alloc_deltas, 'memory blocks', check_fd_deltas),
         (fd_deltas, 'file descriptors', check_fd_deltas)
     ]:
         # ignore warmup runs

And I add the following file Lib/test/test_noop.py:

import unittest

class NoopTests(unittest.TestCase):
    def test_noop(self):
        pass

regrtest detects a "leak":

$ ./python -m test -R 3:3 test_noop
Run tests sequentially
0:00:00 load avg: 0.55 [1/1] test_noop
beginning 6 repetitions
123456
......
test_noop leaked [0, 1, 0] memory blocks, sum=1
test_noop failed

== Tests result: FAILURE ==

1 test failed:
test_noop

Total duration: 113 ms
Tests result: FAILURE

The issue comes from this look in Lib/test/libregrtest/refleak.py:

    for i in range(repcount):
        indirect_test()
        alloc_after, rc_after, fd_after = dash_R_cleanup(fs, ps, pic, zdc,
                                                         abcs)
        print('.', end='', file=sys.stderr, flush=True)
        if i >= nwarmup:
            rc_deltas[i] = get_pooled_int(rc_after - rc_before)
            alloc_deltas[i] = get_pooled_int(alloc_after - alloc_before)
            fd_deltas[i] = get_pooled_int(fd_after - fd_before)
        alloc_before = alloc_after
        rc_before = rc_after
        fd_before = fd_after

Because of "if i >= nwarmup:", get_pooled_int() isn't call during "warmup", whereas the purpose of the warmup is to warmup *everything*.

Maybe get_pooled_int() allocates one frame object and keeps it alive in its "zombi frame". Maybe something else is allocated and kept alive.

Anything, removing "if i >= nwarmup:" to always compute deltas fix this specific issue.

Attached PR 12744 fix this bug.

vstinner · 2019-04-09T12:23:51Z

New changeset 5aaac94 by Victor Stinner in branch 'master':
bpo-36560: Fix reference leak hunting in regrtest (GH-12744)
5aaac94

vstinner · 2019-04-09T16:01:19Z

New changeset 9c14061 by Victor Stinner in branch '2.7':
bpo-36560: Fix reference leak hunting in regrtest (GH-12744) (GH-12745)
9c14061

vstinner · 2019-04-09T16:26:22Z

New changeset bb44478 by Victor Stinner in branch 'master':
bpo-36560: regrtest: don't collect the GC twice (GH-12747)
bb44478

vstinner · 2019-04-09T16:43:07Z

_abc was written for optimization. Your PR makes it using slower Python code. Could you please test how much this hits the performance of abc?

I mostly wrote PR 12743 to debug this issue. Please see the discussion directly on the PR ;-)

vstinner · 2019-04-09T16:56:08Z

New changeset 86f0354 by Victor Stinner in branch '3.7':
[3.7] bpo-36560: regrtest: don't collect the GC twice (GH-12747) (GH-12749)
86f0354

vstinner · 2019-04-11T11:21:06Z

Patch making check on memory block leaks stricter:

diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py
index 235d6bfd3a..dfadabdef6 100644
--- a/Lib/test/libregrtest/refleak.py
+++ b/Lib/test/libregrtest/refleak.py
@@ -130,7 +130,7 @@ def dash_R(ns, the_module, test_name, test_func):
     failed = False
     for deltas, item_name, checker in [
         (rc_deltas, 'references', check_rc_deltas),
-        (alloc_deltas, 'memory blocks', check_rc_deltas),
+        (alloc_deltas, 'memory blocks', check_fd_deltas),
         (fd_deltas, 'file descriptors', check_fd_deltas)
     ]:
         # ignore warmup runs

Using this patch, at least the following tests fail:

test_asyncio
test_code
test_collections
test_contextlib
test_contextlib_async
test_ctypes
test_functools
test_multiprocessing_forkserver
test_multiprocessing_spawn
test_regrtest
test_statistics
test_typing
test_xml_etree_c

I didn't analyze why yet. I guess that they are not real memory leaks, but more minor issue in the code checking for memory leaks. Sadly, it seems like such small glitch can cause a whole Refleak buildbot worker to fail :-(

vstinner · 2019-04-17T10:16:40Z

Another example:

AMD64 Fedora Rawhide Refleaks 3.6
https://buildbot.python.org/all/#/builders/193/builds/5
test_contextlib leaked [8, 1, 1] memory blocks, sum=10

I cannot reproduce the issue on my ("idle") laptop. It sounds like a random failure (race condition).

vstinner · 2019-09-24T10:45:21Z

Recent fail on AMD64 Fedora Rawhide Refleaks 3.x:
https://buildbot.python.org/all/#builders/189/builds/162
test_functools leaked [1, 2, 1] memory blocks, sum=4

vstinner · 2020-06-11T14:23:38Z

AMD64 Fedora Rawhide Refleaks 3.8:
https://buildbot.python.org/all/#/builders/79/builds/196
test_functools leaked [1, 1, 1] memory blocks, sum=3

vstinner · 2021-05-26T22:40:22Z

I didn't see this failure recently. I close the issue.

vstinner · 2021-08-31T14:53:04Z

Recent failure on AMD64 Fedora Rawhide Refleaks 3.x:
https://buildbot.python.org/all/#/builders/565/builds/131

test_typing leaked [1, 1, 1] memory blocks, sum=3
(...)
0:46:40 load avg: 0.00 Re-running test_typing in verbose mode
test_typing leaked [1, 1, 1] memory blocks, sum=3

I don't reopen the issue since I failed to reproduce it.

vstinner · 2021-08-31T15:05:41Z

I can see the test_typing issue with this patch:

diff --git a/Lib/test/libregrtest/refleak.py b/Lib/test/libregrtest/refleak.py
index b94826a5da..49e5f03414 100644
--- a/Lib/test/libregrtest/refleak.py
+++ b/Lib/test/libregrtest/refleak.py
@@ -124,7 +124,7 @@ def check_rc_deltas(deltas):
         #
         #   [5, 5, 6]
         #   [10, 1, 1]
-        return all(delta >= 1 for delta in deltas)
+        return any(delta >= 1 for delta in deltas)
 
     def check_fd_deltas(deltas):
         return any(deltas)

Examples:

$ ./python -m test test_typing -R 3:20
(...)
beginning 23 repetitions
12345678901234567890123
.......................
test_typing leaked [1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] memory blocks, sum=3
test_typing failed (reference leak)


$ ./python -m test test_typing -R 1:20
WARNING: Running tests with --huntrleaks/-R and less than 3 warmup repetitions can give false positives!
(...)
beginning 21 repetitions
123456789012345678901
.....................
test_typing leaked [6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] references, sum=6
test_typing leaked [4, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] memory blocks, sum=7
test_typing failed (reference leak)
(...)

Fidget-Spinner · 2021-08-31T16:05:56Z

@victor, with your patch applied, for test_typing (locally, Windows x64):

$ ./python -m test test_typing -R 3:20
Different result on every run (race condition maybe?)

$ ./python -m test test_typing -R 1:20
Same result as yours on every run.

vstinner added 3.8 only security fixes tests Tests in the Lib/test dir labels Apr 8, 2019

vstinner closed this as completed May 26, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

vstinner mentioned this issue Oct 5, 2023

test_import leaked [1, 1, 6] memory blocks on s390x Fedora Refleaks 3.x #110096

Closed

vstinner mentioned this issue Feb 21, 2024

gh-115720: Show number of leaks in huntrleaks progress reports #115726

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test_functools leaks randomly 1 memory block #80741

test_functools leaks randomly 1 memory block #80741

vstinner commented Apr 8, 2019

vstinner commented Apr 8, 2019

vstinner commented Apr 8, 2019

vstinner commented Apr 8, 2019

vstinner commented Apr 8, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

serhiy-storchaka commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 11, 2019

vstinner commented Apr 17, 2019

vstinner commented Sep 24, 2019

vstinner commented Jun 11, 2020

vstinner commented May 26, 2021

vstinner commented Aug 31, 2021

vstinner commented Aug 31, 2021

Fidget-Spinner commented Aug 31, 2021

test_functools leaks randomly 1 memory block #80741

test_functools leaks randomly 1 memory block #80741

Comments

vstinner commented Apr 8, 2019

vstinner commented Apr 8, 2019

vstinner commented Apr 8, 2019

vstinner commented Apr 8, 2019

vstinner commented Apr 8, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

serhiy-storchaka commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 9, 2019

vstinner commented Apr 11, 2019

vstinner commented Apr 17, 2019

vstinner commented Sep 24, 2019

vstinner commented Jun 11, 2020

vstinner commented May 26, 2021

vstinner commented Aug 31, 2021

vstinner commented Aug 31, 2021

Fidget-Spinner commented Aug 31, 2021