This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Reference leaks in subinterpreters
Type: resource usage Stage: resolved
Components: Tests Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: vstinner Nosy List: eric.snow, miss-islington, pablogsal, vstinner
Priority: high Keywords: patch

Created on 2019-12-03 19:55 by pablogsal, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 17453 merged pablogsal, 2019-12-03 20:58
PR 17454 merged pablogsal, 2019-12-03 21:22
PR 17456 closed pablogsal, 2019-12-03 21:59
PR 17457 merged pablogsal, 2019-12-03 22:12
Messages (16)
msg357759 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-03 19:55
All the refleak build bots for master are reporting reference leaks in subinterpreter related tests:

https://buildbot.python.org/all/#/builders/126/builds/6/steps/5/logs/stdio
https://buildbot.python.org/all/#/builders/144/builds/6/steps/5/logs/stdio
https://buildbot.python.org/all/#/builders/206/builds/6
https://buildbot.python.org/all/#/builders/157/builds/6/steps/4/logs/stdio

......
test_atexit leaked [882, 882, 882] references, sum=2646
test_atexit leaked [12, 12, 12] memory blocks, sum=36
5 tests failed again:
    test__xxsubinterpreters test_atexit test_capi test_httpservers
    test_threading
== Tests result: FAILURE then FAILURE ==
401 tests OK.
10 slowest tests:
- test_asyncio: 33 min 34 sec
- test_concurrent_futures: 17 min 22 sec
- test_multiprocessing_spawn: 17 min 6 sec
- test_zipfile: 9 min 25 sec
- test_multiprocessing_forkserver: 9 min 2 sec
- test_multiprocessing_fork: 8 min 43 sec
- test_largefile: 7 min 32 sec
- test_lib2to3: 7 min 3 sec
- test_mailbox: 6 min 27 sec
- test_argparse: 5 min 5 sec
5 tests failed:
    test__xxsubinterpreters test_atexit test_capi test_httpservers
    test_threading
14 tests skipped:
    test_devpoll test_gdb test_ioctl test_kqueue test_msilib
    test_ossaudiodev test_startfile test_tix test_tk test_ttk_guionly
    test_winconsoleio test_winreg test_winsound test_zipfile64
7 re-run tests:
    test__xxsubinterpreters test_atexit test_capi test_httpservers
    test_nntplib test_pty test_threading


Bisecting shows the following commit as the culprit:

ef5aa9af7c7e493402ac62009e4400aed7c3d54e is the first bad commit
commit ef5aa9af7c7e493402ac62009e4400aed7c3d54e
Author: Victor Stinner <vstinner@python.org>
Date:   Wed Nov 20 00:38:03 2019 +0100

    bpo-38858: Reorganize pycore_init_types() (GH-17265)
    
    * Call _PyLong_Init() and _PyExc_Init() earlier
    * new_interpreter() reuses pycore_init_types()

 Python/pylifecycle.c | 31 +++++++++++--------------------
 1 file changed, 11 insertions(+), 20 deletions(-)
bisect run success


Running * test.test_atexit.SubinterpreterTest.test_callbacks_leak is enough for reproducing the problem.
msg357760 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-03 20:47
Actually, that commit has an independent refleak that was fixed (!) regarding the small integer cache. Seems that the actual commit that made the new leak is:

commit 2582d46fbcf7bdf86b9cf4016850b8d155267ac6
Author: Victor Stinner <vstinner@python.org>
Date:   Fri Nov 22 19:24:49 2019 +0100

    bpo-38858: new_interpreter() reuses pycore_init_builtins() (GH-17351)
    
    new_interpreter() now calls _PyBuiltin_Init() to create the builtins
    module and calls _PyImport_FixupBuiltin(), rather than using
    _PyImport_FindBuiltin(tstate, "builtins").
    
    pycore_init_builtins() is now responsible to initialize
    intepr->builtins_copy: inline _PyImport_Init() and remove this
    function.



Reverting this commit fixes the refleaks in atexit at least
msg357761 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-03 20:59
Ok, the problem seems to be that _PyBuiltin_Init() returns a new reference that is not cleared once PyModule_GetDict() and _PyBuiltins_AddExceptions() is called.
msg357762 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-03 21:02
Indeed, that fixes the problem in at least 2 modules:

 ./python -m test test_atexit -R :
0:00:00 load avg: 0.84 Run tests sequentially
0:00:00 load avg: 0.84 [1/1] test_atexit
beginning 9 repetitions
123456789
.........

== Tests result: SUCCESS ==

1 test OK.

Total duration: 1.5 sec
Tests result: SUCCESS


 ./python -m test test_capi -R :  
0:00:00 load avg: 1.19 Run tests sequentially
0:00:00 load avg: 1.19 [1/1] test_capi
beginning 9 repetitions
123456789
..........
test_capi passed in 2 min 8 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 2 min 8 sec
Tests result: SUCCESS



Although the rest of the tests still leak, so there is something else :(
msg357763 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-03 21:22
Opened https://github.com/python/cpython/pull/17454 for the fixing the leak in http_servers
msg357764 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-03 21:32
For the subinterpreter other leaks it seems that the commit that introduced the leak is:

7247407c35330f3f6292f1d40606b7ba6afd5700 is the first bad commit
commit 7247407c35330f3f6292f1d40606b7ba6afd5700
Author: Victor Stinner <vstinner@python.org>
Date:   Wed Nov 20 12:25:50 2019 +0100

    bpo-36854: Move _PyRuntimeState.gc to PyInterpreterState (GH-17287)
    
    * Rename _PyGC_InitializeRuntime() to _PyGC_InitState()
    * finalize_interp_clear() now also calls _PyGC_Fini() in
      subinterpreters (clear the GC state).

 Include/internal/pycore_object.h                   |  2 +-
 Include/internal/pycore_pymem.h                    |  2 +-
 Include/internal/pycore_pystate.h                  |  7 +--
 .../2019-11-20-12-01-37.bpo-36854.Zga_md.rst       |  3 +
 Modules/gcmodule.c                                 | 72 ++++++++++++----------
 Objects/object.c                                   | 20 +++---
 Python/pylifecycle.c                               |  3 +-
 Python/pystate.c                                   |  2 +-
 8 files changed, 62 insertions(+), 49 deletions(-)
 create mode 100644 Misc/NEWS.d/next/Core and Builtins/2019-11-20-12-01-37.bpo-36854.Zga_md.rst
bisect run success
msg357765 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-03 21:34
Indeed, reverting commit 7247407c35330f3f6292f1d40606b7ba6afd5700 fixes the issues in test_threading
msg357766 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-03 21:46
I created https://github.com/python/cpython/pull/17455 to revert commit 7247407c35330f3f6292f1d40606b7ba6afd5700 as it seems that the leak is more complex than the others. 

If a solution is not found in a couple of days we need to merge it to fix the refleak buildbots.
msg357767 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-03 22:02
This patch (which is wrong) fixes the reference issues:

diff --git a/Python/pylifecycle.c b/Python/pylifecycle.c
index cce4783bc1..c354af18db 100644
--- a/Python/pylifecycle.c
+++ b/Python/pylifecycle.c
@@ -1276,6 +1276,7 @@ finalize_interp_clear(PyThreadState *tstate)
         _PyExc_Fini();
     }
 
+    PyGC_Collect();
     _PyGC_Fini(tstate);
 }


This suggests that the per-interpreter gc does not play well with the refleak checker.
msg357768 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-12-03 22:15
Opened PR 17457 for the refleaks of test__xxsubinterpreters although I am not very convinced that we can call the GC at that point.
msg357784 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-04 09:29
New changeset 24f5cac7254177a4c9956d680c0a9b6dadd85c6f by Victor Stinner (Pablo Galindo) in branch 'master':
bpo-38962: Fix reference leak in test_httpservers (GH-17454)
https://github.com/python/cpython/commit/24f5cac7254177a4c9956d680c0a9b6dadd85c6f
msg357788 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-04 09:56
"""
5 tests failed again:
    test__xxsubinterpreters test_atexit test_capi test_httpservers
    test_threading
"""

I merged Pablo's fix for test_httpservers.

With PR 17457 + PR 17453 (and the test_httpservers fix), the 5 tests don't leak anymore.

I am fine with PR 17457 and PR 17453, I added some comments.

I would prefer to avoid a revert (PR 17455) if possible ;-)

--

Except of PR 17453 which was my fault in my refactoring work, the problem is the work on subinterpreters doesn't introduce regressions but only reveal existing issues.

Example with Python compiled in the debug mode (master branch):

$ ./python -X showrefcount -c pass
[18564 refs, 6496 blocks]

Python "leaks" 18564 references at exit.

My work is to better isolate subinterpreters from the main interpreter, and so the missing clean up code is revealed by reference leak buildbot checks.
msg357793 - (view) Author: miss-islington (miss-islington) Date: 2019-12-04 11:20
New changeset b96c6b0723b889d3a0c1740bce7f579f33d246f2 by Miss Islington (bot) (Pablo Galindo) in branch 'master':
bpo-38962: Fix reference leak in new_interpreter() (GH-17453)
https://github.com/python/cpython/commit/b96c6b0723b889d3a0c1740bce7f579f33d246f2
msg357800 - (view) Author: miss-islington (miss-islington) Date: 2019-12-04 11:51
New changeset ac0e1c2694bc199dbd073312145e3c09bee52cc4 by Miss Islington (bot) (Pablo Galindo) in branch 'master':
bpo-38962: Fix reference leak in the per-subinterpreter gc (GH-17457)
https://github.com/python/cpython/commit/ac0e1c2694bc199dbd073312145e3c09bee52cc4
msg357802 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-12-04 12:12
The 5 tests don't leak anymore, I close thee issue. Thanks Pablo!

$ ./python -m test -R 3:3 -j0 test__xxsubinterpreters test_atexit test_capi test_httpservers test_threading
(...)
Total duration: 1 min 2 sec
Tests result: SUCCESS
msg358354 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2019-12-13 21:17
Thanks for all the work on this!
History
Date User Action Args
2022-04-11 14:59:23adminsetgithub: 83143
2019-12-13 21:17:17eric.snowsetnosy: + eric.snow
messages: + msg358354
2019-12-04 12:12:28vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg357802

stage: patch review -> resolved
2019-12-04 11:51:08miss-islingtonsetmessages: + msg357800
2019-12-04 11:20:08miss-islingtonsetnosy: + miss-islington
messages: + msg357793
2019-12-04 09:56:02vstinnersetmessages: + msg357788
2019-12-04 09:29:17vstinnersetmessages: + msg357784
2019-12-03 22:15:35pablogsalsetmessages: + msg357768
2019-12-03 22:12:57pablogsalsetpull_requests: + pull_request16938
2019-12-03 22:02:04pablogsalsetmessages: + msg357767
2019-12-03 21:59:58pablogsalsetpull_requests: + pull_request16937
2019-12-03 21:46:40pablogsalsetmessages: + msg357766
2019-12-03 21:34:19pablogsalsetmessages: + msg357765
2019-12-03 21:32:23pablogsalsetmessages: + msg357764
2019-12-03 21:22:56pablogsalsetmessages: + msg357763
2019-12-03 21:22:17pablogsalsetstage: needs patch -> patch review
pull_requests: + pull_request16935
2019-12-03 21:02:51pablogsalsetmessages: + msg357762
2019-12-03 20:59:09pablogsalsetmessages: + msg357761
stage: patch review -> needs patch
2019-12-03 20:58:02pablogsalsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request16934
2019-12-03 20:47:39pablogsalsetmessages: + msg357760
2019-12-03 19:55:50pablogsalcreate