classification
Title: The _csv module can't be garbage-collected after _csv.register_dialect is called
Type: Stage: resolved
Components: Extension Modules Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: corona10, erlendaasland, miss-islington, pablogsal, petr.viktorin
Priority: normal Keywords: patch

Created on 2021-05-12 13:34 by petr.viktorin, last changed 2021-05-12 20:31 by corona10. This issue is now closed.

Files
File name Uploaded Description Edit
gc.diff erlendaasland, 2021-05-12 14:51
Pull Requests
URL Status Linked Edit
PR 26068 closed petr.viktorin, 2021-05-12 13:59
PR 26074 merged erlendaasland, 2021-05-12 15:51
PR 26081 merged miss-islington, 2021-05-12 18:19
Messages (16)
msg393508 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-05-12 13:34
After `_csv.register_dialect` is called, the csv module is alive even after it's removed from sys.modules. It should be garbage-collected.
(It's not that big a deal: unloading _csv isn't something users should do. But it might be hiding a deeper issue.)

The following reproducer (for a debug build of Python) shows an increasing number of refcounts. (Importing `csv` is the easiest way to call _csv._register_dialect with a proper argument):


import sys
import gc

for i in range(10):
    import csv
    del sys.modules['_csv']
    del sys.modules['csv']
    del csv
    gc.collect()

    print(sys.gettotalrefcount())
msg393512 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-05-12 13:58
Hm, a similar thing apparently happens with urllib.request.
msg393518 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-12 14:42
Could it be that the _csv heap types are not garbage collected? Ref. bpo-42972.
msg393520 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-12 14:50
Adding GC to _csv types:
$ cat 
import sys
import gc

for i in range(10):
    import csv
    del sys.modules['_csv']
    del sys.modules['csv']
    del csv
    gc.collect()

    print(sys.gettotalrefcount())
$ ./python.exe bug.py
73164
73164
73166
73166
73166
73166
73166
73166
73166
73166
msg393521 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-12 14:52
I created a quick-and-dirty patch. I can clean it up and make it into a PR if you want.
msg393522 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-05-12 15:07
*facepalm* Yes, that's it.

If you have the time now, could you send he PR?
msg393523 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-12 15:10
Sure, I’ll do it after dinner :)
msg393525 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-05-12 15:46
The urllib.request one was caused by _hashlib, see GH-26072.
msg393526 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-12 15:56
I don't understand this. After applying PR-26074, test_csv now leaks memory/refs:

$ ./python.exe -m test -R : test_csv
0:00:00 load avg: 1.18 Run tests sequentially
0:00:00 load avg: 1.18 [1/1] test_csv
beginning 9 repetitions
123456789
.........
test_csv leaked [3928, 3924, 3924, 3924] references, sum=15700
test_csv leaked [1666, 1664, 1664, 1664] memory blocks, sum=6658
test_csv failed

== Tests result: FAILURE ==

1 test failed:
    test_csv

Total duration: 4.9 sec
Tests result: FAILURE


Any ideas?
msg393527 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-12 15:59
Also, for some reason two first iterations of the reproducer prints 2 less ref counts.
msg393529 - (view) Author: Petr Viktorin (petr.viktorin) * (Python committer) Date: 2021-05-12 16:18
Changes to _csv.Error should not be necessary, there everything is handled by the superclass.
msg393530 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-12 16:19
> Changes to _csv.Error should not be necessary, there everything is handled by the superclass.

Got it; thanks.
msg393531 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-12 16:21
Yeah, that helped a lot: test_csv leaked [487, 487, 487, 487] memory blocks, sum=1948

Thanks! :)
msg393543 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-05-12 18:56
New changeset ba260acb22f3a6de434dc7a159ddb94a6a8c9b7c by Miss Islington (bot) in branch '3.10':
bpo-44116: Add GC support to _csv heap types (GH-26074) (GH-26081)
https://github.com/python/cpython/commit/ba260acb22f3a6de434dc7a159ddb94a6a8c9b7c
msg393545 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-12 19:41
Pablo, as mentioned in bpo-42972, this is an issue with all heap allocated types; it is not _csv specific. I know that work with heap types have been halted by the SC, as you've pointed out a couple of times already, but shouldn't this heap type issue be fixed before 3.10 is released? There's plenty of time in the beta phase to do this.
msg393546 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-05-12 19:51
> I know that work with heap types have been halted by the SC, as you've pointed out a couple of times already, but shouldn't this heap type issue be fixed before 3.10 is released?

It should absolutely be fixed for all types that have been already converted. The halting is for more types.
History
Date User Action Args
2021-05-12 20:31:24corona10setnosy: + corona10
2021-05-12 19:51:11pablogsalsetmessages: + msg393546
2021-05-12 19:41:47erlendaaslandsetmessages: + msg393545
2021-05-12 19:32:53pablogsalsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-05-12 18:56:28pablogsalsetnosy: + pablogsal
messages: + msg393543
2021-05-12 18:19:07miss-islingtonsetnosy: + miss-islington

pull_requests: + pull_request24720
stage: patch review
2021-05-12 16:21:43erlendaaslandsetmessages: + msg393531
2021-05-12 16:19:47erlendaaslandsetmessages: + msg393530
2021-05-12 16:18:54petr.viktorinsetmessages: + msg393529
stage: patch review -> (no value)
2021-05-12 15:59:51erlendaaslandsetmessages: + msg393527
2021-05-12 15:56:13erlendaaslandsetmessages: + msg393526
2021-05-12 15:51:51erlendaaslandsetstage: patch review
pull_requests: + pull_request24714
2021-05-12 15:46:32petr.viktorinsetmessages: + msg393525
stage: patch review -> (no value)
2021-05-12 15:10:39erlendaaslandsetmessages: + msg393523
2021-05-12 15:07:51petr.viktorinsetmessages: + msg393522
2021-05-12 14:52:29erlendaaslandsetmessages: + msg393521
2021-05-12 14:51:26erlendaaslandsetfiles: + gc.diff
2021-05-12 14:50:55erlendaaslandsetmessages: + msg393520
2021-05-12 14:42:05erlendaaslandsetnosy: + erlendaasland
messages: + msg393518
2021-05-12 13:59:23petr.viktorinsetkeywords: + patch
stage: patch review
pull_requests: + pull_request24709
2021-05-12 13:58:53petr.viktorinsetmessages: + msg393512
2021-05-12 13:34:11petr.viktorincreate