
classification
Title: Order of _io objects finalization can lose data in reference cycles
Type: behavior Stage:
Components: IO Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: iritkatriel, josh.r, pitrou, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2014-12-04 13:50 by pitrou, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description
gcio.py pitrou, 2014-12-04 13:50
gcgzipio.py serhiy.storchaka, 2014-12-05 21:38
Messages (6)
msg232138 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-12-04 13:50
Spun off from issue #17852, which is limited to the interpreter shutdown case. There is a more general issue: a BufferedWriter can be finalized (tp_finalize) before the TextIOWrapper wrapping it if both belong to a reference cycle, which loses unflushed data.

Reproducer attached.
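
For readers without the attachment, here is a minimal sketch of the situation described. It is not the attached gcio.py; the `Holder` class and the function names are made up for illustration. The result depends on the order in which the cycle collector finalizes the objects, so the file may come back empty or complete, and no particular output is claimed.

```python
import gc
import os
import tempfile


class Holder:
    """Hypothetical container used only to build a reference cycle."""


def write_inside_cycle(path, payload):
    # open() in text mode builds TextIOWrapper -> BufferedWriter -> FileIO.
    holder = Holder()
    holder.stream = open(path, "w", encoding="utf-8")
    holder.stream.write(payload)  # small write, stays in memory (not flushed)
    holder.me = holder            # cycle: reference counting alone cannot free it
    del holder
    gc.collect()                  # the cycle collector picks the finalization order


def run_demo(payload="unflushed data"):
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_inside_cycle(path, payload)
        with open(path, encoding="utf-8") as f:
            return f.read()       # either the payload or an empty string
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(repr(run_demo()))
```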
msg232161 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2014-12-04 23:27
What is the proposal? Global registration of file objects that should be flushed before cleanup when they participate in a reference cycle? Adding a special "__predel__" method as suggested in the linked bug? Weak backrefs for file objects that allow a child being destroyed to flush/close its parent (recursively) such that the top of the I/O wrapping chain is always flushed/closed first? Something else?

The underlying problem is that people don't use with statements or otherwise explicitly close their objects, and 99.9% of the time, they get away with it in CPython because of the deterministic reference counting for non-cycles and the finalizers. I'm not sure it's worth adding new complexity (to the interpreter or the I/O hierarchy) to address an issue that can be fixed by closing your files properly, which is already recommended practice.
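
As a sketch of the recommended practice mentioned above: closing the top-level file object with a with statement flushes the whole stack before the cycle collector ever sees it, so the finalization order no longer matters. `Holder` and the function names are hypothetical.

```python
import gc
import os
import tempfile


class Holder:
    """Hypothetical container used only to build a reference cycle."""


def write_and_close(path, payload):
    holder = Holder()
    holder.me = holder  # the file object will live inside this cycle
    # Leaving the with block closes the TextIOWrapper, which flushes down
    # through the BufferedWriter to the FileIO before anything is finalized.
    with open(path, "w", encoding="utf-8") as stream:
        holder.stream = stream
        stream.write(payload)
    del holder
    gc.collect()  # only already-closed objects remain in the cycle


def roundtrip(payload="flushed data"):
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        write_and_close(path, payload)
        with open(path, encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)
```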
msg232162 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-12-05 00:03
I was thinking about weak backrefs. It's specialized for the io module but it would be fairly reliable, and not too complicated to implement.
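
A toy model of the weak backref idea, to make the intended ordering concrete. The classes `ToyStream`, `ToyRaw` and `ToyBuffered` are hypothetical and are not the `_io` types; the sketch only models explicit close() calls and makes no claim about how a real implementation would interact with the garbage collector.

```python
import weakref


class ToyStream:
    """Hypothetical base class; not part of the real io module."""

    def __init__(self):
        self.closed = False
        self._wrapper_ref = None  # weak back reference to the object wrapping us

    def _live_wrapper(self):
        wrapper = self._wrapper_ref() if self._wrapper_ref is not None else None
        if wrapper is not None and not wrapper.closed:
            return wrapper
        return None


class ToyRaw(ToyStream):
    """Bottom of the chain: records what actually reaches the 'disk'."""

    def __init__(self):
        super().__init__()
        self.written = []

    def write(self, data):
        if self.closed:
            raise ValueError("write to closed stream")
        self.written.append(data)

    def close(self):
        if self.closed:
            return
        wrapper = self._live_wrapper()
        if wrapper is not None:
            wrapper.close()  # the wrapper flushes into us, then closes us
            return
        self.closed = True


class ToyBuffered(ToyStream):
    """Buffers writes and registers a weak backref on the stream it wraps."""

    def __init__(self, inner):
        super().__init__()
        self.inner = inner
        self.pending = []
        inner._wrapper_ref = weakref.ref(self)

    def write(self, data):
        if self.closed:
            raise ValueError("write to closed stream")
        self.pending.append(data)

    def close(self):
        if self.closed:
            return
        wrapper = self._live_wrapper()
        if wrapper is not None:
            wrapper.close()  # always close the top of the chain first
            return
        for chunk in self.pending:
            self.inner.write(chunk)
        self.pending.clear()
        self.closed = True
        self.inner.close()


def demo():
    raw = ToyRaw()
    buffered = ToyBuffered(raw)
    text = ToyBuffered(buffered)
    text.write("hello")
    raw.close()  # closing the bottom first still flushes from the top down
    return raw.written, (raw.closed, buffered.closed, text.closed)
```

Calling `raw.close()` walks the weak backrefs up to `text`, which flushes into `buffered`, which flushes into `raw`, so the buffered data arrives before anything is marked closed.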
msg232179 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-05 07:54
I think that the reference loop breaker should be smarter. If we have a loop

    A ⇄ B → C → D

then the order of the finalization of A and B is not defined, but B should be finalized before C and C before D. This should fix unintentional issues with chained io classes because they have no back references.
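
To illustrate the ordering described here (this is only a sketch on a plain dict, not how the CPython collector is implemented): group the objects into strongly connected components with Tarjan's algorithm, then finalize the components in topological order. The function name `finalization_order` and the dict-based graph are assumptions for the example.

```python
def finalization_order(graph):
    """Return groups of nodes in an order in which they could be finalized.

    graph maps each node to the nodes it references. Nodes in the same group
    form a cycle and have no defined order among themselves; every group
    comes before the groups it references.
    """
    index = {}
    lowlink = {}
    stack = []
    on_stack = set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                lowlink[node] = min(lowlink[node], lowlink[succ])
            elif succ in on_stack:
                lowlink[node] = min(lowlink[node], index[succ])
        if lowlink[node] == index[node]:
            # node is the root of a strongly connected component
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)

    # Tarjan's algorithm emits referenced components first; reverse the list
    # so that referrers (wrappers) are finalized before what they wrap.
    components.reverse()
    return components


# The loop from the message: A <-> B -> C -> D
example = {"A": ["B"], "B": ["A", "C"], "C": ["D"], "D": []}
print(finalization_order(example))
```

For the example graph this yields three groups: A and B together (unordered), then C, then D.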
msg232180 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-05 08:03
Note that the same issue exists with gzip, bz2, lzma, tarfile and zipfile (and may be even worse there).
msg407149 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-11-27 15:03
Reproduced on 3.11.
History
Date User Action Args
2022-04-11 14:58:10 admin set github: 67185
2021-11-27 15:03:59 iritkatriel set versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.4, Python 3.5; nosy: + iritkatriel; messages: + msg407149; type: behavior
2014-12-05 21:38:48 serhiy.storchaka set files: + gcgzipio.py
2014-12-05 21:32:58 vstinner set nosy: + vstinner
2014-12-05 08:03:45 serhiy.storchaka set messages: + msg232180
2014-12-05 07:54:57 serhiy.storchaka set messages: + msg232179
2014-12-05 00:03:04 pitrou set messages: + msg232162
2014-12-04 23:27:38 josh.r set nosy: + josh.r; messages: + msg232161
2014-12-04 13:58:29 serhiy.storchaka set nosy: + serhiy.storchaka
2014-12-04 13:50:26 pitrou create