Built-in module _io can lose data from buffered files in reference cycles #62052
Comments
In Python 2, a buffered file opened for writing is flushed by the C library when the process exits. In Python 3, the _pyio and _io modules don't do this reliably: they rely on __del__ being called, which is not necessarily the case. The attached example ends with the file 'foo' empty (tested with Python 3.3 and the current trunk). The same example in Python 2, using the built-in open(), cannot end with 'foo' empty.
There are actually several issues here:
Code relying on garbage collection or a shutdown hook to flush files is borked.
Furthermore, I think it's inherently unsafe: in a multi-threaded program, flushing requires locking, and doing this at shutdown is a recipe for deadlock or arbitrary delay (imagine you have a daemon thread currently holding the lock, e.g. for stdout). In short, that's an application bug to me.
It used to be consistently reliable behavior in Python 2 (and we made it so in PyPy too), provided of course that the process exits normally; but it no longer is in Python 3. Well, I can see the reasons for not flushing files, if it's clearly documented somewhere as a change of behavior from Python 2. However, I'm complaining about the current behavior: files are flushed *most of the time*. That's behavior that is clearly misleading, or so I would think. I'm rather sure that there are many small scripts and large programs out there relying on automatic flushing, and one day they'll hit a case where the file is not flushed and get the worst kind of error: a file unexpectedly truncated at 99% of its length, in a way that cannot be reproduced by small examples. Feel free to close anyway as not-a-bug; I won't fight the Python 3 behavior, because Python 2 works as expected.
When you say Python 2, I assume you mean CPython 2, right?
That's the problem with implementation-defined behavior ;-)
> "In Python 2, a buffered file opened for writing is flushed by the C library when the process exits."

> "When you say Python 2, I assume you mean CPython 2, right?"

It looks to be a feature of the standard C library, at least the GNU libc: its libio library installs an exit handler that flushes all open files. You can see it if you set a breakpoint on write() using gdb:

```
Breakpoint 5, write () at ../sysdeps/unix/syscall-template.S:82
(gdb) where
```

See the source of _IO_flush_all_lockp() in the GNU libc: the glibc maintains a list of open FILE* streams, _IO_list_all, which is protected by the list_all_lock lock.
Yes, it's guaranteed by POSIX/ANSI (see man exit).
Hum, POSIX (2004) is not so strict: 2013/4/29 Charles-François Natali <report@bugs.python.org>:
You're looking at the wrong section: "The exit() function shall then flush all open streams with unwritten [...]" Then a couple of lines below: "The _Exit() [CX] and _exit() functions shall not call functions [...]" It's guaranteed for exit(), not _Exit()/_exit().
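The same distinction can be observed from Python: a normal interpreter exit flushes what it can, while os._exit() (the analogue of _exit()) terminates immediately. A minimal sketch demonstrating the data loss (file name and layout are invented for illustration):

```python
import os
import subprocess
import sys
import tempfile

# A child process writes to a buffered file, then calls os._exit(),
# which terminates immediately without flushing any buffers.
child_code = r'''
import os, sys
f = open(sys.argv[1], "w")   # buffered text file
f.write("bar")               # sits in the buffer, not yet on disk
os._exit(0)                  # no flushing, no finalizers
'''

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "foo")
    subprocess.run([sys.executable, "-c", child_code, path], check=True)
    with open(path) as f:
        contents = f.read()

print(repr(contents))
```

The file exists (open() created it) but is empty, because the three bytes never left the TextIOWrapper's buffer.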
Oops ok, thanks. 2013/4/30 Charles-François Natali <report@bugs.python.org>:
In 3.4+ the example script always writes the string "bar" to the file "foo". Tested by running it in a loop hundreds of times. Cleaning up at shutdown was enhanced in 3.4.
Is anyone interested in working on the "maintain a list of open file objects" idea?

I consider that Python 3 does its best to flush data at exit, but it's not good practice to rely on destructors to flush data. I mean, there is no guarantee that files will be written in the correct order and that files will stay consistent. It's much safer to explicitly call the close() method.

By the way, Python 3 now emits ResourceWarning warnings when files are destroyed without being explicitly closed. I enhanced the code handling warnings during Python shutdown to show these warnings in more cases, but it is still not perfect. I opened issue bpo-21788 as a reminder that the code can still be enhanced.

For the "maintain a list of objects" idea, I expect race conditions with threads, signals and processes (fork). We already have known issues with locks + fork; I expect even more issues if we maintain a list of open files.

I propose to close the issue as "wontfix". Python 3 does its best, but please call the close() method explicitly! Maybe the undefined behaviour should be documented (with a warning?) for the buffered writer in the io module, and maybe also for the open() function?
I'm +1 on closing this. I agree with Charles-François that it's never been guaranteed by the Python specification. The Pythonic way to work with files is to use the "with" statement or, if you need a long-living file stream, to carefully close files in a "finally" block or in the __exit__ method of some class used as a context manager. Non-observance of this rule can be considered a bug (see bpo-22831).
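For reference, a minimal sketch of the recommended pattern (the path is arbitrary):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "foo.txt")

    # The with statement guarantees flush and close when the block exits,
    # independently of garbage collection or interpreter shutdown behaviour.
    with open(path, "w") as f:
        f.write("bar")

    with open(path) as f:
        data = f.read()

print(data)
```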
Victor: there is the GIL, you don't need any locking.
Serhiy, I believe this still happens in Python 3.4, but it is harder to reproduce. I couldn't get Armin's script to produce the problem either, but I'm pretty sure that this is what causes e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771452#60.
Maybe, as it's been said several times, it's an application bug: files [...]
I hate to repeat myself, but if the C standard says "files are flushed at exit"; if Python 2 follows this standard; and if Python 3 follows it most of the time but *not always*... then it seems to me that something is very, very buggy in the worst possible way (rare data loss).
Do you have a working example in 3.4+?
If I understood correctly, Python 3.4 tries harder to find cycles and call destructors at the end of the program, but that's not a full guarantee. For example you can have a reference from a random C extension module. While trying to come up with an example, I found one that I don't fully understand, but the point is that it shows how easy it is to defeat it:

```python
import sys

f = open('foo.txt', 'w')
f.write('abc')

def func(*args):
    return func

sys.settrace(func)
```
(Ah, it's probably a reference from the trace function -> func_globals -> f).
Here is an example.
Note the example I posted doesn't involve the shutdown sequence. It calls gc.collect() explicitly.
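The attachments were not migrated with this issue, but a sketch of the kind of program being discussed (file name and cycle shape invented here) looks like this; whether the data survives depends on which finalizer the GC happens to run first:

```python
import gc
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo")

f = open(path, "w")
f.write("bar")

holder = {}
holder["self"] = holder   # a reference cycle...
holder["file"] = f        # ...that keeps the buffered file alive

del f, holder
gc.collect()   # wrapper, buffer and FileIO are finalized in arbitrary order

size = os.path.getsize(path)
print(size)    # 3 if the buffer was flushed before the fd was closed, else 0
```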
To add to the confusion: Antoine's example produces an empty file on the current trunk cd282dd0cfe8. When I first tried it on a slightly older trunk (157 changesets ago), it correctly emitted a file with "barXXX ", but only if "gc.collect()" was present. Without "gc.collect()" it didn't. This seems at odds with the explanation that "gc.collect()" should be what occurs at shutdown now.
The problem is that the order of tp_finalize calls is arbitrary when there is a reference cycle (the same is true, of course, for tp_clear). So depending on the exact layout of the garbage list, the TextIOWrapper could be collected before the BufferedWriter and the FileIO (good), or after (bad). I don't see an easy way to solve this. Either _io should provide hints to the GC (which kind of hints?), or tp_finalize should be customized to somehow call a dependent wrapper's tp_finalize (how? by keeping a weakref?).
Maybe accepting the fact that relying on finalizers is a bad idea here?
You mean encourage people to explicitly use "with" or call close()? Yes, that's the simplest answer. It's why we added ResourceWarning.
If we want to guarantee that all files are properly flushed at exit, the best option is to maintain a list of open files. I'm not interested in implementing that; it sounds too complex. For example, open("text.txt", "w") creates 3 objects: FileIO, BufferedWriter, TextIOWrapper. You would have to register these 3 objects and flush them in the right order. Python is dynamic: you are free to modify objects, for example using the detach() method or by modifying the buffer attribute of the TextIOWrapper. How do you decide which object should be flushed? In which order? We also have to take care of signals, threads and forks, stay portable, etc.
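The three stacked objects are easy to see directly (the file name here is arbitrary):

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "t.txt")

f = open(path, "w")             # a buffered text file is three objects:
text_layer = type(f)            # io.TextIOWrapper (encoding/decoding)
buffer_layer = type(f.buffer)   # io.BufferedWriter (the write buffer)
raw_layer = type(f.buffer.raw)  # io.FileIO (the OS file descriptor)
f.close()
```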
On 04/12/2014 12:05, STINNER Victor wrote:
I am not talking about flushing files at exit.
I created a new PR which uses the atexit module instead of _Py_PyAtExit. I think registering in io.py is okay. I see that atexit is now implemented in C. Rather than registering in io.py, we could create a C API to register callbacks (i.e. atexit_register). That would perhaps be a bit cleaner.
Attached is a script that triggers the non-flushing behaviour for me. I don't think it is reliable, since it depends on the order in which FileIO and BufferedWriter are finalized when the gc finds them in a reference cycle. BTW, it is arguable that the root cause of this bug is that we no longer do topological ordering when calling finalizers. Originally, the cycle GC would refuse to call __del__ methods because once objects are part of a cycle, there is no defined topological ordering. So, we linked that garbage to gc.garbage rather than calling __del__ with potentially undefined results. Now the GC has been changed to call __del__ anyhow. I haven't studied how this was changed, but this non-flushing bug is a result. Having the buffer added to gc.garbage would also result in the data not being flushed, but arguably it would be more understandable what's going on. I'm not arguing that we should go back to that, just that the current behaviour can be subtle and confusing.
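That behaviour change is observable directly: since Python 3.4 (PEP 442), an object with __del__ in a cycle is finalized instead of being parked in gc.garbage:

```python
import gc

finalized = []

class WithDel:
    def __del__(self):
        finalized.append(True)

a = WithDel()
a.self = a     # a cycle involving an object that defines __del__
del a
gc.collect()

# Before 3.4 the cycle would have ended up in gc.garbage, uncollected;
# now __del__ runs and gc.garbage stays empty.
print(gc.garbage, finalized)
```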
The reason Python 2 did well here is simply that Python 2 had a single Python object (the file object) for each actual file. Python 3 has several of them (the raw IO object, the buffered IO object, possibly the text IO wrapper), and so suddenly the finalization order matters.
In the process of trying to write a test for this, I now realize that PR 4847 is not really a fix. If the underlying file gets closed during an earlier gc.collect() and not during shutdown, the extra flushing step is not going to help. So, using atexit is not really a fix. Maybe there is something else we can do; I need to think about it more.
I'm not well versed in GC, but if it were possible to call destructors in reverse chronological order, that could help with this issue.
Using reverse chronological order would probably work in 99% of cases, but then you would have that rare case where it didn't. So, I'm not too keen on that approach. I think this is a classic problem with finalization and GC; I should probably do some actual reading of the literature. One idea that occurs to me is that the underlying file object could keep a weakref to the higher-level (e.g. buffer) objects. Then, if the underlying file gets closed, it could make sure to call flush on the higher-level objects first. The weakref would ensure that a reference cycle is not created.
Given the apparent difficulties getting this right, how about not attempting to implicitly flush buffers in the finalizer at all? This means scripts relying on this will break, but in contrast to the current behavior they will break consistently, so it's easy to find and fix the problem.
Welp, another day, another attempt. As mentioned in PR 4847, atexit is not the answer. If the raw/buffered file pair are part of a reference cycle and the GC cleans it up before atexit runs, then the buffered data can get lost. I attempted to implement my weakref idea (i.e. the raw file keeps a weakref to the buffered file and calls flush before the raw file gets closed). That doesn't work either, because the GC clears the weakref before calling __del__. The attached patch "buffer_register_flush.txt" does seem to work. The downside is that it creates a reference cycle between the raw and buffered file objects. Perhaps that is not a problem, since unless you call close() on the raw file you will be leaking resources anyhow. In the patch, calling close() removes the reference cycle. I still feel like this is worth fixing, as ugly as the implementation is.
I think it would be quite disruptive to create a reference cycle each time open() is called. It may also break user scripts.
Yeah, I think you are correct. Currently, files not part of reference cycles get properly flushed and closed by the reference counter. Implementing my "buffer_register_flush" patch would cause files to be closed only by the cyclic garbage collector (if not explicitly closed). That would mean a script that opens a large number of files could run out of file descriptors. Basically, we lose one of the main advantages of reference counting. Probably the GC should keep track of how many files are open and call collect() when a threshold is reached. Still, forcing every file to be collected only by the cyclic GC seems too ugly a solution to this issue.
Here's another idea: have a new type field (a tp_something) that exposes the GC priority for this type. It could be an integer between -8 and 7, for example (0 being the default). Then tp_finalize would be called in sorted priority order. The tricky part of course is to make this acceptable performance-wise (i.e. avoid walking the collectable list 16 times, once for each priority level), for example by creating bucket lists for each priority level (only for objects with a non-NULL tp_finalize).
Since the issue seems to have been active lately, may I suggest my view on solving it. One solution that comes to my mind is to keep a weak reference to the file, and to register an atexit function per file (and to unregister it when the file is closed). Example concept-illustrating Python code for the _pyio module:

```python
import atexit, operator, weakref
# ...

class TextIOWrapper(TextIOBase):
    def __init__(self):
        # ...
        self._weakclose = operator.methodcaller('close')
        atexit.register(self._weakclose, weakref.proxy(self))

    def close(self):
        atexit.unregister(self._weakclose)
        # and now actually close the file
```

There is a possibility of a negative impact arising from the use of operator.methodcaller, because it may return something cached in future versions of Python. There is also the issue of unregistering an atexit function during atexit processing. But the general idea seems to be simple and worth considering.
Using atexit is not the solution, because the data can be lost even while the program is running, not just at interpreter shutdown. The problem occurs if the buffered file object and the underlying file object are both part of a reference cycle. Then, when the cycle is destroyed by the GC module, if the file __del__ is called before the buffer __del__, data is lost. So far, the best solution I've come up with is as follows: split the buffered file object into two objects, a wrapper object that will be returned from open() and an underlying object that holds the actual buffer. Make the underlying file object keep references to all the buffers that are using the file. When the file __del__ gets called, first flush all of the buffers and then close the file. Splitting the buffered file object is required so that we don't create reference cycles.
This may be a bit of a left-field or too-big-a-hammer suggestion, but as far as I can tell from this thread [1] it probably is technically possible to modify the GC to clear weakrefs after calling __del__. Nick wasn't a fan (he likes the invariant that weakrefs can't trigger a resurrection), but if all else fails it might be worth re-raising. You could add a secondary reference count on FileIO recording how many BufferedIO wrappers there are around it; then its __del__ would skip calling close() if there are still BufferedIO wrappers, and BufferedIO.__del__ would decrement the reference count and close the underlying file if it hits zero and FileIO.__del__ had already been called. [1] https://mail.python.org/pipermail/python-dev/2016-October/146747.html
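A pure-Python sketch of that secondary-refcount idea (class names and bookkeeping are invented here; the real _io types are implemented in C, and __del__ is called by hand below to simulate one possible GC order):

```python
class RawFile:
    """Stand-in for FileIO, with a count of live buffered wrappers."""
    def __init__(self):
        self.wrapper_count = 0
        self.del_seen = False
        self.closed = False

    def close(self):
        self.closed = True

    def __del__(self):
        self.del_seen = True
        if self.wrapper_count == 0:    # no wrappers left: safe to close now
            self.close()

class BufferedWrapper:
    """Stand-in for BufferedWriter: flush, then maybe close the raw file."""
    def __init__(self, raw):
        self.raw = raw
        raw.wrapper_count += 1

    def __del__(self):
        # ... flush the buffer to self.raw here ...
        self.raw.wrapper_count -= 1
        if self.raw.wrapper_count == 0 and self.raw.del_seen:
            self.raw.close()           # raw.__del__ ran first; close now

# Simulate the "bad" GC order: the raw file is finalized first.
raw = RawFile()
buf = BufferedWrapper(raw)
raw.__del__()              # close is deferred: a wrapper is still alive
deferred = not raw.closed
buf.__del__()              # wrapper flushes, then closes the raw file
```

Whichever of the two finalizers runs last performs the close, so the flush always happens before the descriptor goes away.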
I marked bpo-38548 as a duplicate of this issue. Copy of msg355062:

Consider the following program:

```python
f = open("out.txt", "w")
f.write("abc\n")
exit(0)
```

Please note the absence of f.close().

```python
from sympy.core import AtomicExpr

class MyWeirdClass(AtomicExpr):
    def __init__(self):
        pass

f = open("out.txt", "w")
f.write("abc\n")
exit(0)
```

Note: sys.version is: "3.7.3 (default, Oct 7 2019, 12:56:13) \n[GCC 8.3.0]"
The PR below fixed it:
Is there still an issue to be fixed here? If so, re-open and explain, or open a new issue. The documentation was updated to highlight that unclosed files can't be guaranteed to flush buffered data.
I think there is still an issue here, even though it is a low-priority bug. It's good that we now document that unclosed files can lose buffered data. However, it is bad, IMHO, that it works 99%+ of the time but sometimes doesn't. I think that's worth fixing, if we can do it without too much pain. I created GH-105171 as a different approach to fix this.
Over the last few years, I made many enhancements to be able to log ResourceWarning as late as possible. But Python finalization is not deterministic, and so some warnings cannot be logged when emitted "too late". My latest failed attempt to make finalization more deterministic: #86837
The root cause is the GC algorithm for breaking reference cycles. If we have a cycle:

```mermaid
flowchart LR;
    A==>B;
    B==>C;
    C==>A;
    C-->TextIOWrapper;
    TextIOWrapper-->BufferedWriter;
    BufferedWriter-->FileIO;
```

then it will try to break it by breaking any reference, including the references C-->TextIOWrapper, TextIOWrapper-->BufferedWriter and BufferedWriter-->FileIO, even though they do not lead to breaking the cycle. The only references whose removal advances reference cycle elimination are A-->B, B-->C and C-->A. If the algorithm were smarter, this would solve most of the problems, not only with IO.
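The arbitrariness is easy to demonstrate: with two objects in a cycle, the GC runs both finalizers, but the order is an implementation detail:

```python
import gc

order = []

class Node:
    def __init__(self, name):
        self.name = name
    def __del__(self):
        order.append(self.name)

a, b = Node("A"), Node("B")
a.other, b.other = b, a    # A <-> B reference cycle
del a, b
gc.collect()

print(order)   # both names appear, in whatever order the GC picked
```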