classification
Title: Python breaks OS' append guarantee on file writes
Type: behavior
Stage: resolved
Components: None
Versions: Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: bsdphk, jcea, neologix, pitrou, r.david.murray, schmir
Priority: normal
Keywords:

Created on 2012-08-18 20:19 by bsdphk, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (10)
msg168528 - Author: Poul-Henning Kamp (bsdphk) Date: 2012-08-18 20:19
When a file is opened in append mode, the operating system guarantees that every write(2) system call atomically appends its payload to the file.

At least on FreeBSD, Python breaks this guarantee by chopping large writes into multiple write(2) syscalls.

Try running this program using ktrace/truss/strace or a similar system-call tracing facility:

   fo = open("/tmp/_bogus", "ab", 0)
   fo.write(bytearray(1024*1024))
   fo.close()

Instead of a single one-megabyte write, I see 1024 one-kilobyte writes.

(BTW: Why only one kilobyte? That is an incredible pessimisation these days...)

I leave it to the Python community to decide whether this should be fixed or merely pointed out in the documentation (os.write() is a workaround).
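
A minimal sketch of the os.write() workaround mentioned above, written for Python 3 on a POSIX system. The helper name `append_record` and the file mode are assumptions made for illustration. It opens the file with O_APPEND and hands the whole payload to os.write(), so no stdio buffer sits in between to slice it up. The return value is passed back because write(2) may still accept fewer bytes than offered.

```python
import os

def append_record(path, payload):
    """Append payload via os.write(); return the number of bytes written."""
    # O_APPEND makes the kernel position every write at the current end of file.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # os.write() passes the buffer straight to the OS, with no
        # user-space buffering layer in between.
        return os.write(fd, payload)
    finally:
        os.close(fd)
```

A caller that needs one record per write would compare the result with len(payload) and treat a short count as an error.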
msg168529 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-18 20:22
> When a file is opened in append mode, the operating system guarantees
> that every write(2) system call atomically appends its payload to the
> file.

Does it? I don't see such strong guarantees in http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html

In any case, Python 2 uses fwrite(), not write(), which may be the explanation. Do you observe the same behaviour when using io.open() instead of open()?

(io.open() is the Python 3 IO stack backported to Python 2)
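
For reference, a small sketch of the same test done through io.open(), assuming a POSIX system. The function name `append_unbuffered` is hypothetical. With buffering=0 in binary mode, io.open() returns a raw io.FileIO object, so write() does not go through C stdio.

```python
import io

def append_unbuffered(path, payload):
    """Append payload through io.open() with buffering disabled."""
    # buffering=0 is only allowed in binary mode and yields a raw io.FileIO.
    with io.open(path, "ab", buffering=0) as f:
        # A raw write returns the number of bytes actually written,
        # which may be less than len(payload).
        return f.write(payload)
```

Running this under strace or truss, as in the original report, would show how many write(2) calls it actually produces on a given platform.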
msg168530 - Author: Poul-Henning Kamp (bsdphk) Date: 2012-08-18 20:24
Yes, it does:

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
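
The quoted rule can be observed with two independent descriptors on the same file. This is a minimal sketch for Python 3 on a POSIX system with a local filesystem; the function name `interleaved_appends` and the record contents are made up for illustration.

```python
import os

def interleaved_appends(path):
    """Write alternately through two O_APPEND descriptors; return the file content."""
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT
    fd_a = os.open(path, flags, 0o644)
    fd_b = os.open(path, flags, 0o644)
    try:
        # Each descriptor keeps its own file offset, but O_APPEND moves it
        # to the end of the file before every write.
        os.write(fd_a, b"a1\n")
        os.write(fd_b, b"b1\n")
        os.write(fd_a, b"a2\n")
    finally:
        os.close(fd_a)
        os.close(fd_b)
    with open(path, "rb") as f:
        return f.read()
```

Without O_APPEND, the second descriptor would start at offset 0 and overwrite the first record instead of landing after it.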
msg168531 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-18 20:25
Ah, sorry. I was stupidly looking for "atomic" and only found the pipe-specific remarks.
(the other points remain, though :-))
msg168532 - Author: Poul-Henning Kamp (bsdphk) Date: 2012-08-18 20:30
I have not tried io.open(), nor would I suspect most users would realize that they needed to do so, in order to get the canonical behaviour from an operation called "write" on a file opened in "append" mode.

IMO: If Python's file.write() does not give the guarantee POLA would indicate, it's either a bug or a doc-issue, no matter how many workarounds might exist.

But I have neither a clue to the aspirational goals of python, nor to what it might take to fix this, so it's entirely your call.
msg168533 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-18 20:36
> I have not tried io.open(), nor would I suspect most users would
> realize that they needed to do so, in order to get the canonical
> behaviour from an operation called "write" on a file opened in
> "append" mode.

The reason I'm asking is that open() is the same as io.open() in Python
3.x, which is currently the main development line. That said, I can find
the results myself.

Python 2 is in bugfix mode, so it's impossible to rewrite the I/O
routines to use unbuffered I/O instead of C buffered I/O.

> IMO: If Python's file.write() does not give the guarantee POLA would
> indicate, it's either a bug or a doc-issue, no matter how many
> workarounds might exist.

What do you call POLA?

> But I have neither a clue to the aspirational goals of python, nor to
> what it might take to fix this, so it's entirely your call.

Well, as I said, Python 2 will be pretty much impossible to fix (we call
fwrite() with the argument, not write()). Python 3 is a different story
since we use our own buffering layer on top of C's unbuffered API.

As a sidenote, do you know if writev() has the same guarantee as
write()? POSIX doesn't seem to say so.
msg168534 - Author: Poul-Henning Kamp (bsdphk) Date: 2012-08-18 20:50
POLA = Principle Of Least Astonishment

We use that a lot in architectural decisions in FreeBSD :-)

As I said: You deal with this as you see fit. If all python2 gets is a doc- or errata-notice, that's perfectly fine with me.

I interpret "The writev() function shall be equivalent to write(), except as described below." as writev() giving the same atomic append guarantee.

In FreeBSD, write() is implemented using writev() and I expect that is the obvious and thus common way it is done.
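
For anyone who wants to try the writev() variant from Python, here is a minimal sketch assuming Python 3.3 or later on a POSIX system. The helper name `append_gathered` is hypothetical. It shows only how to issue the call; whether the append guarantee carries over is the interpretation given above, not something this code proves.

```python
import os

def append_gathered(path, parts):
    """Append several buffers with os.writev(); return the bytes written."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        # os.writev() takes a sequence of bytes-like objects and writes
        # them in order through writev(2).
        return os.writev(fd, parts)
    finally:
        os.close(fd)
```

This lets a header and a body be passed as separate buffers without concatenating them first.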

(You seem to be right with respect to the 1024: That is indeed still the BUFSIZ on FreeBSD, I'll work on getting that changed.)
msg168539 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-08-18 23:14
Even if we write in chunks, if we are calling the OS write function and O_APPEND is set, wouldn't we be satisfying the condition?  Or, rather, the OS would be.  That is, I don't really see a guarantee of an *atomic* write in the quoted description.
msg168540 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-18 23:20
> Even if we write in chunks, if we are calling the OS write function
> and O_APPEND is set, wouldn't we be satisfying the condition?  Or,
> rather, the OS would be.  That is, I don't really see a guarantee of
> an *atomic* write in the quoted description.

I'm not sure it's guaranteed to be atomic at the hardware level, but
AFAIU the updates should be atomic as seen from other processes on the
same machine (i.e. filesystem cache coherency).

As a side-note, I've just tested under Linux with the following script:

    with open("foo", "ab") as f:
        f.write(b"abcd")
        f.write(b"x" * (1024 ** 2))

Results:

- on 2.7, the write buffers get sliced up (glibc's fwrite() doesn't
care about atomicity):
write(3, "abcdxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 4096) = 4096
write(3, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 1044480) = 1044480

- on 3.2 and 3.3, our home-grown buffering respects the original
buffers:
write(3, "abcd", 4)                     = 4
write(3, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 1048576) = 1048576

(but that's purely by luck, since we didn't design it with that
goal :-))
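
The slicing done by the Python 3 buffering layer can also be inspected without strace. The sketch below assumes Python 3; `RecordingRaw` and `observed_write_sizes` are hypothetical names. It puts an io.BufferedWriter on top of a fake raw stream that records the size of each buffer it receives. What it returns depends on the implementation and version, since the io documentation does not promise any particular slicing.

```python
import io

class RecordingRaw(io.RawIOBase):
    """Fake raw stream that records the size of every write it receives."""

    def __init__(self):
        super().__init__()
        self.sizes = []

    def writable(self):
        return True

    def write(self, b):
        # Pretend the whole buffer was accepted, as write(2) usually
        # does for regular files.
        self.sizes.append(len(b))
        return len(b)

def observed_write_sizes():
    """Repeat the two writes from the test script; return the raw write sizes."""
    raw = RecordingRaw()
    buffered = io.BufferedWriter(raw)
    buffered.write(b"abcd")
    buffered.write(b"x" * (1024 ** 2))
    buffered.flush()
    return raw.sizes
```

Calling observed_write_sizes() on different interpreters shows whether the original buffers reach the raw layer intact.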
msg168991 - Author: Charles-François Natali (neologix) * (Python committer) Date: 2012-08-24 09:48
I wouldn't rely on O_APPEND too much:
- it won't work on NFS, and probably other non-local filesystems
- it doesn't actually guarantee atomicity: even though setting the file offset and performing the write are done under a lock, a partial write is still possible
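
To illustrate the partial-write point, here is a minimal sketch of the usual retry loop, assuming Python 3 on a POSIX system. The helper name `write_all` is hypothetical. Whenever it needs more than one call, each call is a separate append, so another writer can get in between the pieces.

```python
import os

def write_all(fd, data):
    """Write all of data to fd, retrying after partial writes.

    Returns the number of os.write() calls that were needed.
    """
    view = memoryview(data)
    calls = 0
    while len(view) > 0:
        # os.write() may accept fewer bytes than offered.
        written = os.write(fd, view)
        view = view[written:]
        calls += 1
    return calls
```

A return value greater than 1 means the record was not appended in one piece.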
History
Date                 User            Action  Args
2022-04-11 14:57:34  admin           set     github: 59928
2019-12-29 05:04:43  gvanrossum      set     status: open -> closed; resolution: not a bug; stage: resolved
2012-09-10 01:13:41  jcea            set     nosy: + jcea
2012-08-24 09:55:08  schmir          set     nosy: + schmir
2012-08-24 09:48:24  neologix        set     nosy: + neologix; messages: + msg168991
2012-08-18 23:20:05  pitrou          set     messages: + msg168540
2012-08-18 23:14:01  r.david.murray  set     nosy: + r.david.murray; messages: + msg168539
2012-08-18 20:50:31  bsdphk          set     messages: + msg168534
2012-08-18 20:36:39  pitrou          set     messages: + msg168533
2012-08-18 20:30:08  bsdphk          set     messages: + msg168532
2012-08-18 20:25:13  pitrou          set     messages: + msg168531
2012-08-18 20:24:23  bsdphk          set     messages: + msg168530
2012-08-18 20:22:37  pitrou          set     nosy: + pitrou; messages: + msg168529
2012-08-18 20:19:36  bsdphk          create