classification
Title: Adding an atomic FS write API
Type: enhancement
Stage: patch review
Components: Library (Lib)
Versions: Python 3.4

process
Status: closed
Resolution: out of date
Dependencies: 8828
Superseder:
Assigned To:
Nosy List: Arfrever, barry, eric.araujo, eric.smith, exarkun, giampaolo.rodola, loewis, martin.panter, meatballhat, milko.krachounov, ncoghlan, neologix, olemis, pitrou, socketpair, takluyver, tarek, vstinner
Priority: normal
Keywords: patch

Created on 2010-05-03 12:43 by olemis, last changed 2016-01-20 14:22 by vstinner. This issue is now closed.

Files
File name Uploaded Description Edit
atomic_write.patch vstinner, 2013-05-15 23:31 review
atomic_write_mkstemp.patch vstinner, 2013-05-15 23:50 review
Messages (50)
msg104833 - (view) Author: Olemis Lang (olemis) Date: 2010-05-03 12:43
Often I have some contents to write to a file at a path I already know. I recently tried to find a function in the stdlib to do that, and to my surprise this is what I found:

 - Such function exists
 - It's `distutils.file_util.write_file`

IMO, the last place people would look for such a function is inside the
`distutils` package. Besides, I reviewed the modules listed under the `File and
directory access` category in the `Library Reference` and found nothing
even similar.

The idea is to provide a similar function in the `shutil` module.
msg104834 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-03 12:54
This sounds silly to me. You can write a file in two lines:

with open("foo", "wb") as f:
    f.write(contents)

If you want to do something more useful, you can add a function for atomic writing of a file.
msg104835 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-05-03 13:01
Does the standard library really need something so trivial? I'd put it in your own program. And I'd make the one in distutils private (and fix it to use a with statement).
msg104836 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2010-05-03 13:42
@Eric: remember that distutils is frozen, so it won't be removed. And historically Distutils code was meant to be 2.4 compatible (thus, no with)

@Antoine: Yes, that would be the idea (provide a robust pattern by using a temporary file, then rename it)
msg104837 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-03 13:53
> @Antoine: Yes, that would be the idea (provide a robust pattern by
> using a temporary file, then rename it)

Then perhaps it would be better as a context manager:

with shutil.atomic_write("foo", "wb") as f:
    f.write("mycontents")
msg104855 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2010-05-03 17:21
Well, a context manager sounds overkill since we just want to write some content in a file (and nothing else during that context). 

I think a simple call is straightforward:

  shutil.atomic_write("foo", "contents", mode="wb")
msg104856 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2010-05-03 17:22
Notice that "contents" could be replaced here by an iterator that returns the data to pass to write()
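A minimal sketch of the one-shot call being discussed (names and behavior are illustrative only; it uses os.replace(), which was only added later, in Python 3.3):

```python
import os
import tempfile

def atomic_write(path, contents, mode="wb"):
    # Hypothetical sketch of the proposed one-shot API: write everything to
    # a temporary file in the destination directory, flush it to disk, then
    # atomically move it into place.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, mode) as f:
            f.write(contents)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic replace; added in Python 3.3
    except BaseException:
        try:
            os.unlink(tmp)
        except OSError:
            pass
        raise
```

An iterable of chunks, as suggested above, could be supported by looping over write() instead of making a single call.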
msg104857 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-03 17:31
> Well, a context manager sounds overkill since we just want to write
> some content in a file (and nothing else during that context). 

Using a context manager, though, is the recommended idiom to write
files. I think there's a value in remaining consistent. We don't want to
end up like PHP which has dozens of different idioms of doing similar
things.
msg104859 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2010-05-03 17:42
This idiom makes sense when you want to do some things with an open file, and replaces the usual try..finally idiom.

That's not what we want to do here. We want to write data in a file in a single step, in an atomic manner. 

Giving developers the ability to work inside a context manager means potentially giving them the ability to break this atomicity.

So I don't think the context manager idiom prevails here, and it should be avoided: shutil.atomic_write is asked to write a file, given some data, and doesn't return until it's done.
msg104860 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2010-05-03 17:48
Considering that it's extremely difficult to implement this correctly and in a cross-platform way, I think it makes sense as a stdlib addition (though I'd add it as a method of a `path` type rather than to the "shell utilities" module ;).
msg104862 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2010-05-03 17:54
I agree that with the addition of the new requirement that it be an atomic write, it should be in a library.

I'd also do it as a context manager, since that's the more general case. distutils2 can either call the context manager version, or have a trivial function that calls the context manager.
msg104863 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-05-03 17:55
> Giving the ability to the developers to work in a context manager
> means that you potentially give them the ability to break this
> atomicity.

AFAICT this doesn't make sense. The writing isn't atomic, the renaming
is.
msg104866 - (view) Author: Tarek Ziadé (tarek) * (Python committer) Date: 2010-05-03 18:21
@Antoine, @Eric: After some more thought, a context manager makes more sense.

I also have a use case for an atomic copytree() but I guess that can be an option in copytree()
msg106585 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-26 23:18
> This sounds silly to me. You can write a file in two lines:
>
> with open("foo", "wb") as f:
>    f.write(contents)

If the disk is full, the write fails and the new file only contains part of 'contents'. If the file already exists and the write fails, the original content is lost.

The correct pattern is something like:

@contextlib.contextmanager
def atomic_write(filename):
    tempname = filename + ".tmp"
    out = open(tempname, "w")
    try:
        yield out
        # flush Python buffers before fsync()
        out.flush()
        if hasattr(os, 'fsync'):
            os.fsync(out.fileno())
        out.close()
        if os.name in ('nt', 'ce'):
            os.unlink(filename)
            # ... hope that it doesn't fail here ...
        os.rename(tempname, filename)
    except:
        out.close()
        os.unlink(tempname)
        raise

Remarks:
 - call fsync() to ensure that the new file content is written to disk. It does nothing on Mac OS X
 - on Windows, it's not possible to rename a to b if b already exists. Newer Windows versions have an atomic function, MoveFileTransacted(); older versions have MoveFileEx(MOVEFILE_REPLACE_EXISTING | MOVEFILE_WRITE_THROUGH) and ReplaceFile()

This context manager is *not* atomic on every OS. It's only atomic on some OSes, and it may depend on the kernel version (see recent discussions about ext3/ext4, fsync and write barriers).
msg106586 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-26 23:26
An article about ext3 and fsync: "Solving the ext3 latency problem"
http://lwn.net/Articles/328363/

Another good article, written by the maintainer of ext4 (Theodore Ts'o): "Don’t fear the fsync!"
http://thunk.org/tytso/blog/2009/03/15/dont-fear-the-fsync/
msg106588 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-26 23:42
We need an atomic rename function to implement atomic_write(): I opened a new issue, #8828.
msg107427 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-06-09 22:18
Trac has an AtomicFile class which copies file mode, owner, etc.
http://trac.edgewall.org/browser/trunk/trac/util/__init__.py?#L145
msg135850 - (view) Author: Milko Krachounov (milko.krachounov) Date: 2011-05-12 18:26
I have a class for overwriting files "properly" that I use in three of my projects. It aims to be as atomic as possible and supports creating backups, but it doesn't have functions to set or retain permissions when requested (which might be desirable if such a function is added to the stdlib). I'll give it here for reference and ideas.

- It's a context manager acting like a normal file object so you can use it with e.g. json.dump. In CM mode, if an error is caught, you end up with the old file automatically. If you use it as a file, the 'close' method has a 'cancel' argument to achieve the same.
- Normal overwrite on POSIX uses flush, fsync, rename as it should.
- Since fsync doesn't work on Mac OS X, it takes care of calling the Mac OS X specific F_FULLFSYNC fcntl.
- On POSIX, if a backup is requested, an attempt is made to do it with a hardlink, otherwise do two renames (killing the atomicity). Maybe a copy with shutil would be a better choice though.
- On Windows it uses two renames - the old file is backed up to a temporary name, and then the new file is renamed over it. If a backup wasn't requested, the temporary name is deleted.

I also have a simple unit test for it, but I've only run it on POSIX.

Here's the part of the code that does the open/close part:
http://bazaar.launchpad.net/~exabyte/blackherd/async-refactor/view/61/blackherd/misc.py#L498

And the unit test:
http://bazaar.launchpad.net/~exabyte/blackherd/async-refactor/view/61/tests/test_misc.py#L473

I hope that's useful.
msg135852 - (view) Author: Milko Krachounov (milko.krachounov) Date: 2011-05-12 18:33
Well, since I had a typo in the main method which called the Windows implementation on POSIX, the unit test exercised the Windows code path when run on POSIX. Heh, I'm sorry for the noise, but it seems that re-reading the code four times and running the unit tests is not enough. Corrected revision:

http://bazaar.launchpad.net/~exabyte/blackherd/async-refactor/view/62/blackherd/misc.py#L498
msg135912 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-05-13 14:23
Something's missing in all the implementations presented:
to make sure that the new version of the file is available after a crash, fsync must be called on the containing directory after the rename.
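The directory fsync described here can be sketched as follows (POSIX-only; the helper name is illustrative):

```python
import os

def fsync_dir(path):
    # After renaming a file into place, fsync the containing directory so
    # the directory entry (i.e. the rename itself) survives a crash.
    dirname = os.path.dirname(os.path.abspath(path)) or "."
    fd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
```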
msg135917 - (view) Author: Milko Krachounov (milko.krachounov) Date: 2011-05-13 15:24
> Something's missing in all the implementations presented:
> to make sure that the new version of the file is available after 
> a crash, fsync must be called on the containing directory after 
> the rename.

I upgraded my proposed approach to include dir fsync, and to do a copy when backing up instead of rename (is that good?)

http://bazaar.launchpad.net/~exabyte/blackherd/async-refactor/view/66/blackherd/misc.py#L498
msg150002 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-21 15:57
I'd like to make this move forward.
Milko, I see you have an implementation which looks OK at first glance. Would you like to propose a patch against the default branch? The patch should include documentation and tests.
I'm not sure about the best module to host this, though: os.path ?
msg150036 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-12-21 19:15
> I'm not sure about the best module to host this, though: os.path ?

Some OSes don't provide an atomic rename. If we only define the function when it is atomic (as we do in the posix module, which only exposes functions available on the OS), programs will have to write two versions of their code. I prefer to write a "best-effort" function, and so I consider shutil the best place for such a function.

If this function requires a function specific to the OS, the specific function can be added to posix (or another module).

Example:

with shutil.write_file(filename) as f:
  f.write("I hope it will be atomic!")

We need an "atomic rename file" function to implement "atomic write file": see #8828.
msg150084 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-22 10:10
> I'm not sure about the best module to host this, though: os.path ?

os.path is mostly about path manipulation functions, although it also performs directory I/O (e.g. realpath()).
msg150087 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-22 10:51
> I prefer to write a "best-effort" function

I disagree. People who explicitly use an atomic file API want atomicity/persistence, otherwise they wouldn't use it. Exposing a function that may, or may not, be atomic is just plain wrong.
IMHO, the right thing to do on OSes that don't provide atomic rename (and I doubt there are many of them, see below) is to raise an exception, so that the user can fall back to whatever he thinks is best (bail out, rollback, etc.).

> and so I consider that shutil is the best place for such function.

As noted by Jean-Paul, shutil stands for "shell utils": that would be a rather poor choice: atomicfile fits in shutil as much as tempfile would :-)

> Some OS don't provide atomic rename.

Which one?
See Antoine's message:
http://bugs.python.org/issue8828#msg146274

Apparently, Windows >= XP does have an atomic rename(), and rename(2) should be atomic on every POSIX-compliant OS.

> os.path is mostly about path manipulation functions

I agree.
I wish we had something like:
io.file
io.file.tempfile
io.file.path
io.file.atomicfile

Thoughts?
msg150145 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-23 10:27
> I wish we had something like:
> io.file
> io.file.tempfile
> io.file.path
> io.file.atomicfile

I adhere to "flat is better than nested". e.g. it always irks me to type "urllib.request" instead of "urllib". A 3-level deep nesting in the stdlib would certainly shock many people :)
msg150156 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-23 11:57
> I adhere to "flat is better than nested". e.g. it always irks me to type
> "urllib.request" instead of "urllib".

Well, that's what (selective) imports are for, no ?
from urllib import request
from file.path import exists
import xml.etree.cElementTree as ET

Anyway, my "thoughts?" wasn't referring to this namespace hierarchy
"proposal" (which really isn't one), but rather to the best module
which could host an AtomicFile class.
Still shutil?
msg150157 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-23 12:09
> Anyway, my "thoughts?" wasn't referring to this namespace hierarchy
> "proposal" (which really isn't one), but rather to the best module
> which could host an AtomicFile class.

Does it have to be a class? What would be the operations apart from
write()?

> Still shutil?

I think that's our best compromise.
msg150158 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-23 12:54
> Does it have to be a class? What would be the operations apart from
> write()?

Well, I thought that making it a file-like object could be useful:
that way, one could pass it to pickle.dump(), logging.StreamHandler or
any method expecting a file-like object, and would gain atomicity
(persistency) transparently, without refactoring.
My quick and dirty AtomicFile implementation reused the
_TemporaryFileWrapper attribute delegation trick:
"""
    def __getattr__(self, name):
        # Attribute lookups are delegated to the underlying file
        # and cached for non-numeric results
        # (i.e. methods are cached, closed and friends are not)
        file = self.__dict__['file']
        a = getattr(file, name)
        if not issubclass(type(a), type(0)):
            setattr(self, name, a)
        return a
"""

>> Still shutil?
>
> I think that's our best compromise.

OK.
I don't have a strong opinion about this, and you've got much more
experience than me, so I trust your judgement ;-)
msg150160 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-23 13:13
> > Does it have to be a class? What would be the operations apart from
> > write()?
> 
> Well, I thought that making it a file-like object could be useful:
> that way, one could pass it to pickle.dump(), logging.StreamHandler or
> any method expecting a file-like object, and would gain atomicity
> (persistency) transparently, without refactoring.

Mmmh... I might have misunderstood the proposal then. I thought this
"atomic write" API was about writing the file contents in one go and
immediately closing the file. If you want atomicity to apply to logging,
you must instead guarantee the durability of each write() call, meaning
calling fsync() on each logging call, which would be very expensive.

Am I missing something?
msg150163 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-23 14:28
> If you want atomicity to apply to logging,
> you must instead guarantee the durability of each write() call, meaning
> calling fsync() on each logging call

Why so?
There's no point in calling fsync() after each write, since data is
written to a temporary file which won't be visible before rename():
even if you call fsync() after each write, if the file is not properly
closed, it won't be committed (i.e. renamed).
You just need to call fsync() once before closing the file, and then
rename it, to ensure that the committed file version is consistent.
In the meantime, you could do whatever you want with the stream (the
hypothetical AtomicFile): seek(), read(), etc. I'm not sure many
methods besides write will be useful, but for example one could easily
imagine that the library expects the passed object to have a fileno()
method.
msg150164 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-12-23 14:40
> There's no point in calling fsync() after each write, since data is
> written to a temporary file which won't be visible before rename():
> even if you call fsync() after each write, if the file is not properly
> closed, it won't be committed (i.e.renamed).

Ah, it's a temporary file indeed.
But does that mean you can't inspect your logs in real time? All log
files I have ever seen are available under their final name while they
are still being written to.

> In the meantime, you could do whatever you want with the stream (the
> hypothetical AtomicFile): seek(), read(), etc.

I see. Yes, then an AtomicFile class makes sense (not for logging
though, IMHO :-)).

And I understand your point about it being available in io or some io
submodule (not a 3-level deep one though :-)). Perhaps you want to ask
on python-dev?
msg150174 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-12-23 15:50
> Ah, it's a temporary file indeed.
> But does that mean you can't inspect your logs in real time? All log
> files I have ever seen are available under their final name while they
> are still being written to.
>

Yes, logging was a poor example :-)

> And I understand your point about it being available in io or some io
> submodule (not a 3-level deep one though :-)). Perhaps you want to ask
> on python-dev?

OK, I'll try that.
msg153451 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2012-02-16 01:24
I'm opposed to adding "atomic" files to the standard library. This has nothing to do with the original issue; anybody contributing to the issue is making up user needs that have not been sufficiently demonstrated.

It would be best to release such code on PyPI first, and wait a few years to see whether there really is an end-user demand for the feature.
msg174104 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-10-29 07:00
Since it may not be clear from the rest of the thread, the os.replace() cross-platform atomic renaming building block was added in 3.3 (in #8828, which Victor linked above)

Another stdlib use case now also exists in importlib (see http://hg.python.org/cpython/file/3.3/Lib/importlib/_bootstrap.py#l121). However, due to bootstrapping issues, importlib would not be able to use a version that was provided in a module like shutil.
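A quick demonstration of the os.replace() building block mentioned above: unlike os.rename() on Windows, it overwrites an existing destination atomically on both POSIX and Windows.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    dst = os.path.join(d, "config.txt")
    tmp = os.path.join(d, "config.txt.tmp")
    with open(dst, "w") as f:
        f.write("old contents")
    with open(tmp, "w") as f:
        f.write("new contents")
    os.replace(tmp, dst)      # succeeds even though dst already exists
    with open(dst) as f:
        print(f.read())       # -> new contents
```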
msg189324 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-15 23:31
Here is a patch based on the new os.replace() function. I only tested the patch on Linux.

My patch must be tested on Windows, I don't know what happens when the file object is closed when the NamedTemporaryFile.delete attribute is set to False. Does close() raise an exception?

I hesitate between the prefix and dir parameters for NamedTemporaryFile. If prefix is equal to filename, is it possible for NamedTemporaryFile to create a file called filename? Or does it always add a suffix? If the program crashes before the rename, I prefer to leave "file.XXX" temporary files (e.g. "file.tmp", "file.tmp.2", ...) instead of random names like "tmpXXX".

It is important to create the temporary file in the same directory to not create a file on a different file system. os.replace() fails on POSIX if the source and the destination are on two different file systems.

If importing tempfile in shutil is a problem, it may be done in atomic_write() (maybe using a global variable to ensure that the module is only imported once).

--

First, I tried to handle the temporary file manually (without the tempfile module) with a constant suffix. I only created a temporary file if the creation of the file (in exclusive mode, using the "x" mode) failed.

But Antoine pointed out to me on IRC that the function is not atomic, because the file may be invalid before the file content is completely written and flushed. If two threads try to create the same file, the second thread gets a FileExistsError. In this case, the caller has to handle FileExistsError. The caller may remove the temporary file of the first thread, which may make the situation even worse.

I prefer to use tempfile.NamedTemporaryFile because it is portable and well tested. It also uses a random suffix. On Windows, the O_TEMPORARY flag is passed to os.open(), so the file is removed by the OS when the file is closed, or when the program does crash.
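The exclusive-create failure mode described above is easy to see: with the "x" mode (Python 3.3+), a second attempt to create the same path raises FileExistsError.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.txt")
    with open(path, "x") as f:       # "x": exclusive creation
        f.write("first writer wins")
    try:
        open(path, "x")              # second creator loses
    except FileExistsError:
        print("FileExistsError")     # -> FileExistsError
```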
msg189325 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-15 23:35
+       # Flush Python buffers to system buffers
+       fileobj.flush()
+
+       if hasattr(os, 'fsync'):
+           # Flush system buffers to disk
+           fd = fileobj.fileno()
+           os.fsync(fd)

A fsync=True parameter may be added if calling os.fsync() is an issue.
msg189329 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-15 23:45
"I prefer to use tempfile.NamedTemporaryFile because it is portable and well tested. It also uses a random suffix. On Windows, the O_TEMPORARY flag is passed to os.open(), ..."

Oh, and it also sets the close-on-exec flag, which is more secure.
msg189330 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-15 23:50
atomic_write.patch calls os.replace(src, dst) while the src file is still open. It works on Linux, but it sounds a little bit strange to me and may fail on other platforms.

Here is another patch (atomic_write_mkstemp.patch) using tempfile.mkstemp() instead of tempfile.NamedTemporaryFile(delete=True) to control when the file is closed and removed.
msg189332 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-05-16 00:12
> Here's the part of the code that does the open/close part:
> http://bazaar.launchpad.net/~exabyte/blackherd/async-refactor/view/61/blackherd/misc.py#L498

This code contains a bug: hasattr('os', 'fsync') is never True :-)

This part is interesting:

        # fsync on Mac OS X doesn't work, it requires using the
        # F_FULLFSYNC fcntl
        if hasattr(fcntl, 'F_FULLFSYNC'):
            fcntl.fcntl(self._fd, fcntl.F_FULLFSYNC)

=> see also #11877

> http://trac.edgewall.org/browser/trunk/trac/util/__init__.py?#L145

This class copies file owner, permissions, flags, etc. atomic_write() should probably also call copystat() on the temporary file.
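The two remarks above (the hasattr bug and the macOS F_FULLFSYNC requirement) can be combined into a small portable helper. This is a sketch, not the patch's code; F_FULLFSYNC exists only on macOS, so elsewhere it falls back to plain fsync().

```python
import os

def sync_to_disk(fileobj):
    # Flush Python's buffers, then force the data to stable storage.
    # On macOS, plain fsync() does not flush the drive cache, so the
    # F_FULLFSYNC fcntl is used when available.
    fileobj.flush()
    fd = fileobj.fileno()
    try:
        import fcntl
        if hasattr(fcntl, "F_FULLFSYNC"):
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
            return
    except ImportError:
        pass  # no fcntl module (e.g. on Windows)
    if hasattr(os, "fsync"):  # note: hasattr(os, ...), not hasattr('os', ...)
        os.fsync(fd)
```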
msg189333 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-16 00:51
We have one of these in Beaker (we didn't need to wait for os.replace, since Beaker only runs on Linux): http://git.beaker-project.org/cgit/beaker/tree/Common/bkr/common/helpers.py?h=develop#n90

It turns out to be beneficial to separate the three operations in order to cooperate more cleanly with other operations on the same file system (in our case, we needed to be able to write data uploaded over a network connection to the temporary file *before* acquiring an flock when we did the rename).

We also create the file adjacent to the destination, as creating it under the /tmp hierarchy and then renaming it is invalid under SELinux (since the security context ends up being incorrect).

While Beaker is GPLv2, I'd easily be able to get permission to extract this and contribute it to the PSF.

Incidentally, I'm thinking tempfile might actually be a more sensible home for this than shutil.
msg189334 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-16 00:54
(Note that the Beaker version would need to be enhanced with the extra API parameters from Victor's version, as well as updated to use the exclusive open and close-on-exec flags)
msg189342 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2013-05-16 06:44
> (Note that the Beaker version would need to be enhanced with the extra API parameters from Victor's version, as well as updated to use the exclusive open and close-on-exec flags)

I think the API would be nicer if it was just a wrapper around the
underlying temporary file object, delegating all methods except
close() (sort of like the _TemporaryFileWrapper in tempfile).

Something like (untested, not even proofread):

class AtomicFile:

    def __init__(self, path):
        self.dest_path = path
        dirname, basename = os.path.split(self.dest_path)
        fd, temp_path = tempfile.mkstemp(prefix='.' + basename, dir=dirname)
        try:
            self.file = os.fdopen(fd, 'w')
        except:
            os.unlink(temp_path)
            raise
        self.temp_path = temp_path

    def close(self):
        self.file.close()
        os.rename(self.temp_path, self.dest_path)

    # copied from tempfile
    def __getattr__(self, name):
        # Attribute lookups are delegated to the underlying file
        # and cached for non-numeric results
        # (i.e. methods are cached, closed and friends are not)
        file = self.__dict__['file']
        a = getattr(file, name)
        if not issubclass(type(a), type(0)):
            setattr(self, name, a)
        return a

    def __enter__(self):
        self.file.__enter__()
        return self

    def __exit__(self, exc, value, tb):
        if exc is None:
            self.close()
        else:
            self.file.close()
            os.remove(self.temp_path)

This way, one just has to do:
f = AtomicFile(path)
and then use it as a normal file; it will automatically be committed
(or rolled back) when the file is closed, either because you're leaving
the context manager or because an explicit close() is called.
It also makes it easier to use with legacy code/libraries accepting an
open file (since no modification is needed), is less error-prone
(since you don't have to remember to call special methods like
destroy_temp/replace_dest), and leads to a more compact API (exactly
the same as regular files).

Otherwise, I think the calls to fsync() should be optional (maybe an
option passed to the constructor): most of the time, you want
atomicity but not durability (i.e. you don't really care if data is
committed to disk), and you don't want to pay for the performance hit
incurred by fsync().

Also, fsync() should be done on the containing directory (which
is not the case in Victor's version).
msg189382 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2013-05-16 14:11
Of course, I have my own atomic-rename thing, but I'm not going to post the code here.  It's fairly limited to my own use case and I have no interest in making it cross platform.

That being said, I personally found that a context manager with a signature identical to built-in open() was the most convenient API.  It looks natural:

with atomic(filename, 'w', encoding='utf-8') as fp:
   data = do_a_bunch_of_things_that_might_fail()
   fp.write(data)

If any of that fails, the temporary file is guaranteed to be cleaned up, otherwise, filename (which is the ultimate destination) will be guaranteed to exist.

Another reason why context managers are useful is with ExitStack(), where you might have a bunch of conditions that you want to guarantee get cleaned up properly if any of them fail.
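A sketch of the open()-like context manager described here (hypothetical names, built on os.replace(); not the actual code mentioned above):

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def atomic(path, mode="w", **open_kwargs):
    # Write to a temporary file next to the destination; on success, move
    # it into place atomically; on any failure, remove the temporary file.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname,
                               prefix=os.path.basename(path) + ".")
    f = os.fdopen(fd, mode, **open_kwargs)
    try:
        yield f
        f.flush()
        os.fsync(f.fileno())
        f.close()
        os.replace(tmp, path)
    except BaseException:
        f.close()
        os.unlink(tmp)
        raise
```

Usage matches the built-in open() signature, e.g. `with atomic(filename, 'w', encoding='utf-8') as fp: fp.write(data)`.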
msg256849 - (view) Author: Марк Коренберг (socketpair) * Date: 2015-12-22 18:41
You also forgot about a few things:

1. set temporary file permissions before the rename
2. fsync(open(os.path.dirname(temporaryfile)))
3. if the original file name is a symlink, replace will work incorrectly; os.path.realpath should be used.

So, here is the real-life function that we use in production:


# TODO: save_mtime, save_selinux, extended attr, chattrs and so on...
# TODO: malicious user may replace directory with another while writing to file,
#       so we should use open directory first, and then use openat() and renameat()
#       with relative filename instead of specifying absolute path.
#       also NamedTemporaryFile should allow specifying dir=<dir_file_descriptor>
@contextmanager
def replace_file(path, save_perms=False, fsync=True, **kwargs):
    realpath = os.path.realpath(path)
    operating_dir = os.path.dirname(realpath)
    uid = None
    gid = None
    mode = None
    if save_perms:
        try:
            stinfo = os.lstat(realpath)
            uid = stinfo.st_uid
            gid = stinfo.st_gid
            mode = stinfo.st_mode
        except OSError as e:
            if e.errno != errno.ENOENT:
                raise

    with NamedTemporaryFile(dir=operating_dir, **kwargs) as fff:
        filedes = fff.fileno()
        if None not in (uid, gid, mode):
            os.fchown(filedes, uid, gid)
            os.fchmod(filedes, mode & 0o7777)

        yield fff

        # survive application crash
        fff.flush()
        if fsync:
            # survive power outage, is not required if that is temporary file
            os.fsync(filedes)
        os.rename(fff.name, realpath)

        # see http://bugs.python.org/issue21579
        fff._closer.delete = False

    if fsync:
        # Sync directory: http://stackoverflow.com/questions/3764822/how-to-durably-rename-a-file-in-posix
        dirfd = os.open(operating_dir, os.O_RDONLY | os.O_CLOEXEC | os.O_DIRECTORY)
        try:
            os.fsync(dirfd)
        finally:
            os.close(dirfd)
msg256850 - (view) Author: Марк Коренберг (socketpair) * Date: 2015-12-22 18:43
Instead of realpath, os.lstat() may be used to detect whether the target file object is not a plain file. Also, os.rename() cannot be used since it follows symlinks by default.
msg256851 - (view) Author: Марк Коренберг (socketpair) * Date: 2015-12-22 18:45
Also, modern kernels allow replacing an entire directory!

renameat2() + RENAME_EXCHANGE

So, there could be more atomic operations.
msg256961 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-12-24 13:09
This issue is old, and different operating systems provide different
guarantees on rename. Maybe this project should start as a project on PyPI
to find the best API and catch compatibility issues. For example,
os.scandir() also started on PyPI.
msg258681 - (view) Author: Thomas Kluyver (takluyver) * Date: 2016-01-20 13:16
For reference, we added our own atomic_writing() context manager in Jupyter/IPython, after someone lost work when save failed on a full disk. We initially went for the same approach - write to a temporary file, fsync, rename over the original file - but it caused issues with network filesystems and folders synced by Dropbox or similar services.

As a result we switched to an approach that copies the old file to a temporary name, writes the new data directly to the old name, then removes the temporary copy on success, or renames it back to the old name on failure. This is certainly less rigorous, but it means that we're not replacing the file on every save, and it's less of a problem if copying file attributes with copystat() fails.

I'm sure there are use cases for atomic writing where these problems won't come up, but I hope this is a useful data point, at least for documenting any atomic writing solution.
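The copy-based strategy described above can be sketched roughly as follows. This is not the actual Jupyter/IPython code: error handling is simplified, and a brand-new file that fails mid-write is simply left behind.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def atomic_writing(path, mode="w", **open_kwargs):
    backup = None
    if os.path.isfile(path):
        # Keep a copy of the old contents next to the target file.
        fd, backup = tempfile.mkstemp(
            dir=os.path.dirname(os.path.abspath(path)))
        os.close(fd)
        shutil.copy2(path, backup)
    try:
        # Write the new data directly to the original name, so the
        # inode (and anything watching it) is preserved.
        with open(path, mode, **open_kwargs) as f:
            yield f
            f.flush()
            os.fsync(f.fileno())
    except BaseException:
        if backup is not None:
            os.replace(backup, path)  # restore the old contents
        raise
    else:
        if backup is not None:
            os.remove(backup)  # success: drop the backup copy
```

Note the trade-off: a crash between truncating the original and restoring the backup can still lose data, which is why this approach is less rigorous than write-then-rename.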
msg258684 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-20 14:22
It looks like each platform and each filesystem behaves differently, with subtle differences between kernel versions. I'm no longer sure that the Python stdlib is the best place to host an implementation of an "atomic write" API.

I now suggest creating a project on PyPI, playing with it, trying to handle as many platforms as possible, etc. Then, when it is considered stable and robust enough, propose integrating it into the Python stdlib.

The Python stdlib is not the best place for best-effort algorithms; we prefer to only add reliable and portable code, and leave "other code" outside the stdlib.
History
Date User Action Args
2016-01-20 14:22:34  vstinner  set  status: open -> closed; resolution: out of date; messages: + msg258684
2016-01-20 13:16:43  takluyver  set  nosy: + takluyver; messages: + msg258681
2015-12-24 13:09:55  vstinner  set  messages: + msg256961
2015-12-22 18:45:40  socketpair  set  messages: + msg256851
2015-12-22 18:43:11  socketpair  set  messages: + msg256850
2015-12-22 18:41:09  socketpair  set  nosy: + socketpair; messages: + msg256849
2013-05-16 14:11:36  barry  set  messages: + msg189382
2013-05-16 13:01:08  pitrou  set  stage: patch review
2013-05-16 12:58:18  Arfrever  set  nosy: + Arfrever
2013-05-16 06:44:31  neologix  set  messages: + msg189342
2013-05-16 00:54:57  ncoghlan  set  messages: + msg189334
2013-05-16 00:51:39  ncoghlan  set  assignee: tarek ->; messages: + msg189333
2013-05-16 00:12:13  vstinner  set  messages: + msg189332
2013-05-15 23:50:03  vstinner  set  files: + atomic_write_mkstemp.patch; messages: + msg189330
2013-05-15 23:45:48  vstinner  set  messages: + msg189329
2013-05-15 23:35:11  vstinner  set  messages: + msg189325
2013-05-15 23:31:38  vstinner  set  files: + atomic_write.patch; keywords: + patch; messages: + msg189324
2013-05-15 19:39:47  barry  set  nosy: + barry
2012-11-02 23:57:02  martin.panter  set  nosy: + martin.panter
2012-10-29 07:00:29  ncoghlan  set  nosy: + ncoghlan; messages: + msg174104; versions: + Python 3.4, - Python 3.2
2012-02-16 01:24:37  loewis  set  nosy: + loewis; messages: + msg153451
2011-12-23 15:50:57  neologix  set  messages: + msg150174
2011-12-23 14:40:41  pitrou  set  messages: + msg150164
2011-12-23 14:28:37  neologix  set  messages: + msg150163
2011-12-23 13:13:22  pitrou  set  messages: + msg150160
2011-12-23 12:54:19  neologix  set  messages: + msg150158
2011-12-23 12:09:05  pitrou  set  messages: + msg150157
2011-12-23 11:57:14  neologix  set  messages: + msg150156
2011-12-23 10:27:36  pitrou  set  messages: + msg150145
2011-12-22 10:51:27  neologix  set  messages: + msg150087
2011-12-22 10:10:19  pitrou  set  messages: + msg150084
2011-12-21 19:15:09  vstinner  set  messages: + msg150036
2011-12-21 15:57:44  neologix  set  messages: + msg150002
2011-05-13 15:24:26  milko.krachounov  set  messages: + msg135917
2011-05-13 14:23:57  neologix  set  nosy: + neologix; messages: + msg135912
2011-05-12 18:33:11  milko.krachounov  set  messages: + msg135852
2011-05-12 18:26:26  milko.krachounov  set  nosy: + milko.krachounov; messages: + msg135850
2010-06-09 22:18:49  vstinner  set  messages: + msg107427
2010-05-26 23:42:23  vstinner  set  dependencies: + Atomic function to rename a file; messages: + msg106588
2010-05-26 23:26:12  vstinner  set  messages: + msg106586
2010-05-26 23:18:50  vstinner  set  nosy: + vstinner; messages: + msg106585
2010-05-07 21:51:21  eric.araujo  set  nosy: + eric.araujo
2010-05-03 22:16:18  giampaolo.rodola  set  nosy: + giampaolo.rodola
2010-05-03 18:23:10  tarek  set  title: Alias for distutils.file_util.write_file in e.g. shutils -> Adding an atomic FS write API
2010-05-03 18:21:20  tarek  set  messages: + msg104866
2010-05-03 17:55:33  pitrou  set  messages: + msg104863
2010-05-03 17:54:33  eric.smith  set  messages: + msg104862
2010-05-03 17:48:48  exarkun  set  nosy: + exarkun; messages: + msg104860
2010-05-03 17:42:31  tarek  set  messages: + msg104859
2010-05-03 17:31:53  pitrou  set  messages: + msg104857
2010-05-03 17:22:58  tarek  set  messages: + msg104856
2010-05-03 17:21:59  tarek  set  messages: + msg104855
2010-05-03 16:13:19  meatballhat  set  nosy: + meatballhat
2010-05-03 13:53:10  pitrou  set  messages: + msg104837
2010-05-03 13:42:23  tarek  set  nosy: pitrou, eric.smith, tarek, olemis; messages: + msg104836; components: + Library (Lib), - Distutils
2010-05-03 13:01:26  eric.smith  set  nosy: + eric.smith; messages: + msg104835
2010-05-03 12:54:18  pitrou  set  nosy: + pitrou; messages: + msg104834; versions: + Python 3.2, - Python 2.7
2010-05-03 12:43:57  olemis  create