classification
Title: Sporadic importlib failures: FileNotFoundError on os.rename()
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, brett.cannon, neologix, pitrou, python-dev, vinay.sajip, vstinner
Priority: normal Keywords: patch

Created on 2011-10-31 11:01 by vstinner, last changed 2011-11-10 21:42 by neologix. This issue is now closed.

Files
File name Uploaded Description Edit
import_atomic_race.diff neologix, 2011-10-31 11:32 review
import_atomic_race-1.diff neologix, 2011-10-31 16:51 review
import_atomic_race-2.diff neologix, 2011-10-31 17:31 review
importlib_write_atomic.patch vstinner, 2011-11-02 22:10
import_perm.diff neologix, 2011-11-03 19:06 review
Messages (19)
msg146691 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-31 11:01
Traceback (most recent call last):
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/regrtest.py", line 1186, in runtest_inner
    indirect_test()
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/test/test_ttk_textonly.py", line 11, in test_main
    *runtktests.get_tests(gui=False, packages=['test_ttk']))
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/tkinter/test/runtktests.py", line 65, in get_tests
    for module in get_tests_modules(gui=gui, packages=packages):
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/tkinter/test/runtktests.py", line 50, in get_tests_modules
    "tkinter.test")
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/__init__.py", line 123, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/_bootstrap.py", line 834, in _gcd_import
    loader.load_module(name)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/_bootstrap.py", line 460, in load_module
    return self._load_module(fullname)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/_bootstrap.py", line 164, in decorated
    return fxn(self, module, *args, **kwargs)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/_bootstrap.py", line 365, in _load_module
    exec(code_object, module.__dict__)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/tkinter/test/test_ttk/test_style.py", line 6, in <module>
    import tkinter.test.support as support
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/_bootstrap.py", line 460, in load_module
    return self._load_module(fullname)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/_bootstrap.py", line 164, in decorated
    return fxn(self, module, *args, **kwargs)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/_bootstrap.py", line 353, in _load_module
    code_object = self.get_code(name)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/_bootstrap.py", line 447, in get_code
    self.set_data(bytecode_path, data)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/_bootstrap.py", line 514, in set_data
    _write_atomic(path, data)
  File "/export/home/buildbot/32bits/3.x.cea-indiana-x86/build/Lib/importlib/_bootstrap.py", line 94, in _write_atomic
    _os.rename(path_tmp, path)
FileNotFoundError: [Errno 2] No such file or directory
msg146692 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-31 11:17
There's a race in _write_atomic():
"""
        # On POSIX-like platforms, renaming is atomic
        path_tmp = path + '.tmp'
        try:
            fd = _os.open(path_tmp, _os.O_EXCL | _os.O_CREAT | _os.O_WRONLY)
            with _io.FileIO(fd, 'wb') as file:
                file.write(data)
            _os.rename(path_tmp, path)
        except OSError:
            try:
                _os.unlink(path_tmp)
            except OSError:
                pass
            raise
"""

Let's pretend a process managed to open the file (with O_EXCL): before it finishes and calls rename(), another process tries to open the file: since it can't create it (with O_EXCL), it jumps to:

        except OSError:
            try:
                _os.unlink(path_tmp)
            except OSError:
                pass
            raise

and unlinks the file.
The first process then calls rename()...
msg146693 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-31 11:32
Patch attached.
msg146696 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-31 14:44
This looks good to me. I'm slightly worried about what happens when there's a stale tmp file (for example if the Python process crashed before renaming it).
msg146700 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-31 15:08
Then we won't write the bytecode any more.
But it will be consistent.
The solution would be to create a random temporary file (with e.g. mkstemp()): in case of crash, we'll let dangling temporary files, but we'll be able to update the bytecode.
I'm note sure it's worth it (and it won't be easy since we're really restricted in what we're allowed to call).
msg146711 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-31 16:51
Here's a patch using pseudo-random filenames.
I used id(path) to generate a random name: it's faster than getpid(), and doesn't pollute strace's output (I'm one of strace's biggest fan :-).
msg146712 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-31 16:58
> Here's a patch using pseudo-random filenames.
> I used id(path) to generate a random name: it's faster than getpid(),
> and doesn't pollute strace's output (I'm one of strace's biggest
> fan :-).

If you go that way, you should also modify Python/import.c to use a
similar logic.
(it does sound a bit overkill to me :-))
msg146716 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-31 17:05
> (it does sound a bit overkill to me :-))

Well, it depends on what we want:
- if having a stale bytecode file indefinitely is acceptable, then the '.tmp' suffix approach is OK
- if not, then we should use pseudo-random files

> If you go that way, you should also modify Python/import.c to use a
> similar logic.

I think I would use mkstemp(), but yes, that's the idea.

So, what do you think?
msg146720 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-10-31 17:31
Here's a patch for Python/import.c using mkstemp(3). AFAICT, mkstemp should always be available (on Unix of course).
msg146723 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-31 18:33
> Here's a patch for Python/import.c using mkstemp(3). AFAICT, mkstemp
> should always be available (on Unix of course).

This looks fine to me. The code simplification tastes good :)
msg146739 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-10-31 19:48
New changeset 740baff4f169 by Charles-François Natali in branch 'default':
Issue #13303: Fix a race condition in the bytecode file creation.
http://hg.python.org/cpython/rev/740baff4f169
msg146843 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-02 16:28
mkstemp() creates the file with mode 0600, which can be surprising.
I'm note sure about the best way to handle this:
1) change the file mode after creating it with fchmod(), using the source file mode. But we must also take into account the umask, so we'd have to do:
mask = umask(0); 
umask(mask);
[...]
fd = mkstemp(cpathname_tmp)
fchmod(fd, mode & ~mask);

The double call to umask() is necessary because we can't just retrieve the umask
2) don't use mkstemp(), and use something like:

    sprintf(cpathname_tmp, "%s.%x", cpathname, 0xffff & getpid());

    d = open(cpathname_tmp, O_WRONLY|O_CREAT|O_EXCL);

to mimic what's done in Lib/importlib/_bootstrap.py (pseudo-temp file).

3) Fall back to the original ".tmp" suffix (with the risk of stale tmp file).

Thoughts?
msg146859 - (view) Author: Vinay Sajip (vinay.sajip) * (Python committer) Date: 2011-11-02 17:51
What's the downside of option 2) ?
msg146860 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-11-02 17:54
> Thoughts?

I would say go for the simpler (that's probably option 3).
msg146877 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-11-02 22:10
> 3) Fall back to the original ".tmp" suffix (with the risk
> of stale tmp file).

I missed something, what is the "stale tmp file" issue? Python/import.c uses:

    (void) unlink(filename);
    fd = open(filename, O_EXCL|O_CREAT|O_WRONLY|O_TRUNC ...);

So Python starts by removing the .tmp file, but it fails if another process is already writing into the .tmp file. In this case, we do nothing, which is not a problem: the other process will create the file.

Attached patch implements the same algorithm than import.c in importlib.
msg146881 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-02 22:56
> So Python starts by removing the .tmp file, but it fails if another 
> process is already writing into the .tmp file. In this case, we do 
> nothing, which is not a problem: the other process will create the 
> file.

unlink() does not fail, even if the file is open by another process with O_EXCL!
Therefore there's a race:
- process 1 opens file.tmp
- process 2 unlinks file.tmp
- process 2 opens file.tmp: this succeeds, since he just removed the file opened by proc 1
- process 1, which was working on its deleted file handle, is done, and renames file.tmp to file: except that it rename the file process 2 is in the middle of writing
- game over, file corrupted

> Attached patch implements the same algorithm than import.c in 
> importlib. 

Same race.


The current implementations are safe, both Python/import.c and Lib/importlib/_bootstrap.py
The only problem is that since import.c uses mkstemp, the file is created with mode 0600.
msg146962 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2011-11-03 19:06
Here's a patch using the '.tmp' suffix.
I also updated test_import.
Note that the current test_import.test_execute_bit_not_copied is a no-op:
"""
                fname = TESTFN + os.extsep + "py"
                create_empty_file(fname)
                os.chmod(fname, (stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH |
                                 stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
                __import__(TESTFN)
                fn = imp.cache_from_source(fname)
                if not os.path.exists(fn):
                    self.fail("__import__ did not result in creation of "
                              "either a .pyc or .pyo file")
                    s = os.stat(fn)
                    self.assertEqual(
                        stat.S_IMODE(s.st_mode),
                        stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

"""

The indentation is wrong: the stat() is never performed...
I'll fix that in 3.2 and 2.7.
msg147315 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-08 19:39
New changeset 1238cdd288d2 by Charles-François Natali in branch 'default':
Back out changeset b6336ba796d4 until fix for #13303.
http://hg.python.org/cpython/rev/1238cdd288d2
msg147407 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-11-10 18:23
New changeset a9f10c3eff69 by Charles-François Natali in branch 'default':
Issue #13303: Fix bytecode file default permission.
http://hg.python.org/cpython/rev/a9f10c3eff69
History
Date User Action Args
2011-11-10 21:42:28neologixsetstatus: open -> closed
resolution: fixed
2011-11-10 18:23:35python-devsetmessages: + msg147407
2011-11-08 19:39:47python-devsetmessages: + msg147315
2011-11-03 19:06:12neologixsetfiles: + import_perm.diff

messages: + msg146962
2011-11-02 22:56:01neologixsetmessages: + msg146881
2011-11-02 22:10:00vstinnersetfiles: + importlib_write_atomic.patch

messages: + msg146877
2011-11-02 19:41:06Arfreversetnosy: + Arfrever
2011-11-02 17:54:57pitrousetmessages: + msg146860
2011-11-02 17:51:22vinay.sajipsetnosy: + vinay.sajip
messages: + msg146859
2011-11-02 17:47:32vinay.sajipsetresolution: fixed -> (no value)
2011-11-02 16:29:00neologixsetstatus: closed -> open

messages: + msg146843
2011-11-01 14:05:41neologixsetstatus: open -> closed
resolution: fixed
stage: resolved
2011-10-31 19:48:52python-devsetnosy: + python-dev
messages: + msg146739
2011-10-31 18:33:10pitrousetmessages: + msg146723
2011-10-31 17:31:07neologixsetfiles: + import_atomic_race-2.diff

messages: + msg146720
2011-10-31 17:05:50neologixsetmessages: + msg146716
2011-10-31 16:58:04pitrousetmessages: + msg146712
2011-10-31 16:51:21neologixsetfiles: + import_atomic_race-1.diff

messages: + msg146711
2011-10-31 15:08:00neologixsetmessages: + msg146700
2011-10-31 14:44:37pitrousetmessages: + msg146696
2011-10-31 11:32:36neologixsetfiles: + import_atomic_race.diff
keywords: + patch
messages: + msg146693
2011-10-31 11:17:33neologixsetnosy: + neologix
messages: + msg146692
2011-10-31 11:01:10vstinnercreate