classification
Title: Race conditions when opening and deleting a file on Mac OS X
Type: Stage: resolved
Components: IO, macOS Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: Nosy List: Giovanni Pizzi, ned.deily, ronaldoussoren
Priority: normal Keywords:

Created on 2020-07-13 14:04 by Giovanni Pizzi, last changed 2020-07-15 18:56 by ronaldoussoren. This issue is now closed.

Files
File name Uploaded Description Edit
concurrency-tests.tar.gz Giovanni Pizzi, 2020-07-13 14:04 two python scripts to be run in parallel to reproduce the issue
test-conc-read2.py ronaldoussoren, 2020-07-15 14:11
Messages (6)
msg373606 - (view) Author: Giovanni Pizzi (Giovanni Pizzi) Date: 2020-07-13 14:04
Hello,

when creating deleting (with `os.remove`/`os.unlink`) a file and opening it in a different process at (almost) the same time, on Mac OS X relatively often I get a file that is empty instead of either a `FileNotFoundError` exception, or an open handle with the original content, i.e. at least one of the two operations (unlinking or opening) seems to be non-atomic.
(With empty I mean that if after I open the file (mode 'rb'), then `fhandle.read()` returns me b''.

More in particular, after quite some debugging I noticed that what happens is that if I stat the file descriptor, `st_ino` is zero.

This can reproduced very easily.
I set up a GitHub repository here: https://github.com/giovannipizzi/concurrent-delete-race-problem
with simple examples and tests (and both a Python implementation and a C implementation).
There are also GitHub Actions that run on Mac, Ubuntu and Windows, and apparently the problem exists only on Mac (and happens on all versions).

For completeness I attach also here a tar.gz with the two very short python files that just need be run in parallel - in my Mac OS X (macOS 10.14.6, 15" 2016 MacBook Pro, python 3.6) I get the error essentially at every run. 

Important: while much harder to reproduce, I can get the same error and behaviour also with the (I think equivalent) C code. 
Therefore, this seems to be more a problem with the Mac OS standard libraries?

My first question is: has anybody ever seen this problem? It seems to me quite easy to reproduce, but I'm surprised that I couldn't find any reference in the internet even after searching for various hours (maybe I used the wrong keywords?)

Second question: should this be reported directly to Apple?

Third question: Even if this is a bug, would it make sense to implement a patch in python that raises some exception if st_ino is zero when opening a file? Or am I simplifying too much and in some conditions st_ino=0 is valid on some types of mounted filesystems? Is there some other way to have a more atomic behaviour for this race condition in python?

Thanks a lot!
msg373670 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-07-15 07:15
If it is reproducable in C this is likely a bug in macOS, and should be reported to them. 

I get the same behaviour as you with the test script, even if I add calls to fhandle.flush and os.fsync to force the content to stable storage.

Interestingly enough I sometimes get other behaviour that shouldn't happen:

Traceback (most recent call last):
  File "/Users/ronald/Downloads/concurrency-tests/test-conc-read.py", line 33, in <module>
    fhandle.flush()
OSError: [Errno 22] Invalid argument

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "concurrency-tests/test-conc-read.py", line 34, in <module>
    os.fsync(fhandle.fileno())
OSError: [Errno 22] Invalid argument


This is with the change I wrote about earlier:

    except FileNotFoundError:
        print(i, ">> DELETED <<")
        # Recreate the file
        time.sleep(0.001) # Wait a bit
        with open(dest_fname, 'wb') as fhandle:
            fhandle.write(b'CONTENT')
            fhandle.flush()
            os.fsync(fhandle.fileno())

This shouldn't happen, "fhandle" is open at this point and hence the low-level file descriptor should still be valid.
msg373671 - (view) Author: Giovanni Pizzi (Giovanni Pizzi) Date: 2020-07-15 08:06
Thanks for your feedback!

I also just reported the issue to Apple. For reference I got the Feedback ID FB8009608.

I will let you know if they answer.

In any case I agree that also your other error is unexpected.
I am writing a library that should provide atomicity guarantees and this issue is creating me a lot of headaches :-)
msg373696 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-07-15 14:11
I've attached a variation on test-conc-read.py that uses os.open and low-level APIs instead of the Python IO stack (open/io.open).  This fails in the same way.

os.open/os.read/os.write/os.close are very thin wrappers around the corresponding C APIs. That clearly indicates that there is a bug in the OS.
msg373700 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-07-15 15:21
I was able to easily reproduce it as well. I suppose we all tested on very fast SSD-based APFS file systems.  I wonder if the problem might be file system related. I wasn’t able to reproduce it using a HDD-based HFS+ file system but it might just be due to the speed difference. In any case, I agree it’s up to Apple to resolve. Let us know when you hear something back from Apple.
msg373709 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2020-07-15 18:56
This appears to be a bug in the APFS filesystem implementation.

On my machine:
- main disk (SSD, APFS): race condition happens
- r/w disk image with APFS: race condition happens
- r/w disk image with HFS+: no race condition
History
Date User Action Args
2020-07-15 18:56:12ronaldoussorensetmessages: + msg373709
2020-07-15 15:21:54ned.deilysetstatus: open -> closed
resolution: third party
messages: + msg373700

stage: resolved
2020-07-15 14:11:50ronaldoussorensetfiles: + test-conc-read2.py

messages: + msg373696
2020-07-15 08:06:56Giovanni Pizzisetmessages: + msg373671
2020-07-15 07:15:50ronaldoussorensetmessages: + msg373670
2020-07-13 14:04:01Giovanni Pizzicreate