Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

On Windows, os.stat() can fail if called while another process is creating or deleting the file #90941

Open
anntzer mannequin opened this issue Feb 18, 2022 · 10 comments
Labels
3.9 only security fixes 3.10 only security fixes 3.11 only security fixes OS-windows type-bug An unexpected behavior, bug, or error

Comments

@anntzer
Copy link
Mannequin

anntzer mannequin commented Feb 18, 2022

BPO 46785
Nosy @pfmoore, @tjguk, @zware, @eryksun, @zooba, @itaisteinherz
PRs
  • bpo-46785: Fix race condition between os.stat() and unlink #31858
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2022-02-18.08:50:37.444>
    labels = ['3.10', 'type-bug', '3.9', 'OS-windows', '3.11']
    title = 'On Windows, os.stat() can fail if called while another process is creating or deleting the file'
    updated_at = <Date 2022-03-14.18:14:11.385>
    user = 'https://github.com/anntzer'

    bugs.python.org fields:

    activity = <Date 2022-03-14.18:14:11.385>
    actor = 'steve.dower'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Windows']
    creation = <Date 2022-02-18.08:50:37.444>
    creator = 'Antony.Lee'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 46785
    keywords = ['patch']
    message_count = 10.0
    messages = ['413468', '413476', '414820', '414927', '415088', '415096', '415102', '415167', '415176', '415177']
    nosy_count = 6.0
    nosy_names = ['paul.moore', 'tim.golden', 'zach.ware', 'eryksun', 'steve.dower', 'itaisteinherz']
    pr_nums = ['31858']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue46785'
    versions = ['Python 3.9', 'Python 3.10', 'Python 3.11']

    @anntzer
    Copy link
    Mannequin Author

    anntzer mannequin commented Feb 18, 2022

    In a first Python process, repeatedly create and delete a file:

    from pathlib import Path
    while True:
        Path("foo").touch(); Path("foo").unlink()

    In another process, repeatedly check for the path's existence:

    from pathlib import Path
    while True: print(Path("foo").exists())

    On Linux, the second process prints a random series of True and False. On Windows, it quickly fails after a few dozen iterations (likely machine-dependent) with

    PermissionError: [WinError 5] Access is denied: 'foo'

    which is actually raised by the stat() call.

    I would suggest that this is not really desirable behavior?

    @anntzer anntzer mannequin added 3.10 only security fixes OS-windows labels Feb 18, 2022
    @eryksun
    Copy link
    Contributor

    eryksun commented Feb 18, 2022

    Windows filesystems disallow new opens for a file that has its delete disposition set (i.e. the file is marked for deletion). For example, CreateFileW() fails with ERROR_ACCESS_DENIED (5) in this case. A file with its delete disposition set is still visibly linked in its parent directory.

    The trigger to unlink a file that's marked for deletion depends on the delete mode. In Windows 10, DeleteFileW() uses a POSIX delete if it's supported by the filesystem (e.g. NTFS). This delete mode unlinks the file as soon the file object that was used to set the delete disposition is closed. There's still a small window of time, however, in which attempts to open the file will fail with an access-denied error.

    In os.stat(), if CreateFileW() fails with access denied, FindFirstFileW() is called to try to get the stat data from the file's parent directory. But in this case it's likely that the file has been unlinked from the directory by the time FindFirstFileW() is called.

    The original error code from CreateFileW() gets restored if FindFirstFileW() fails. This is generally the right thing to do. However, if FindFirstFileW() fails with ERROR_FILE_NOT_FOUND (2) or ERROR_PATH_NOT_FOUND (3), then I suggest that the previous error code should not be restored.

    For example:

            switch (error) {
            case ERROR_ACCESS_DENIED:     /* Cannot sync or read attributes. */
            case ERROR_SHARING_VIOLATION: /* It's a paging file. */
                /* Try reading the parent directory. */
                if (!attributes_from_dir(path, &fileInfo, &tagInfo.ReparseTag)) {
                    /* Cannot read the parent directory. */
                    DWORD dir_error = GetLastError();
                    if (dir_error != ERROR_FILE_NOT_FOUND &&
                          dir_error != ERROR_PATH_NOT_FOUND) {
                        SetLastError(error);
                    }
                    return -1;
                }

    @eryksun eryksun added 3.9 only security fixes 3.11 only security fixes type-bug An unexpected behavior, bug, or error labels Feb 18, 2022
    @itaisteinherz
    Copy link
    Mannequin

    itaisteinherz mannequin commented Mar 9, 2022

    I'd like to work on this, however I'm not sure how this could be unit-tested. Any ideas?

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 11, 2022

    Itai, you can add a test to Win32NtTests in Lib/test/test_os.py. Maybe spawn a child process that creates and unlinks a file in a loop. In the parent process execute a loop that tries to stat the file and ignores errors when the file or path isn't found. For example:

        @support.requires_subprocess()
        def test_stat_unlink_race(self):
            # bpo-46785: the implementation of os.stat() falls back on reading
            # the parent directory if CreateFileW() fails with a permission
            # error. If reading the parent directory fails because the file or
            # directory is subsequently unlinked or because the volume or
            # share is no longer available, then the original permission error
            # should not be restored.
            fname = os.path.join(os.environ['TEMP'], os_helper.TESTFN + '_46785')
            self.addCleanup(os_helper.unlink, fname)
            command = '''if 1:
                import os
                import sys
                fname = sys.argv[1]
                while True:
                    try:
                        with open(fname, "w") as f:
                            pass
                    except OSError:
                        pass
                    try:
                        os.remove(fname)
                    except OSError:
                        pass
            '''
            ignored_errors = (
                2,  # ERROR_FILE_NOT_FOUND
                3,  # ERROR_PATH_NOT_FOUND
                21, # ERROR_NOT_READY
                67, # ERROR_BAD_NET_NAME
            )
            deadline = time.time() + 5
            p = subprocess.Popen([sys.executable, '-c', command, fname])
            try:
                while time.time() < deadline:
                    try:
                        os.stat(fname)
                    except OSError as e:
                        if e.winerror not in ignored_errors:
                            raise
            finally:
                p.terminate()

    As the above test shows, I think the error should also be kept if attributes_from_dir() fails with ERROR_NOT_READY or ERROR_BAD_NET_NAME. For example:

            switch (error) {
            case ERROR_ACCESS_DENIED:     /* Cannot sync or read attributes. */
            case ERROR_SHARING_VIOLATION: /* It's a paging file. */
                /* Try reading the parent directory. */
                if (!attributes_from_dir(path, &fileInfo, &tagInfo.ReparseTag)) {
                    /* Cannot read the parent directory. */
                    switch (GetLastError()) {
                    // keep these error codes
                    case ERROR_FILE_NOT_FOUND:
                    case ERROR_PATH_NOT_FOUND:
                    case ERROR_NOT_READY:
                    case ERROR_BAD_NET_NAME:
                        break;
                    // restore the error from CreateFileW()
                    default:
                        SetLastError(error);
                    }
                    return -1;
                }

    @itaisteinherz
    Copy link
    Mannequin

    itaisteinherz mannequin commented Mar 13, 2022

    Thanks for the comprehensive reply, Eryk!

    I have a few questions regarding your suggestion:

    1. Why catch ERROR_NOT_READY and ERROR_BAD_NET_NAME as well? How do you know that FindFirstFileW() may return them?
    2. Why can't the filename of the "foo"-like file in the test be simply os_helper.TESTFN, as done in some other tests?
    3. I noticed some code snippets used in tests are wrapped in a seemingly-redundant if 1: ... statement. Why is that?

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 13, 2022

    Why catch ERROR_NOT_READY and ERROR_BAD_NET_NAME as well?

    When os.stat() falls back on FindFirstFileW(), an error that means the file doesn't exist should be kept. ERROR_BAD_NET_NAME is an obvious error to keep because it's already mapped to ENOENT (i.e. file not found). This error typically means that share was removed or the network went down. ERROR_NOT_READY typically means that the media was removed from a volume (e.g. ejected disk). pathlib.Path.exists() returns False for ERROR_NOT_READY.

    Why can't the filename of the "foo"-like file in the test be
    simply os_helper.TESTFN, as done in some other tests?

    I suppose the current working directory will be fine. I was looking to keep the test on a NTFS filesystem, with known behavior, but there's no hard guarantee that the user's temp directory is on the system volume.

    I noticed some code snippets used in tests are wrapped in a
    seemingly-redundant if 1: ... statement. Why is that?

    The if 1 test is logically redundant but convenient for using a multiline string that has an arbitrary indentation level. For example:

        >>> exec('''
        ...         print(42)
        ... ''')
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "<string>", line 2
            print(42)
        IndentationError: unexpected indent
    
        >>> exec('''if 1:
        ...         print(42)
        ... ''')
        42

    @itaisteinherz
    Copy link
    Mannequin

    itaisteinherz mannequin commented Mar 13, 2022

    Interesting, thanks again :)

    @zooba
    Copy link
    Member

    zooba commented Mar 14, 2022

    > Why can't the filename of the "foo"-like file in the test be
    > simply os_helper.TESTFN, as done in some other tests?

    I suppose the current working directory will be fine. I was looking to keep the test on a NTFS filesystem, with known behavior, but there's no hard guarantee that the user's temp directory is on the system volume.

    All tests should use the current working directory, or the test helper
    for getting other directories. *Do not look up other directories*,
    because it prevents test runners from ensuring that tests run in valid
    locations (and skips implicit tests for directories with spaces/Unicode
    characters/etc.)

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 14, 2022

    I was following the pattern of StatAttributeTests.test_access_denied(), which uses the current user's temp directory to get a filesystem that supports security.

    It would probably be better to skip tests if the filesystem of the current working directory doesn't support the test, e.g. if it needs NTFS or needs support for security, hard links, or reparse points. For example, Win32JunctionTests should be skipped if reparse points aren't supported. The os_helper module could implement a function that calls GetVolumeInformationW() to get the filesystem name (e.g. "Ntfs") and flags (e.g. FILE_PERSISTENT_ACLS, FILE_SUPPORTS_HARD_LINKS, FILE_SUPPORTS_REPARSE_POINTS).

    @zooba
    Copy link
    Member

    zooba commented Mar 14, 2022

    It would probably be better to skip tests if the filesystem of the current working directory doesn't support the test,

    Yes, this would be good. Then whoever is configuring the test runner can
    move where tests are run to make sure it is supported. There are command
    line options specifically for this, that also correctly handle
    multiprocessing.

    Tests that bypass the CWD make this unfixable by the runner, which is
    why they should just use CWD.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes 3.10 only security fixes 3.11 only security fixes OS-windows type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants