This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Allow os.remove to defer to rmdir
Type: enhancement Stage: needs patch
Components: Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: benrg, bup, eryksun
Priority: normal Keywords:

Created on 2022-02-18 17:55 by bup, last changed 2022-04-11 14:59 by admin.

Messages (5)
msg413499 - (view) Author: Dan Snider (bup) * Date: 2022-02-18 17:55
It appears sometime recently-ish that POSIX updated remove to the following:

    #include <stdio.h>
    int remove(const char *path);

If path does not name a directory, remove(path) shall be equivalent to unlink(path). If path names a directory, remove(path) shall be equivalent to rmdir(path).
msg413545 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2022-02-19 12:05
In Windows, checking for a directory in order to defer to either os_rmdir_impl() or os_unlink_impl() would lead to a redundant directory check. os_unlink_impl() already has to check for a directory in order to delete a directory symlink or mountpoint. I suggest implementing a common _Py_remove() function in Windows. For example:

    typedef enum {
        REMOVE_FILE,
        REMOVE_DIR,
        REMOVE_BOTH,
    } _Py_remove_mode;

    int
    _Py_remove(path_t *path, _Py_remove_mode mode)
    {
        BOOL isDir = FALSE;
        BOOL isLnk = FALSE;
        BOOL success = FALSE;

        if (mode != REMOVE_DIR) {
            DWORD fileAttributes = GetFileAttributesW(path->wide);
            if (fileAttributes != INVALID_FILE_ATTRIBUTES) {
                isDir = fileAttributes & FILE_ATTRIBUTE_DIRECTORY;
            }

            if (isDir && (mode == REMOVE_FILE)) {
                WIN32_FIND_DATAW data;
                HANDLE hFind = FindFirstFileW(path->wide, &data);
                if (hFind != INVALID_HANDLE_VALUE) {
                    FindClose(hFind);
                    if (data.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) {
                        isLnk = IsReparseTagNameSurrogate(data.dwReserved0);
                    }
                }
            }
        }

        if (mode == REMOVE_DIR || isDir && (mode == REMOVE_BOTH || isLnk)) {
            success = RemoveDirectoryW(path->wide);
        } else {
            success = DeleteFileW(path->wide);
        }
        
        return success ? 0 : -1;
    }

The multiple opens for GetFileAttributesW(), FindFirstFileW() and DeleteFileW() add up to make the delete more expensive and more race prone than it has to be. It would be nice to use a single open for all operations. But the Windows API just has CreateFileW(), which requires SYNCHRONIZE access. The latter can't be granted by the parent directory, unlike FILE_READ_ATTRIBUTES and DELETE access. We could implement a version that uses CreateFileW(), and fall back on the above version when access is denied, similar to what we do for os.stat(). Also, Python is limited to the Windows 8.1 SDK, which makes it awkward to use FileDispositionInfoEx (POSIX delete) like DeleteFileW() does in Windows 10, but it should still be possible.
msg414220 - (view) Author: (benrg) Date: 2022-02-28 19:59
The REMOVE_DIR case reduces to

    return RemoveDirectoryW(path->wide) ? 0 : -1;

so I think there's no reason to combine it with the other two.

The REMOVE_BOTH case is

    attrs = GetFileAttributesW(path->wide);

    if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_DIRECTORY)) {
        success = RemoveDirectoryW(path->wide);
    } else {
        success = DeleteFileW(path->wide);
    }

    return success ? 0 : -1;

For REMOVE_BOTH, I don't see the need of calling GetFileAttributes - couldn't you just try DeleteFile, and if that fails, RemoveDirectory?
msg414227 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2022-03-01 00:05
> For REMOVE_BOTH, I don't see the need of calling GetFileAttributes

I was thinking that the NtQueryAttributesFile() system call is relatively cheap compared to a full open, especially if the attributes of a remote file are cached locally. However, on second thought, it's probably not. An open can fail as soon as it knows that there's a file/directory type mismatch. This should be able to take advantage of a local attribute cache instead of incurring network latency.

> so I think there's no reason to combine it with the other two.

REMOVE_DIR can be separate, for the current behavior. But I wanted all modes handled in one function in case later on we decide to fix os.rmdir() in Windows. It allows deleting a directory symlink. Note that the os.lstat() result reports a directory symlink as an S_IFLNK file, not S_IFDIR, so the os.rmdir() behavior is internally inconsistent. This could be corrected by forcing the REMOVE_DIR mode to raise NotADirectoryError. For example:

        } else { // mode != REMOVE_BOTH

            WIN32_FIND_DATAW data;
            BOOL isDir = FALSE;
            BOOL isLnk = FALSE;

            HANDLE hFind = FindFirstFileW(path->wide, &data);
            if (hFind != INVALID_HANDLE_VALUE) {
                FindClose(hFind);
                isDir = data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY;
                if (data.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT) {
                    if (mode == REMOVE_DIR) {
                        isLnk = data.dwReserved0 == IO_REPARSE_TAG_SYMLINK;
                    } else {
                        isLnk = IsReparseTagNameSurrogate(data.dwReserved0);
                    }
                }
            }

            if ((mode == REMOVE_DIR) && (isDir && isLnk)) {
                SetLastError(ERROR_DIRECTORY); // POSIX ENOTDIR
            } else if ((mode == REMOVE_DIR) || (isDir && isLnk)) {
                success = RemoveDirectoryW(path->wide);
            } else {
                success = DeleteFileW(path->wide);
            }
        }
msg414252 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2022-03-01 09:58
glibc remove() has an optimization to skip calling rmdir() if the macro condition IS_NO_DIRECTORY_ERROR is true for the unlink() error. For Linux, this condition is `errno != EISDIR`. On other platforms (e.g. BSD systems), the condition is `errno != EPERM`. The implementation is the following, from "sysdeps/posix/remove.c":

    int
    remove (const char *file)
    {
      /* First try to unlink since this is more frequently the necessary action. */
      if (__unlink (file) != 0
          /* If it is indeed a directory...  */
          && (IS_NO_DIRECTORY_ERROR
          /* ...try to remove it.  */
          || __rmdir (file) != 0))
        /* Cannot remove the object for whatever reason.  */
        return -1;

      return 0;
    }

WinAPI DeleteFileW() doesn't support the same distinction. In this case, ERROR_ACCESS_DENIED is set for the following system status codes:

    STATUS_ACCESS_DENIED (like EACCES)
    STATUS_CANNOT_DELETE (like EPERM; readonly or image-mapped)
    STATUS_DELETE_PENDING (like EPERM)
    STATUS_FILE_IS_A_DIRECTORY (like EISDIR)

os.remove() could skip skip calling RemoveDirectoryW() for a sharing violation, i.e. ERROR_SHARING_VIOLATION. If DeleteFileW() fails with a sharing violation, the path is not a directory and, even if it were, an attempt to delete it would fail.
History
Date User Action Args
2022-04-11 14:59:56adminsetgithub: 90947
2022-03-01 09:58:03eryksunsetmessages: + msg414252
2022-03-01 00:05:50eryksunsetmessages: + msg414227
2022-02-28 19:59:50benrgsetnosy: + benrg
messages: + msg414220
2022-02-19 12:05:17eryksunsetversions: + Python 3.11
nosy: + eryksun

messages: + msg413545

stage: needs patch
2022-02-18 17:55:13bupcreate