classification
Title: On Windows, os.scandir will keep a handle on the directory until the iterator is exhausted
Type: behavior Stage: resolved
Components: Documentation, Windows Versions: Python 3.5
process
Status: closed Resolution: third party
Dependencies: 25994 Superseder:
Assigned To: docs@python Nosy List: benhoyt, docs@python, eryksun, martin.panter, paul.moore, remyroy, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2016-01-14 18:45 by remyroy, last changed 2021-02-25 17:59 by steve.dower. This issue is now closed.

Messages (10)
msg258212 - Author: Remy Roy (remyroy) Date: 2016-01-14 18:45
On Windows, os.scandir will keep a handle on the directory being scanned until the iterator is exhausted. This behavior can cause various problems if you try to use some filesystem calls like os.chmod or os.remove on the directory while the handle is still open.

There are some use cases where the iterator is not going to be exhausted, such as looking for a specific entry in a directory and breaking out of the loop early.

This behavior should at least be documented. Alternatively, it might be interesting to provide a way to end the scan early and close the handle without having to exhaust the iterator.

As a workaround, you can force the exhaustion after you are done with the iterator with something like:

    for entry in iterator:
        pass
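A minimal sketch of that workaround as a self-contained helper. The function name `find_entry` and the sample file names are made up for illustration; only `os.scandir` itself comes from the report.

```python
import os
import tempfile


def find_entry(path, name):
    """Return the DirEntry called `name` in `path`, or None if absent.

    The iterator is drained after the search so that the directory
    handle is released before the function returns.
    """
    iterator = os.scandir(path)
    found = None
    for entry in iterator:
        if entry.name == name:
            found = entry
            break  # early exit: the iterator is not exhausted yet
    # Workaround from the report: consume the remaining entries so the
    # underlying handle gets closed.
    for _ in iterator:
        pass
    return found


with tempfile.TemporaryDirectory() as tmp:
    for fname in ("a.txt", "b.txt", "c.txt"):
        open(os.path.join(tmp, fname), "w").close()
    match = find_entry(tmp, "b.txt")
    print(match.name if match else None)
```

Draining costs one extra pass over the remaining entries, which may matter for very large directories.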

This is going to affect os.walk as well since it uses os.scandir.

The original github issue can be found on https://github.com/benhoyt/scandir/issues/58 .
msg258219 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-01-14 20:50
If you own the only reference you can also delete the reference, which deallocates the iterator and closes the handle.
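A small sketch of what dropping the only reference might look like. It assumes CPython, where reference counting deallocates the iterator as soon as the last reference goes away; other interpreters may defer this. The temporary directory and file names are placeholders. On Python 3.6 and later, destroying an iterator that was neither exhausted nor closed also emits a ResourceWarning, which is ignored by default.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for fname in ("file1", "file2"):
        open(os.path.join(tmp, fname), "w").close()

    it = os.scandir(tmp)
    first = next(it)         # the iterator is now partially consumed
    first_name = first.name  # DirEntry objects stay usable afterwards

    # Dropping the only reference lets CPython deallocate the iterator
    # immediately, which closes the underlying directory handle.
    del it

print(first_name)
```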

Can you provide concrete examples where os.remove and os.chmod fail? At least in Windows 7 and 10 the directory handle is opened with the normal read and write sharing, but also with delete sharing. This sharing mode is fairly close to POSIX behavior (an important distinction is noted below). I get the following results in Windows 10:

    >>> import os, stat
    >>> os.mkdir('test')
    >>> f = open('test/file1', 'w'); f.close()
    >>> f = open('test/file2', 'w'); f.close()
    >>> it = os.scandir('test')
    >>> next(it)
    <DirEntry 'file1'>

The rename, chmod, remove, and rmdir operations all succeed:

    >>> os.rename('test', 'spam')
    >>> os.chmod('spam', stat.S_IREAD)
    >>> os.chmod('spam', stat.S_IWRITE)
    >>> os.remove('spam/file1')
    >>> os.remove('spam/file2')
    >>> os.rmdir('spam')

Apparently cached entries can be an issue, but this caching is up to WinAPI FindNextFile and the system call NtQueryDirectoryFile:

    >>> next(it)
    <DirEntry 'file2'>

An important distinction is that a deleted file in Windows doesn't actually get unlinked until all handles and kernel pointer references are closed. Also, once the delete disposition is set, no *new* handles can be created for the existing file or directory (all access is denied), and a new file or directory with the same name cannot be created.

    >>> os.listdir('spam')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    PermissionError: [WinError 5] Access is denied: 'spam'

    >>> f = open('spam', 'w')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    PermissionError: [Errno 13] Permission denied: 'spam'

If we had another handle we could use that to rename "spam" to get it out of the way, at least. Without that, AFAIK, all we can do is deallocate the iterator or wait for it to be exhausted, which closes the handle and thus allows Windows to finally unlink "spam":

    >>> next(it)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration

Creating a new file named "spam" is allowed now:

    >>> f = open('spam', 'w')
    >>> f.close()
msg258225 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-14 21:35
Remy, is this the same problem described in Issue 25994? There, a close() method (like on generators) and/or context manager support is proposed for the scandir() iterator. Perhaps we can keep this issue open for adding a warning to the documentation, and the other issue can be for improving the API in 3.6.
msg258226 - Author: Remy Roy (remyroy) Date: 2016-01-14 21:46
I believe Eryk's explanation on how a file in Windows doesn't actually get unlinked until all handles and kernel pointer references are closed is spot on about the problem I had.

I had a complex example that could probably have been simplified to what Eryk posted.

That behavior on Windows is quite counterintuitive. I'm not sure about what can be done to help it.
msg258228 - Author: Remy Roy (remyroy) Date: 2016-01-14 21:54
This issue is not the same as Issue 25994, but it is closely related. Some kind of close() method and/or context manager support could help here as well.
msg258235 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-14 22:22
Can you explain how it is different? The way I see it, both problems are about the scandir() iterator holding an open reference (file descriptor or handle) to a directory/folder, when the iterator was not exhausted, but the caller no longer needs it.
msg258236 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2016-01-14 22:24
> That behavior on Windows is quite counterintuitive.

It's counter-intuitive from a POSIX point of view, in which anonymous files are allowed. In contrast, Windows allows any existing reference to unset the delete disposition, so the name cannot be unlinked until all references are closed.
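To illustrate the POSIX side of this contrast, here is a hedged sketch of an "anonymous" file: the name is unlinked while a descriptor is still open, and the data stays readable through that descriptor. The helper name `read_after_unlink` and the file name are invented for the example. The demo is guarded because on Windows the `os.unlink` call would be expected to fail while the file is open.

```python
import os
import tempfile


def read_after_unlink(directory):
    """Unlink a file while it is still open, then read it back.

    Returns (name_gone, contents). Intended for POSIX systems only.
    """
    path = os.path.join(directory, "anon.txt")
    with open(path, "w+") as f:
        f.write("still here")
        f.flush()
        os.unlink(path)  # POSIX: the name disappears immediately
        name_gone = not os.path.exists(path)
        f.seek(0)
        return name_gone, f.read()


if os.name == "posix":
    with tempfile.TemporaryDirectory() as tmp:
        print(read_after_unlink(tmp))
```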
msg258248 - Author: Remy Roy (remyroy) Date: 2016-01-14 23:13
From my point of view, Issue 25994 is about the potential file descriptor/handle leaks and this issue is about being unable to perform some filesystem calls because of a hidden unclosed file descriptor/handle.

I am not going to protest if you want to treat them as the same issue.
msg387642 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2021-02-24 23:12
Issue 25994 added support for the context-manager protocol and close() method in 3.6. So it's at least much easier to ensure that the handle gets closed. 
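For example, a search that stops early might be written as follows on Python 3.6 or later. This is a sketch only; the helper name `first_entry_name` and the file name are placeholders.

```python
import os
import tempfile


def first_entry_name(path):
    """Return the name of the first entry in `path`, or None if it is empty."""
    # Python 3.6+: the scandir iterator supports the context-manager
    # protocol, so it is closed on leaving the block even on early return.
    with os.scandir(path) as it:
        for entry in it:
            return entry.name
    return None


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "only.txt"), "w").close()
    name = first_entry_name(tmp)

print(name)
```

Calling `it.close()` in a `finally` clause should have the same effect where a `with` block is inconvenient.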

The documentation of scandir() links to WinAPI FindFirstFile and FindNextFile, which at least mention the "search handle". It's not made explicit that this encapsulates a handle for a kernel file object, nor is there any discussion of which operations (e.g. move, rename, delete) are allowed directly on the directory. Similarly, the directory stream that's returned by and used by POSIX opendir() and readdir() may or may not encapsulate a file descriptor.

I don't think Python's documentation is the best place to discuss platform-specific implementation details in most cases. Exceptions should be made in some cases, but I don't think this is one of them because I can't even link to a document about the implementation details of FindNextFile. At a lower level I can link to documents about the NtQueryDirectoryFile[Ex] system call, but that's not much help in terms of officially documenting what FindNextFile does. Microsoft prefers to keep the Windows API details opaque, which gives them wiggle room.

FYI, in Windows 10, deleting files and directories now tries a POSIX delete (if supported by the filesystem) that immediately unlinks the name as soon as the handle that's used to perform the delete is closed, such as the handle that's opened to implement DeleteFile (os.unlink) and RemoveDirectory (os.rmdir). NTFS supports this feature by moving the file/directory to a reserved "\$Extend\$Deleted" directory:

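    >>> import os, win32file  # win32file is provided by pywin32
    >>> os.mkdir('spam')
    >>> # 3 = OPEN_EXISTING, 0x0200_0000 = FILE_FLAG_BACKUP_SEMANTICS (needed to open a directory)
    >>> h = win32file.CreateFile('spam', 0, 0, None, 3, 0x0200_0000, None)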
    >>> print(win32file.GetFinalPathNameByHandle(h, 0))
    \\?\C:\Temp\test\test\spam

    >>> os.rmdir('spam')
    >>> print(win32file.GetFinalPathNameByHandle(h, 0))
    \\?\C:\$Extend\$Deleted\001000000000949A5E2FE5BB

Of course, none of the above is documented for RemoveDirectory().
msg387683 - Author: Steve Dower (steve.dower) * (Python committer) Date: 2021-02-25 17:59
> FYI, in Windows 10, deleting files and directories now tries a POSIX delete

Yeah, FWIW, I haven't been able to get clear guidance on what I can/cannot publicly announce we've done in this space. But since you've found it I guess I can say sorry that I couldn't announce it more loudly! :)

A number of our other issues should be able to be closed soon once the changes get out in the open.
History
Date                 User              Action  Args
2021-02-25 17:59:56  steve.dower       set     messages: + msg387683
2021-02-24 23:12:36  eryksun           set     status: open -> closed; resolution: third party; messages: + msg387642; stage: resolved
2016-01-14 23:13:07  remyroy           set     messages: + msg258248
2016-01-14 22:24:52  eryksun           set     messages: + msg258236
2016-01-14 22:22:09  martin.panter     set     messages: + msg258235
2016-01-14 22:11:40  serhiy.storchaka  set     nosy: + serhiy.storchaka
2016-01-14 21:54:30  remyroy           set     messages: + msg258228
2016-01-14 21:46:39  remyroy           set     messages: + msg258226
2016-01-14 21:35:09  martin.panter     set     nosy: + martin.panter, docs@python; messages: + msg258225; assignee: docs@python; dependencies: + File descriptor leaks in os.scandir(); components: + Documentation
2016-01-14 20:50:33  eryksun           set     nosy: + eryksun; messages: + msg258219
2016-01-14 18:51:44  benhoyt           set     nosy: + benhoyt
2016-01-14 18:45:09  remyroy           create