classification
Title: On Windows, os.scandir will keep a handle on the directory until the iterator is exhausted
Type: behavior Stage:
Components: Documentation, Windows Versions: Python 3.5
process
Status: open Resolution:
Dependencies: 25994 Superseder:
Assigned To: docs@python Nosy List: benhoyt, docs@python, eryksun, martin.panter, paul.moore, remyroy, serhiy.storchaka, steve.dower, tim.golden, zach.ware
Priority: normal Keywords:

Created on 2016-01-14 18:45 by remyroy, last changed 2016-01-14 23:13 by remyroy.

Messages (8)
msg258212 - (view) Author: Remy Roy (remyroy) Date: 2016-01-14 18:45
On Windows, os.scandir will keep a handle on the directory being scanned until the iterator is exhausted. This behavior can cause various problems if you try to use filesystem calls such as os.chmod or os.remove on the directory while the handle is still open.

There are some use cases where the iterator is not going to be exhausted, such as looking for a specific entry in a directory and breaking out of the loop early.

This behavior should at least be documented. Alternatively, it might be interesting to provide a way to end the scan early and close the handle without having to exhaust the iterator.

As a workaround, you can force the exhaustion after you are done with the iterator with something like:

for entry in iterator:
    pass
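
A minimal sketch of that workaround in context, assuming a hypothetical helper named find_entry (not part of the os module): it stops at the first matching entry, then drains the rest of the iterator so the directory handle is released before the caller touches the directory again.

```python
import os


def find_entry(path, name):
    """Return the DirEntry called `name` in `path`, or None.

    Hypothetical helper for illustration only.
    """
    iterator = os.scandir(path)
    found = None
    for entry in iterator:
        if entry.name == name:
            found = entry
            break  # early exit leaves the iterator unexhausted
    # Drain whatever is left so the directory handle gets closed.
    for _ in iterator:
        pass
    return found
```

The cost of draining is proportional to the number of remaining entries, so this may be wasteful on very large directories.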

This is going to affect os.walk as well, since it uses os.scandir.

The original GitHub issue can be found at https://github.com/benhoyt/scandir/issues/58
msg258219 - (view) Author: Eryk Sun (eryksun) * Date: 2016-01-14 20:50
If you own the only reference you can also delete the reference, which deallocates the iterator and closes the handle.
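
A rough sketch of the reference-deletion approach, assuming CPython's reference counting and a hypothetical helper named first_entry_name. Other implementations may deallocate the iterator later, so this is not a portable guarantee.

```python
import os


def first_entry_name(path):
    """Return the name of the first entry in `path`, or None if it is empty.

    Hypothetical helper for illustration only.
    """
    it = os.scandir(path)
    entry = next(it, None)
    name = entry.name if entry is not None else None
    # `it` is the only reference to the iterator. Deleting it lets CPython's
    # reference counting deallocate the iterator, which closes the handle.
    del it
    return name
```

Only the plain string is returned here, so nothing keeps the iterator alive after the del statement.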

Can you provide concrete examples where os.remove and os.chmod fail? At least in Windows 7 and 10 the directory handle is opened with the normal read and write sharing, but also with delete sharing. This sharing mode is fairly close to POSIX behavior (an important distinction is noted below). I get the following results in Windows 10:

    >>> import os, stat
    >>> os.mkdir('test')
    >>> f = open('test/file1', 'w'); f.close()
    >>> f = open('test/file2', 'w'); f.close()
    >>> it = os.scandir('test')
    >>> next(it)
    <DirEntry 'file1'>

The rename, chmod, remove, and rmdir operations succeed:

    >>> os.rename('test', 'spam')
    >>> os.chmod('spam', stat.S_IREAD)
    >>> os.chmod('spam', stat.S_IWRITE)
    >>> os.remove('spam/file1')
    >>> os.remove('spam/file2')
    >>> os.rmdir('spam')

Apparently cached entries can be an issue, but this caching is up to WinAPI FindNextFile and the system call NtQueryDirectoryFile:

    >>> next(it)
    <DirEntry 'file2'>

An important distinction is that a deleted file in Windows doesn't actually get unlinked until all handles and kernel pointer references are closed. Also, once the delete disposition is set, no *new* handles can be created for the existing file or directory (all access is denied), and a new file or directory with the same name cannot be created.

    >>> os.listdir('spam')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    PermissionError: [WinError 5] Access is denied: 'spam'

    >>> f = open('spam', 'w')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    PermissionError: [Errno 13] Permission denied: 'spam'

If we had another handle we could use that to rename "spam" to get it out of the way, at least. Without that, AFAIK, all we can do is deallocate the iterator or wait for it to be exhausted, which closes the handle and thus allows Windows to finally unlink "spam":

    >>> next(it)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration

Creating a new file named "spam" is allowed now:

    >>> f = open('spam', 'w')
    >>> f.close()
msg258225 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-14 21:35
Remy, is this the same problem described in Issue 25994? There, a close() method (like on generators) and/or context manager support is proposed for the scandir() iterator. Perhaps we can keep this issue open for adding a warning to the documentation, and the other issue can be for improving the API in 3.6.
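
For illustration, a sketch of what the proposed context manager support looks like in use. It assumes Python 3.6 or later, where the scandir() iterator gained a close() method and can be used in a with statement. The helper name has_entry is hypothetical.

```python
import os


def has_entry(path, name):
    """Return True if `path` contains an entry called `name`.

    Hypothetical helper; assumes Python 3.6+ (scandir context manager).
    """
    with os.scandir(path) as it:
        for entry in it:
            if entry.name == name:
                # Leaving the with block closes the iterator, and with it
                # the directory handle, even though it is not exhausted.
                return True
    return False
```

On 3.5, where this support is missing, draining the iterator or dropping the last reference remain the only options discussed in this issue.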
msg258226 - (view) Author: Remy Roy (remyroy) Date: 2016-01-14 21:46
I believe Eryk's explanation on how a file in Windows doesn't actually get unlinked until all handles and kernel pointer references are closed is spot on about the problem I had.

I had a complex example that could probably have been simplified to what Eryk posted.

That behavior on Windows is quite counterintuitive. I'm not sure about what can be done to help it.
msg258228 - (view) Author: Remy Roy (remyroy) Date: 2016-01-14 21:54
This issue is not the same as Issue 25994, but it is closely related. Some kind of close() method and/or context manager support could help here as well.
msg258235 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-14 22:22
Can you explain how it is different? The way I see it, both problems are about the scandir() iterator holding an open reference (file descriptor or handle) to a directory/folder, when the iterator was not exhausted, but the caller no longer needs it.
msg258236 - (view) Author: Eryk Sun (eryksun) * Date: 2016-01-14 22:24
> That behavior on Windows is quite counterintuitive.

It's counter-intuitive from a POSIX point of view, in which anonymous files are allowed. In contrast, Windows allows any existing reference to unset the delete disposition, so the name cannot be unlinked until all references are closed.
msg258248 - (view) Author: Remy Roy (remyroy) Date: 2016-01-14 23:13
From my point of view, Issue 25994 is about the potential file descriptor/handle leaks and this issue is about being unable to perform some filesystem calls because of a hidden unclosed file descriptor/handle.

I am not going to protest if you want to treat them as the same issue.
History
Date                 User              Action  Args
2016-01-14 23:13:07  remyroy           set     messages: + msg258248
2016-01-14 22:24:52  eryksun           set     messages: + msg258236
2016-01-14 22:22:09  martin.panter     set     messages: + msg258235
2016-01-14 22:11:40  serhiy.storchaka  set     nosy: + serhiy.storchaka
2016-01-14 21:54:30  remyroy           set     messages: + msg258228
2016-01-14 21:46:39  remyroy           set     messages: + msg258226
2016-01-14 21:35:09  martin.panter     set     nosy: + martin.panter, docs@python
                                               messages: + msg258225
                                               assignee: docs@python
                                               dependencies: + File descriptor leaks in os.scandir()
                                               components: + Documentation
2016-01-14 20:50:33  eryksun           set     nosy: + eryksun
                                               messages: + msg258219
2016-01-14 18:51:44  benhoyt           set     nosy: + benhoyt
2016-01-14 18:45:09  remyroy           create