classification
Title: Add support of file descriptor in os.scandir()
Type: enhancement
Stage: resolved
Components: Extension Modules
Versions: Python 3.7
process
Status: closed
Resolution: fixed
Dependencies: 28586
Superseder:
Assigned To: serhiy.storchaka
Nosy List: abacabadabacaba, benhoyt, eryksun, serhiy.storchaka, vstinner
Priority: normal
Keywords: patch

Created on 2016-01-02 18:10 by serhiy.storchaka, last changed 2017-03-30 06:21 by serhiy.storchaka. This issue is now closed.

Files
File name, Uploaded by, Date
os-scandir-fd.patch, serhiy.storchaka, 2016-11-06 21:53
os-scandir-fd-2.patch, serhiy.storchaka, 2016-11-12 17:28
os-scandir-fd-3.patch, serhiy.storchaka, 2016-11-20 06:35
Pull Requests
URL, Status, Linked by, Date
PR 502, merged, serhiy.storchaka, 2017-03-06 09:08
Messages (11)
msg257353 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-01-02 18:10
Currently os.scandir() on Unix is implemented using opendir()/readdir()/closedir(). It accepts bytes and str pathnames. But most functions in the os module that accept a pathname also accept an open file descriptor. This feature could be implemented in scandir() by using fdopendir() instead of opendir(). This would allow adding support for the dir_fd parameter in scandir(). That in turn would allow implementing os.fwalk() with scandir() and writing a more efficient implementation of os.walk() (because we would no longer need to walk a long path for deep directories, see issue15200).
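
For illustration only (this is not taken from the patch): a minimal sketch of what the proposed call looks like from Python, assuming Python 3.7+ on Unix where the feature was eventually merged. The helper name scan_by_fd is hypothetical.

```python
import os

def scan_by_fd(directory):
    """Return sorted (name, is_file) pairs, scanning through a file descriptor.

    scan_by_fd is a hypothetical helper; only os.open, os.scandir and
    os.close are real os-module calls.
    """
    fd = os.open(directory, os.O_RDONLY)  # the caller opens the directory
    try:
        # Pass the descriptor instead of a pathname.
        with os.scandir(fd) as entries:
            return sorted((e.name, e.is_file()) for e in entries)
    finally:
        os.close(fd)  # the caller closes what it opened
```

The entries are consumed inside the with block, so any type checks run while the descriptor is still open.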
msg257380 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-02 22:38
Yeah, it was discussed when PEP 471 was designed, but it was already hard to design os.scandir() without supporting an fd as the os.scandir() parameter.

It's more complex because we have to handle the lifetime of the file descriptor especially if it's exposed in a public attribute.
msg257382 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-01-02 22:42
Supporting file descriptors was also discussed when pathlib.Path was designed, but there were similar questions about the lifetime of the file descriptor. (Who is able to close it? When? Is it ok to close it using os.close? etc.)
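
To make the lifetime question concrete, here is a small sketch, assuming Python 3.7+ on Unix. It relies on my understanding that the merged implementation scans a duplicate of the descriptor, so the caller keeps ownership of the one it passed in; treat that as an assumption to verify rather than a documented guarantee. The helper name fd_survives_scan is hypothetical.

```python
import os

def fd_survives_scan(directory):
    """Scan `directory` via an fd, then check the caller's fd is still open.

    Hypothetical helper for illustration; returns (names, still_open).
    """
    fd = os.open(directory, os.O_RDONLY)
    try:
        with os.scandir(fd) as entries:
            names = sorted(e.name for e in entries)
        # The iterator is closed here. If scandir() had taken ownership of
        # our descriptor, fstat() would fail with EBADF.
        try:
            os.fstat(fd)
            still_open = True
        except OSError:
            still_open = False
        return names, still_open
    finally:
        if still_open:
            os.close(fd)  # the caller remains responsible for closing
```

Under that assumption, the answer to "who is able to close it?" is the code that opened it, using os.close() as usual.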
msg280177 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-06 21:53
Proposed patch adds support for file descriptors in os.scandir() and implements os.fwalk() with os.scandir().

The effect of using os.scandir() in os.fwalk():

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.walk("/usr/lib"))'
1 loop, best of 5: 934 msec per loop

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.walk("/usr/lib", topdown=False))'
1 loop, best of 5: 718 msec per loop

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.fwalk("/usr/lib"))'
Unpatched: 1 loops, best of 5: 1.78 sec per loop
Patched:   1 loop, best of 5: 934 msec per loop

$ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.fwalk("/usr/lib", topdown=False))'
Unpatched: 1 loops, best of 5: 1.76 sec per loop
Patched:   1 loop, best of 5: 947 msec per loop
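
For readers who want to see the idea behind the speedup, the following is a simplified sketch of a descriptor-based walk built on os.scandir(fd), assuming Python 3.7+ on Unix. It is not the code from the patch: walk_fd and _walk_fd are hypothetical names, it yields 3-tuples like os.walk() rather than the 4-tuples of os.fwalk(), and it lists symlinks to directories among the files instead of the directories.

```python
import os

def walk_fd(top):
    """Yield (dirpath, dirnames, filenames), top-down, using descriptors."""
    topfd = os.open(top, os.O_RDONLY)
    try:
        yield from _walk_fd(topfd, top)
    finally:
        os.close(topfd)

def _walk_fd(dirfd, dirpath):
    dirs, files = [], []
    # One scandir() pass gives names and (usually) types without extra stat calls.
    with os.scandir(dirfd) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            else:
                files.append(entry.name)
    yield dirpath, dirs, files
    for name in dirs:
        try:
            # Open the child relative to the parent descriptor, so the
            # kernel never has to re-resolve the full path.
            subfd = os.open(name, os.O_RDONLY, dir_fd=dirfd)
        except OSError:
            continue  # directory vanished or is unreadable; skip it
        try:
            yield from _walk_fd(subfd, os.path.join(dirpath, name))
        finally:
            os.close(subfd)
```

Each level keeps one descriptor open while its subtree is walked, so very deep trees can run into the per-process descriptor limit.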
msg280663 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-12 17:28
Thank you for the review, Josh. The updated patch addresses your comments and adds a few more micro-optimizations.
msg281251 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-20 06:35
Resolved conflicts in the documentation.
msg289079 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-06 09:10
I'm wondering whether it is possible to implement this feature on Windows?
msg289080 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-06 09:21
> I'm wondering whether it is possible to implement this feature on Windows?

On Windows, scandir() is implemented with FindFirstFile() which takes strings. This function creates a handle which should then be passed to FindNextFile(). There is no similar function taking a directory handle, so it's not possible to implement os.scandir(fd) on Windows.

It seems that gnulib emulates fdopendir() on Windows, and its documentation contains warnings:
https://www.gnu.org/software/gnulib/manual/html_node/fdopendir.html
"But the replacement function is not safe to be used in libraries and is not multithread-safe. Also, the replacement does not guarantee that ‘dirfd(fdopendir(n))==n’ (dirfd might fail, or return a different file descriptor than n)."
msg289153 - (view) Author: Eryk Sun (eryksun) * Date: 2017-03-07 06:26
> There is no similar function taking a directory handle

In 3.5+ the CRT has O_OBTAIN_DIR (0x2000) for opening a directory, i.e. to call CreateFile with backup semantics. A directory can be read via GetFileInformationByHandleEx [1] using the information classes FileIdBothDirectoryRestartInfo and FileIdBothDirectoryInfo. These info classes are just a simplified wrapper around the more powerful system call NtQueryDirectoryFile [2].

The implementation details could be hidden behind _Py_opendir, _Py_fdopendir, _Py_readdir, and _Py_closedir -- allowing a common implementation of the high-level listdir() and scandir() functions. I wrote a ctypes prototype of listdir() along these lines.

One feature that's lost in using GetFileInformationByHandleEx to list a directory is the ability to do wildcard filtering. However, Python's listdir and scandir never use wildcard filtering, so it's no real loss. FindFirstFile implements this feature via the FileName parameter of NtQueryDirectoryFile. First it translates DOS wildcards to NT's set of 5 wildcards. There's the native NT '*' and '?', plus the quirky semantics of MS-DOS via '<', '>', and '"', i.e. DOS_STAR, DOS_QM, and DOS_DOT. See FsRtlIsNameInExpression [3] for a description of these wildcard characters.

[1]: https://msdn.microsoft.com/en-us/library/aa364953
[2]: https://msdn.microsoft.com/en-us/library/ff567047
[3]: https://msdn.microsoft.com/en-us/library/ff546850
msg289485 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-12 08:15
Thank you for your investigation, Eryk. Helpful as always.

Since I have no access to Windows I left this feature Unix-only.
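
Because the feature is Unix-only, portable code may want to check for it at run time. A minimal sketch, assuming the standard os.supports_fd set reports os.scandir on platforms where descriptors are accepted; list_dir_names is a hypothetical helper.

```python
import os

def list_dir_names(directory):
    """Return sorted entry names, scanning via an fd where supported."""
    if os.scandir in os.supports_fd:
        # Descriptor-based path (Unix with Python 3.7+).
        fd = os.open(directory, os.O_RDONLY)
        try:
            with os.scandir(fd) as entries:
                return sorted(e.name for e in entries)
        finally:
            os.close(fd)
    # Fallback for platforms such as Windows: scan by pathname.
    with os.scandir(directory) as entries:
        return sorted(e.name for e in entries)
```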
msg290820 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-30 06:12
New changeset ea720fe7e99d68924deab38de955fe97f87e2b29 by Serhiy Storchaka in branch 'master':
bpo-25996: Added support of file descriptors in os.scandir() on Unix. (#502)
https://github.com/python/cpython/commit/ea720fe7e99d68924deab38de955fe97f87e2b29
History
Date User Action Args
2017-03-30 06:21:37  serhiy.storchaka  set  status: open -> closed; resolution: fixed; stage: patch review -> resolved
2017-03-30 06:12:33  serhiy.storchaka  set  messages: + msg290820
2017-03-12 08:15:07  serhiy.storchaka  set  messages: + msg289485
2017-03-07 06:26:47  eryksun  set  nosy: + eryksun; messages: + msg289153
2017-03-06 09:21:26  vstinner  set  messages: + msg289080
2017-03-06 09:10:38  serhiy.storchaka  set  messages: + msg289079
2017-03-06 09:08:38  serhiy.storchaka  set  pull_requests: + pull_request410
2016-11-20 06:35:46  serhiy.storchaka  set  files: + os-scandir-fd-3.patch; messages: + msg281251
2016-11-12 17:28:36  serhiy.storchaka  set  files: + os-scandir-fd-2.patch; messages: + msg280663
2016-11-06 21:53:54  serhiy.storchaka  set  files: + os-scandir-fd.patch; versions: + Python 3.7, - Python 3.6; messages: + msg280177; keywords: + patch; stage: patch review
2016-11-02 08:34:13  serhiy.storchaka  set  assignee: serhiy.storchaka; dependencies: + Convert os.scandir to Argument Clinic
2016-10-31 23:53:40  serhiy.storchaka  link  issue28564 dependencies
2016-05-22 18:01:13  abacabadabacaba  set  nosy: + abacabadabacaba
2016-01-02 22:42:01  vstinner  set  messages: + msg257382
2016-01-02 22:38:59  vstinner  set  messages: + msg257380
2016-01-02 18:10:38  serhiy.storchaka  create