Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support of file descriptor in os.scandir() #70184

Closed
serhiy-storchaka opened this issue Jan 2, 2016 · 11 comments
Closed

Add support of file descriptor in os.scandir() #70184

serhiy-storchaka opened this issue Jan 2, 2016 · 11 comments
Assignees
Labels
3.7 (EOL) end of life extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 25996
Nosy @vstinner, @benhoyt, @serhiy-storchaka, @eryksun
PRs
  • bpo-25996: Added support of file descriptors in os.scandir() on Unix. #502
  • Dependencies
  • bpo-28586: Convert os.scandir to Argument Clinic
  • Files
  • os-scandir-fd.patch
  • os-scandir-fd-2.patch
  • os-scandir-fd-3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2017-03-30.06:21:37.999>
    created_at = <Date 2016-01-02.18:10:38.466>
    labels = ['extension-modules', 'type-feature', '3.7']
    title = 'Add support of file descriptor in os.scandir()'
    updated_at = <Date 2017-03-30.06:21:37.998>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2017-03-30.06:21:37.998>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2017-03-30.06:21:37.999>
    closer = 'serhiy.storchaka'
    components = ['Extension Modules']
    creation = <Date 2016-01-02.18:10:38.466>
    creator = 'serhiy.storchaka'
    dependencies = ['28586']
    files = ['45377', '45462', '45554']
    hgrepos = []
    issue_num = 25996
    keywords = ['patch']
    message_count = 11.0
    messages = ['257353', '257380', '257382', '280177', '280663', '281251', '289079', '289080', '289153', '289485', '290820']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'benhoyt', 'abacabadabacaba', 'serhiy.storchaka', 'eryksun']
    pr_nums = ['502']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue25996'
    versions = ['Python 3.7']

    @serhiy-storchaka
    Copy link
    Member Author

    For now os.scandir() on Unix is implemented using opendir()/readdir()/closedir(). It accepts bytes and str pathname. But most functions in the os module that accept a pathname, accept also an open file descriptor. It is possible to implement this feature in scandir() with using fdopendir() instead of opendir(). This would allow to add a support of the dir_fd parameter in scandir(). And that would allow to implement os.fwalk() with scandir() and make more efficient implementation of os.walk() (because we no longer need to walk long path for deep directories, see bpo-15200).

    @serhiy-storchaka serhiy-storchaka added extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Jan 2, 2016
    @vstinner
    Copy link
    Member

    vstinner commented Jan 2, 2016

    Yeah, it was discussed when the PEP-471 was designed, but it was already hard to design os.scandir() without supporting fd as os.scandir() parameter.

    It's more complex because we have to handle the lifetime of the file descriptor especially if it's exposed in a public attribute.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 2, 2016

    Supporting file descriptor was also discussed when pathlib.Path was designed, but there was similar questions on the lifetime of the file descriptor. (Who is able to close it? When? Is it ok to close it using os.close? etc.)

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 2, 2016
    @serhiy-storchaka
    Copy link
    Member Author

    Proposed patch adds support for file descriptors in os.scandir() and implements os.fwalk() with os.scandir().

    The effect of using os.scandir() in os.fwalk():

    $ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.walk("/usr/lib"))'
    1 loop, best of 5: 934 msec per loop
    
    $ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.walk("/usr/lib", topdown=False))'
    1 loop, best of 5: 718 msec per loop
    
    $ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.fwalk("/usr/lib"))'
    Unpatched: 1 loops, best of 5: 1.78 sec per loop
    Patched:   1 loop, best of 5: 934 msec per loop
    
    $ ./python -m timeit -n1 -r5 -s 'import os' -- 'list(os.fwalk("/usr/lib", topdown=False))'
    Unpatched: 1 loops, best of 5: 1.76 sec per loop
    Patched:   1 loop, best of 5: 947 msec per loop

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Nov 6, 2016
    @serhiy-storchaka
    Copy link
    Member Author

    Thank you for the review Josh. Updated patch addresses your comments and adds yet few microoptimizations.

    @serhiy-storchaka
    Copy link
    Member Author

    Resolved conflicts in the documentation.

    @serhiy-storchaka
    Copy link
    Member Author

    I'm wondering is it possible to implement this feature on Windows?

    @vstinner
    Copy link
    Member

    vstinner commented Mar 6, 2017

    I'm wondering is it possible to implement this feature on Windows?

    On Windows, scandir() is implemented with FindFirstFile() which takes strings. This function creates a handle which should then be passed to FindNextFile(). There is no similar function taking a directory handle, so it's not possible to implement os.scandir(fd) on Windows.

    It seems like the gnulib emulates fdopendir() on Windows, and its documentation contains warnings:
    https://www.gnu.org/software/gnulib/manual/html_node/fdopendir.html
    "But the replacement function is not safe to be used in libraries and is not multithread-safe. Also, the replacement does not guarantee that ‘dirfd(fdopendir(n))==n’ (dirfd might fail, or return a different file descriptor than n)."

    @eryksun
    Copy link
    Contributor

    eryksun commented Mar 7, 2017

    There is no similar function taking a directory handle

    In 3.5+ the CRT has O_OBTAIN_DIR (0x2000) for opening a directory, i.e. to call CreateFile with backup semantics. A directory can be read via GetFileInformationByHandleEx 1 using the information classes FileIdBothDirectoryRestartInfo and FileIdBothDirectoryInfo. This info class is just a simplified wrapper around the more powerful system call NtQueryDirectoryFile 2.

    The implementation details could be hidden behind _Py_opendir, _Py_fdopendir, _Py_readdir, and _Py_closedir -- allowing a common implementation of the high-level listdir() and scandir() functions. I wrote a ctypes prototype of listdir() along these lines.

    One feature that's lost in using GetFileInformationByHandleEx to list a directory is the ability to do wildcard filtering. However, Python listdir and scandir never uses wildcard filtering, so it's no real loss. FindFirstFile implements this feature via the FileName parameter of NtQueryDirectoryFile. First it translates DOS wildcards to NT's set of 5 wildcards. There's the native NT '*' and '?', plus the quirky semantics of MS-DOS via '<', '>', and '"', i.e. DOS_STAR, DOS_QM, and DOS_DOT. See FsRtlIsNameInExpression 3 for a description of these wildcard characters.

    @serhiy-storchaka
    Copy link
    Member Author

    Thank you for your investigation Eryk. Helpful as always.

    Since I have no access to Windows I left this feature Unix-only.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset ea720fe by Serhiy Storchaka in branch 'master':
    bpo-25996: Added support of file descriptors in os.scandir() on Unix. (#502)
    ea720fe

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life extension-modules C modules in the Modules dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants