Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: os.walk now using os.scandir() breaks bytes filenames on windows #70099

Closed
mont29 mannequin opened this issue Dec 19, 2015 · 37 comments
Closed

Regression: os.walk now using os.scandir() breaks bytes filenames on windows #70099

mont29 mannequin opened this issue Dec 19, 2015 · 37 comments
Labels
OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@mont29
Copy link
Mannequin

mont29 mannequin commented Dec 19, 2015

BPO 25911
Nosy @pfmoore, @db3l, @vstinner, @tjguk, @ideasman42, @benhoyt, @zware, @serhiy-storchaka, @eryksun, @zooba
PRs
  • [3.4] Backport CI config from master #2475
  • Files
  • walk_bytes.patch
  • walk_bytes_2.patch
  • test_walk_bytes_deprecate.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2016-03-29.21:57:59.389>
    created_at = <Date 2015-12-19.15:39:38.268>
    labels = ['type-bug', 'library', 'OS-windows']
    title = 'Regression: os.walk now using os.scandir() breaks bytes filenames on windows'
    updated_at = <Date 2017-06-28.17:13:22.562>
    user = 'https://bugs.python.org/mont29'

    bugs.python.org fields:

    activity = <Date 2017-06-28.17:13:22.562>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-03-29.21:57:59.389>
    closer = 'vstinner'
    components = ['Library (Lib)', 'Windows']
    creation = <Date 2015-12-19.15:39:38.268>
    creator = 'mont29'
    dependencies = []
    files = ['41397', '41598', '41897']
    hgrepos = []
    issue_num = 25911
    keywords = ['patch', '3.5regression']
    message_count = 37.0
    messages = ['256731', '256732', '256755', '256757', '256785', '256787', '256804', '256812', '256820', '256904', '256908', '256909', '256911', '258133', '258134', '258378', '259846', '259847', '259848', '260103', '260787', '261360', '261370', '261735', '262174', '262330', '262332', '262333', '262474', '262475', '262485', '262594', '262599', '262605', '262606', '262607', '262632']
    nosy_count = 13.0
    nosy_names = ['paul.moore', 'db3l', 'vstinner', 'tim.golden', 'ideasman42', 'SilentGhost', 'benhoyt', 'python-dev', 'zach.ware', 'serhiy.storchaka', 'eryksun', 'steve.dower', 'mont29']
    pr_nums = ['2475']
    priority = 'high'
    resolution = 'fixed'
    stage = 'needs patch'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue25911'
    versions = ['Python 3.5', 'Python 3.6']

    @mont29
    Copy link
    Mannequin Author

    mont29 mannequin commented Dec 19, 2015

    In py3.4 and below we used to be able to use bytes filenames with os.walk(), it’s now impossible under windows due to the limitation of os.scandir().

    This issue was reported to Blender tracker (https://developer.blender.org/T47018).

    I assume this should be considered as a regression? At least, this should be mentioned in the documentation…

    Thanks,
    Bastien

    @mont29 mont29 mannequin added stdlib Python modules in the Lib dir OS-windows labels Dec 19, 2015
    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Dec 19, 2015

    Perhaps, I'm looking in the wrong place, but there doesn't seem be any test for bytes filenames.

    @SilentGhost SilentGhost mannequin added the type-bug An unexpected behavior, bug, or error label Dec 19, 2015
    @vstinner
    Copy link
    Member

    Regressions are not cool, but it was a deliberate choice to not support
    bytes filenames in os.scandir(). If you use bytes, you can get filenames
    which are invalid: you will be unable to use the filename with open() to
    read its content for example.

    Bytes filename are deprecated since python 3.2 if i recall correctly. It's
    time to use the right time, it's also simpler to use on Python 3.

    I suggest to document the regression rather than adding bytes support to
    os.scandir or don't use scandir in os.walk().

    @serhiy-storchaka
    Copy link
    Member

    For softer transition we can return old os.walk() implementation for bytes (may be with a deprecation warning). Using os.scandir() is just an optimization, and this shouldn't break existing program.

    @ideasman42
    Copy link
    Mannequin

    ideasman42 mannequin commented Dec 21, 2015

    @Haypo, I checked available info online and couldn't find any reference to byte file-paths were deprecated since Python3.2.

    On the contrary, the 3.2 release notes 0 state:

    "countless fixes regarding bytes/string issues; among them full support for a bytes environment (filenames, environment variables)"

    ----
    Also docs for open 3.2 1 and 3.5 2 say that byte filenames are supported with no mention of deprecation.

    Since this is already working properly in 3.5 for other systems besides ms-windows, and worked in 3.4x.
    Dropping support on a single platform seems a rather problematic regression.

    ----

    @eryksun
    Copy link
    Contributor

    eryksun commented Dec 21, 2015

    The ANSI API is problematic because it returns a best-fit encoding to the system codepage. For example:

        >>> os.listdir('.')
        ['ƠƨưƸǀLjǐǘǠǨǰǸ']
    
        >>> os.listdir(b'.')
        [b'O?u?|?iu?Kj?']

    To somewhat work around this problem, listdir and scandir could return the cAlternateFilename of the WIN32_FIND_DATA struct if present. This is the classic 8.3 short name that Microsoft file systems create for MS-DOS compatibility. For NTFS it can be disabled in the registry, or per volume, but I assume whoever does that knows what to expect.

    Also, since Python wouldn't need the short name for a wide-character path, there's no point in asking for it. (For NTFS it's a separate entry in the MFT. If it exists, which is known ahead of time, finding the entry requires a second lookup.) In this case it's better to call FindFirstFileExW and request only FindExInfoBasic. Generally the difference is inconsequential, but in a contrived example with 10000 similarly-named files from "ĀāĂă0000" to "ĀāĂă9999" and short names from "0000~1" to "9999~1", skipping the short name lookup shaved about 10% off the total time. For this test, I replaced the FindFirstFileW call in posix_scandir with the following call:

    iterator->handle = FindFirstFileExW(path_strW,
                                        FindExInfoBasic,
                                        &iterator->file_data,
                                        FindExSearchNameMatch,
                                        NULL, 0);
    

    @zware
    Copy link
    Member

    zware commented Dec 21, 2015

    I like Serhiy's suggestion to use the old walk implementation on Windows if the argument is bytes, with a DeprecationWarning (like all other bytes operations in os on Windows). There was no warning that bytes paths would stop working with os.walk, so we should fix it and I'm bumping up the priority.

    bytes filenames have been deprecated since Python 3.3 (bpo-13374); I don't think scandir (new in 3.5) should support bytes.

    @vstinner
    Copy link
    Member

    Python 3.5.0 and 3.5.1 are out. I don't think that it's worth to add back
    bytes support is os.walk() in 3.5.2. I would prefer to do the opposite and
    really drop bytes support for all filenames in the whole stdlib in 3.6.

    It's really a bad idea to use bytes on Windows. With Python 3, it becomes
    possible and simple to handle filenames as Unicode on all platforms and in
    all cases (see PEP-393 for undecodable filenames).

    @benhoyt
    Copy link
    Mannequin

    benhoyt mannequin commented Dec 22, 2015

    Just to clarify what Victor and Zachary mentioned: bytes filenames have been deprecated only *on Windows* since Python 3.3; bytes filenames are still fine on POSIX.

    @serhiy-storchaka
    Copy link
    Member

    Here is a patch that restores the ability of os.walk() to work with bytes filenames on Windows.

    @zooba
    Copy link
    Member

    zooba commented Dec 23, 2015

    Did we want a deprecation warning too? I don't see it in the patch.

    Otherwise, LGTM.

    @eryksun
    Copy link
    Contributor

    eryksun commented Dec 23, 2015

    Considering there's no plan to implement bytes paths for scandir on Windows, then the following line in the os docs needs to be modified: "All functions accepting path or file names accept both bytes and string objects, and result in an object of the same type, if a path or file name is returned." It should be changed to acknowledge that some functions on Windows do not support bytes paths, and that existing support is deprecated and may be removed in a future release.

    Also, the docs for listdir should warn that using bytes returns invalid results on Windows when filenames contain characters that aren't mapped in the system locale codepage.

    @serhiy-storchaka
    Copy link
    Member

    There is open bpo-16700 for the documentation.

    @serhiy-storchaka
    Copy link
    Member

    Fixed a typo found by Eryk Sun.

    @serhiy-storchaka
    Copy link
    Member

    Did we want a deprecation warning too?

    I didn't tested on Windows. If bytes paths considered deprecated on Windows, os.listdir() should emit a warning. Otherwise I don't think we should special case os.walk().

    @ideasman42
    Copy link
    Mannequin

    ideasman42 mannequin commented Jan 16, 2016

    A related question, (realize this isn't a support forum), but what is the equivalent API calls in Python3.5 so we can update scripts.

    Should we be doing: os.walk(os.fsdecode(path)) ?

    From Python's documentation, os.fsdecode is using 'strict' handler which may raise exceptions.
    I'm mainly concerned that changes here will introduce bugs where bytes previously resolved the paths (albeit working in an imperfect way).

    If this is not the best place to ask about this, then can post elsewhere.

    @serhiy-storchaka serhiy-storchaka self-assigned this Feb 8, 2016
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 8, 2016

    New changeset 5310f94772f4 by Serhiy Storchaka in branch '3.5':
    Issue bpo-25911: Restored support of bytes paths in os.walk() on Windows.
    https://hg.python.org/cpython/rev/5310f94772f4

    New changeset b060af2a58b6 by Serhiy Storchaka in branch 'default':
    Issue bpo-25911: Restored support of bytes paths in os.walk() on Windows.
    https://hg.python.org/cpython/rev/b060af2a58b6

    @vstinner
    Copy link
    Member

    vstinner commented Feb 8, 2016

    New changeset 5310f94772f4 by Serhiy Storchaka in branch '3.5':
    Issue bpo-25911: Restored support of bytes paths in os.walk() on Windows.
    https://hg.python.org/cpython/rev/5310f94772f4

    New changeset b060af2a58b6 by Serhiy Storchaka in branch 'default':
    Issue bpo-25911: Restored support of bytes paths in os.walk() on Windows.
    https://hg.python.org/cpython/rev/b060af2a58b6

    While this change is ok for Python 3.5, I would prefer to *drop* support for bytes filenames on Windows. I started a thread on python-dev for that:
    https://mail.python.org/pipermail/python-dev/2016-February/143150.html

    @vstinner
    Copy link
    Member

    vstinner commented Feb 8, 2016

    Bastien:

    In py3.4 and below we used to be able to use bytes filenames with os.walk(), it’s now impossible under windows due to the limitation of os.scandir().

    This issue was reported to Blender tracker (https://developer.blender.org/T47018).

    Again, why do you use bytes? Blender only supports Python 3. You must use Unicode on all platforms. On Windows, it gives you support of the full Unicode character set for free.

    @serhiy-storchaka
    Copy link
    Member

    New tests emit deprecation warnings on Windows and failed.

    http://buildbot.python.org/all/builders/AMD64%20Windows8%203.x/builds/1753/steps/test/logs/stdio
    test_rmdir_on_directory_link_to_missing_target (test.test_os.Win32SymlinkTests) ... D:\buildarea\3.x.bolen-windows8\build\lib\os.py:447: DeprecationWarning: The Windows bytes API has been deprecated, use Unicode filenames instead
    for name in listdir(dir):
    D:\buildarea\3.x.bolen-windows8\build\lib\os.py:441: DeprecationWarning: The Windows bytes API has been deprecated, use Unicode filenames instead
    return path.isdir(self.path)
    D:\buildarea\3.x.bolen-windows8\build\lib\ntpath.py:249: DeprecationWarning: The Windows bytes API has been deprecated, use Unicode filenames instead
    st = os.lstat(path)
    D:\buildarea\3.x.bolen-windows8\build\lib\test\test_os.py:2688: DeprecationWarning: The Windows bytes API has been deprecated, use Unicode filenames instead
    func(name, *func_args)
    D:\buildarea\3.x.bolen-windows8\build\lib\unittest\case.py:176: DeprecationWarning: The Windows bytes API has been deprecated, use Unicode filenames instead
    callable_obj(*args, **kwargs)
    D:\buildarea\3.x.bolen-windows8\build\lib\test\test_os.py:1881: DeprecationWarning: The Windows bytes API has been deprecated, use Unicode filenames instead
    sorted(os.listdir(path)),
    D:\buildarea\3.x.bolen-windows8\build\lib\test\test_os.py:1867: DeprecationWarning: The Windows bytes API has been deprecated, use Unicode filenames instead
    sorted(os.listdir(os.fsencode(support.TESTFN))),
    test test_os failed
    skipped 'currently fails; consider for improvement'

    ======================================================================
    FAIL: test_walk_bottom_up (test.test_os.BytesWalkTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\buildarea\3.x.bolen-windows8\build\lib\test\test_os.py", line 894, in test_walk_bottom_up
        self.sub2_tree)
    AssertionError: Tuples differ: ('@test_4312_tmp\\TEST1\\SUB2', ['broken_link', 'link'], ['tmp3']) != ('@test_4312_tmp\\TEST1\\SUB2', ['link'], ['broken_link', 'tmp3'])

    First differing element 1:
    ['broken_link', 'link']
    ['link']

    • ('@test_4312_tmp\\TEST1\\SUB2', ['broken_link', 'link'], ['tmp3'])
      ? ----------

    + ('@test_4312_tmp\\TEST1\\SUB2', ['link'], ['broken_link', 'tmp3'])
    ? ++++++++++

    ======================================================================
    FAIL: test_walk_prune (test.test_os.BytesWalkTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\buildarea\3.x.bolen-windows8\build\lib\test\test_os.py", line 874, in test_walk_prune
        self.assertEqual(all[1], self.sub2_tree)
    AssertionError: Tuples differ: ('@test_4312_tmp\\TEST1\\SUB2', ['broken_link', 'link'], ['tmp3']) != ('@test_4312_tmp\\TEST1\\SUB2', ['link'], ['broken_link', 'tmp3'])

    First differing element 1:
    ['broken_link', 'link']
    ['link']

    • ('@test_4312_tmp\\TEST1\\SUB2', ['broken_link', 'link'], ['tmp3'])
      ? ----------

    + ('@test_4312_tmp\\TEST1\\SUB2', ['link'], ['broken_link', 'tmp3'])
    ? ++++++++++

    Proposed patch should silence warnings. Could anyone please test it on Windows?

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 24, 2016

    New changeset 9bffe39e8273 by Serhiy Storchaka in branch '3.5':
    Fixed a bug in os.walk() with bytes path on Windows caused by merging fixes
    https://hg.python.org/cpython/rev/9bffe39e8273

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 8, 2016

    New changeset 19a3e0e664af by Serhiy Storchaka in branch '3.4':
    Issues bpo-23808, bpo-25911: Trying to fix walk tests on Windows.
    https://hg.python.org/cpython/rev/19a3e0e664af

    New changeset f9e22717722d by Serhiy Storchaka in branch '3.5':
    Issues bpo-23808, bpo-25911: Trying to fix walk tests on Windows.
    https://hg.python.org/cpython/rev/f9e22717722d

    New changeset da020e408c7f by Serhiy Storchaka in branch 'default':
    Issues bpo-23808, bpo-25911: Trying to fix walk tests on Windows.
    https://hg.python.org/cpython/rev/da020e408c7f

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 8, 2016

    New changeset 757159fb4847 by Serhiy Storchaka in branch '3.5':
    Issue bpo-25911: Tring to silence deprecation warnings in bytes path walk tests.
    https://hg.python.org/cpython/rev/757159fb4847

    New changeset ecf4e51c222f by Serhiy Storchaka in branch 'default':
    Issue bpo-25911: Tring to silence deprecation warnings in bytes path walk tests.
    https://hg.python.org/cpython/rev/ecf4e51c222f

    @serhiy-storchaka
    Copy link
    Member

    Deprecation warnings are suppressed, but my attempt to fix test failures failed.

    http://buildbot.python.org/all/builders/AMD64%20Windows8%203.x/builds/1832/steps/test/logs/stdio
    ======================================================================
    FAIL: test_walk_bottom_up (test.test_os.BytesWalkTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\buildarea\3.x.bolen-windows8\build\lib\test\test_os.py", line 894, in test_walk_bottom_up
        self.sub2_tree)
    AssertionError: Tuples differ: ('@test_4060_tmp\\TEST1\\SUB2', ['broken_link', 'link'], ['tmp3']) != ('@test_4060_tmp\\TEST1\\SUB2', ['link'], ['broken_link', 'tmp3'])

    First differing element 1:
    ['broken_link', 'link']
    ['link']

    • ('@test_4060_tmp\\TEST1\\SUB2', ['broken_link', 'link'], ['tmp3'])
      ? ----------

    + ('@test_4060_tmp\\TEST1\\SUB2', ['link'], ['broken_link', 'tmp3'])
    ? ++++++++++

    ======================================================================
    FAIL: test_walk_prune (test.test_os.BytesWalkTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\buildarea\3.x.bolen-windows8\build\lib\test\test_os.py", line 874, in test_walk_prune
        self.assertEqual(all[1], self.sub2_tree)
    AssertionError: Tuples differ: ('@test_4060_tmp\\TEST1\\SUB2', ['broken_link', 'link'], ['tmp3']) != ('@test_4060_tmp\\TEST1\\SUB2', ['link'], ['broken_link', 'tmp3'])

    First differing element 1:
    ['broken_link', 'link']
    ['link']

    • ('@test_4060_tmp\\TEST1\\SUB2', ['broken_link', 'link'], ['tmp3'])
      ? ----------

    + ('@test_4060_tmp\\TEST1\\SUB2', ['link'], ['broken_link', 'tmp3'])
    ? ++++++++++

    ----------------------------------------------------------------------

    In some cases a broken link is considered as a link to regular file, but in other cases is considered as a link to directory.

    It is hard to fix this Windows issue without Windows. Needed a help of Windows experts.

    @serhiy-storchaka serhiy-storchaka removed their assignment Mar 14, 2016
    @vstinner
    Copy link
    Member

    Windows buildbots still fail (sometimes?) because of this issue :-/

    http://buildbot.python.org/all/builders/AMD64%20Windows8%203.x/builds/1874/steps/test/logs/stdio

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 24, 2016

    New changeset d54ee39b061f by Victor Stinner in branch 'default':
    Fix DeprecationWarning on Windows
    https://hg.python.org/cpython/rev/d54ee39b061f

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 24, 2016

    New changeset eb91d0387d59 by Victor Stinner in branch 'default':
    Enhance os._DummyDirEntry
    https://hg.python.org/cpython/rev/eb91d0387d59

    @vstinner
    Copy link
    Member

    I faile to reproduce the bug by running test_os alone on Windows 8.

    I wrote a script to try to isolate a sequential list of tests which reproduce the bug on Windows, but I failed to find such list after 2 hours:
    https://bitbucket.org/haypo/misc/src/3e2e6138798c6cec061544ff116e50a8d1d6d2b8/python/isolate.py?fileviewer=file-view-default

    So I'm unable to reproduce the bug, I tried a change to enhance os._DummyDirEntry, it should now mimick better the C implementation of os.DirEntry. Let's see if the test still fails randomly...

    @vstinner
    Copy link
    Member

    Update: test_os pass on all Windows buildbots except one!

    I don't understand why the test fails on this buildbot.

    http://buildbot.python.org/all/builders/AMD64%20Windows8%203.x/builds/1906/steps/test/logs/stdio

    ======================================================================
    FAIL: test_walk_bottom_up (test.test_os.BytesWalkTests)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "D:\buildarea\3.x.bolen-windows8\build\lib\test\test_os.py", line 895, in test_walk_bottom_up
        self.assertEqual(len(all), 4)
    AssertionError: 5 != 4

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 26, 2016

    New changeset 82da02b5bf22 by Victor Stinner in branch 'default':
    Issue bpo-25911: more info on test_os failure
    https://hg.python.org/cpython/rev/82da02b5bf22

    @serhiy-storchaka
    Copy link
    Member

    Some tests were failed on Windows 8 even in 3.4 (bpo-23808).

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 29, 2016

    New changeset 2b25fa7e3b7a by Victor Stinner in branch 'default':
    Fix os._DummyDirEntry.is_symlink()
    https://hg.python.org/cpython/rev/2b25fa7e3b7a

    @db3l
    Copy link
    Contributor

    db3l commented Mar 29, 2016

    I'm including some comments here from an email thread I had with Victor about this issue on the Win8 buildbot, which led to his recent changeset 2b25fa7e3b7a.

    The Win8 3.x failure (the 5 != 4 length error) was due to the revision of os._DummyDirEntry (introduced I believe in eb91d0387d59) using stat() rather than lstat() to check for a symbolic link, which couldn't work, which in turn failed to exclude the symbolic directory link from the walk in the test.

    The 3.5 branch failure about the mis-matched tuple values from the walk is a separate issue, where the broken directory link is still counted as a directory (rather than a file). It seems due to os.path.isdir() - in the older os._DummyDirEntry implementation - returning True for the broken link in the test, rather than False as on Unix. That's because on Unix the underlying os.stat call fails, but on Windows isdir is replaced with nt._isdir which avoids os.stat but is just a shim for. The success on Windows sort of makes sense since the target_is_directory parameter to os.symlink() was used to create the broken link in the test, so Windows knows it's a directory, even if the link target is missing.

    If the bytes walk implementation needs to treat broken links like files on Windows (matching the non-bytes version), then it can't depend on isdir() failing. The non-bytes version (using scandir()) works even on Windows since internally scandir appears to build up mode bits out of os.stat/os.lstat calls and never uses isdir.

    Back-porting the 3.x implementation of DummyDirEntry would be one quick way to fix the 3.5 issue as it also avoids isdir().

    BTW, with respect to changeset 2b25fa7e3b7a, I'm not sure it has quite the right semantics for is_dir, at least not if it's supposed to parallel os.path.isdir(). I believe that should return True for a symbolic link to a directory, which it doesn't look like this change would, if is_symlink happened to have been called first. It's possible the semantics of how _DummyDirEntry is used precludes that scenario, but it seems a little fragile. I'd probably just use lstat() for is_symlink, but otherwise not touch is_dir.

    Finally, as to why this only showed up on the Win8 buildbot - it turns out that's the only machine where the tests could create a symbolic link and thus encounter the scenario - they were prevented on Win7 and Win10 under the normal buildbot user due to lack of privileges (which was a surprise to me). That's also what was happening on Victor's local Win8 VM. So the failing conditions were just being skipped. The Win8 buildbot was itself sort of pure luck, as early on I must have set it up to run under an elevated (administrative) command prompt, probably due to trying to handle some other issue. I've now done the same thing for Win10 so it began seeing the same failures in the last few tests.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 29, 2016

    New changeset c38ac7ab8d9a by Victor Stinner in branch '3.5':
    Issue bpo-25911: Backport os._DummyDirEntry fixes
    https://hg.python.org/cpython/rev/c38ac7ab8d9a

    @vstinner
    Copy link
    Member

    BTW, with respect to changeset 2b25fa7e3b7a, I'm not sure it has quite the right semantics for is_dir, at least not if it's supposed to parallel os.path.isdir(). I believe that should return True for a symbolic link to a directory, which it doesn't look like this change would, if is_symlink happened to have been called first. It's possible the semantics of how _DummyDirEntry is used precludes that scenario, but it seems a little fragile. I'd probably just use lstat() for is_symlink, but otherwise not touch is_dir.

    I think that you misunderstood the code. The "use cache lstat" path is only taken if the file is *not* symbolic link.

    I tested manually _DummyDirEntry on a symbolic link to a directory: _lstat is filled by the constructor, is_dir() *and* is_symlink() returns True. Both are not exclusive, is_dir() works as follow_symlinks=True, whereas is_symlink() works as follow_symlinks=False.

    @vstinner
    Copy link
    Member

    Since all test_os now pass again on 3.x buildbots (after a few months of red buildbots), I backported the _DummyDirEntry fixes to 3.5.

    David:

    The 3.5 branch failure about the mis-matched tuple values from the walk is a separate issue, (...)

    I'm sorry, I don't understand you. The test_os failure was the same on Python 3.5 and 3.6:

    • ('@test_4312_tmp\\TEST1\\SUB2', ['broken_link', 'link'], ['tmp3'])
      ? ----------
      + ('@test_4312_tmp\\TEST1\\SUB2', ['link'], ['broken_link', 'tmp3'])
      ? ++++++++++

    This error should now be fixed on Python 3.5 and 3.6.

    Is there a remaining open issue related to os.scandir() (bytes or Unicode)?

    @vstinner
    Copy link
    Member

    The issue looks to be fixed in Python 3.5 (future version 3.5.2) and 3.6, I close the issue.

    I repeat myself: On Python 3, don't store filenames as bytes, it's a bad practice. Use unicode, it works on all platforms and gives access to the full Unicode Character Set!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants