
Author eryksun
Recipients SilentGhost, benhoyt, eryksun, ideasman42, mont29, paul.moore, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Date 2015-12-21.09:31:47
Message-id <1450690307.84.0.8477742602.issue25911@psf.upfronthosting.co.za>
In-reply-to
Content
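The ANSI API is problematic because it returns file names converted to the system code page with a lossy best-fit mapping: characters that the code page can't represent are replaced by lookalikes or by "?". For example: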

    >>> os.listdir('.')
    ['ƠƨưƸǀLjǐǘǠǨǰǸ']

    >>> os.listdir(b'.')
    [b'O?u?|?iu?Kj?']

To work around this problem somewhat, listdir and scandir could return the cAlternateFileName field of the WIN32_FIND_DATA struct if present. This is the classic 8.3 short name that Microsoft file systems create for MS-DOS compatibility. For NTFS it can be disabled in the registry, or per volume, but I assume whoever does that knows what to expect.
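The fallback described above could look roughly like the following sketch. It is not code from os.listdir or os.scandir: `bytes_name` is a hypothetical helper, `"cp1252"` is only a stand-in for the system code page, and the short name in the usage note is made up. Python's `errors="replace"` also differs from the Windows best-fit mapping in that it turns every unencodable character into "?".

```python
def bytes_name(long_name, short_name, encoding="cp1252"):
    """Pick a bytes file name that can still be used to reach the file.

    long_name  -- the wide-character name (cFileName)
    short_name -- the 8.3 name (cAlternateFileName), '' if there is none
    encoding   -- stand-in for the system ANSI code page (an assumption)
    """
    try:
        # Strict encoding succeeds only if every character is in the code page,
        # in which case the bytes name round-trips and is safe to return.
        return long_name.encode(encoding)
    except UnicodeEncodeError:
        if short_name:
            # The long name is not representable; use the 8.3 alias instead.
            return short_name.encode(encoding)
        # No short name exists (e.g. 8.3 creation is disabled), so all that
        # is left is a lossy name that may not map back to the file.
        return long_name.encode(encoding, errors="replace")
```

With these assumptions, `bytes_name("ƠƨưƸ", "A1B2~1")` returns the short name, while `bytes_name("ƠƨưƸ", "")` degrades to `b"????"`.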

Also, since Python wouldn't need the short name for a wide-character path, there's no point in asking for it. (For NTFS it's a separate entry in the MFT. If it exists, which is known ahead of time, finding the entry requires a second lookup.) In this case it's better to call FindFirstFileExW and request only FindExInfoBasic. Generally the difference is inconsequential, but in a contrived example with 10000 similarly-named files from "ĀāĂă0000" to "ĀāĂă9999" and short names from "0000~1" to "9999~1", skipping the short name lookup shaved about 10% off the total time. For this test, I replaced the FindFirstFileW call in posix_scandir with the following call:

    iterator->handle = FindFirstFileExW(path_strW,
                                        FindExInfoBasic,
                                        &iterator->file_data,
                                        FindExSearchNameMatch,
                                        NULL, 0);
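A test directory like the one described can be set up with a short script. This is only a sketch of such a harness, not the one used for the numbers above: `time_scandir` is a hypothetical helper, and it times whatever os.scandir implementation the running interpreter has, so comparing FindFirstFileW against FindFirstFileExW still requires running it under two different builds on Windows.

```python
import os
import tempfile
import time


def time_scandir(count=10000, prefix="ĀāĂă"):
    """Create `count` similarly named files and time one os.scandir pass."""
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(count):
            # With count=10000 the names run from prefix+"0000" to prefix+"9999".
            open(os.path.join(tmp, "%s%04d" % (prefix, i)), "w").close()
        start = time.perf_counter()
        with os.scandir(tmp) as entries:
            names = [entry.name for entry in entries]
        elapsed = time.perf_counter() - start
    # Return the number of entries seen and the time spent listing them.
    return len(names), elapsed
```

Calling `time_scandir()` returns the entry count and the elapsed seconds for a single pass; repeating the call and taking the minimum would reduce noise from the file system cache.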
History
Date | User | Action | Args
2015-12-21 09:31:47 | eryksun | set | recipients: + eryksun, paul.moore, vstinner, tim.golden, ideasman42, SilentGhost, benhoyt, zach.ware, serhiy.storchaka, steve.dower, mont29
2015-12-21 09:31:47 | eryksun | set | messageid: <1450690307.84.0.8477742602.issue25911@psf.upfronthosting.co.za>
2015-12-21 09:31:47 | eryksun | link | issue25911 messages
2015-12-21 09:31:47 | eryksun | create |