
Author sogom
Recipients paul.moore, sogom, steve.dower, tim.golden, zach.ware
Date 2020-12-16.12:44:26
Message-id <1608122666.52.0.137113409131.issue42658@roundup.psfhosted.org>
In-reply-to
Content
On Windows file systems, U+03A9 (Greek capital letter Omega) and U+2126 (Ohm sign) are distinct characters: two files "\u03A9.txt" and "\u2126.txt" can exist side by side in the same folder. However, os.path.normcase() maps both U+03A9 and U+2126 to U+03C9 (Greek small letter omega), so it treats the two names as equal.
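
The collapse can be reproduced on any platform with plain str.lower(), whose Unicode lowercase mapping sends both characters to U+03C9. The snippet below is only a sketch: lower_based_normcase() is a hypothetical stand-in that assumes normcase() amounts to a separator replacement followed by str.lower(); it is not the library's code.

```python
def lower_based_normcase(path):
    # Hypothetical model (an assumption): replace "/" with "\" and
    # lowercase the result with Python's Unicode-aware str.lower().
    return path.replace("/", "\\").lower()


omega_name = "\u03A9.txt"  # Greek capital letter Omega
ohm_name = "\u2126.txt"    # Ohm sign

# The two names are different strings...
print(omega_name == ohm_name)

# ...but both lowercase to "\u03C9.txt", so they compare equal afterwards.
print(lower_based_normcase(omega_name) == lower_based_normcase(ohm_name))
```

The first comparison prints False and the second prints True, which is the mismatch with the file system described above.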

According to MSDN, NTFS file names are compared with CompareStringOrdinal(): https://docs.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications#sort-strings-ordinally . The same document says "the function maps case using the operating system *uppercasing* table." However, in my experiment, at least within the Basic Multilingual Plane, lowercasing two strings with LCMapStringEx() and then comparing them with wcscmp() always gave the same result as comparing the two strings with CompareStringOrdinal(). This equivalence is not stated explicitly in the LCMapStringEx() documentation ( https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-lcmapstringex ), but the description of LCMAP_LINGUISTIC_CASING on that page implies that the casing rules match the file system's unless LCMAP_LINGUISTIC_CASING is specified.

Therefore, I believe that os.path.normcase() should probably call LCMapStringEx() with LOCALE_NAME_INVARIANT as the first argument and LCMAP_LOWERCASE as the second.
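
As a rough illustration of the proposed call, here is a minimal ctypes sketch. The helper name lcmap_lower() is hypothetical, the constant values are copied from winnls.h, and the function simply returns None on non-Windows platforms. It is not a proposed patch, only a way to try the call from Python.

```python
import sys

LCMAP_LOWERCASE = 0x00000100  # value from winnls.h
LOCALE_NAME_INVARIANT = ""    # winnls.h defines this as L""


def lcmap_lower(s):
    """Lowercase s via LCMapStringEx on Windows; return None elsewhere.

    Hypothetical helper for experimentation only.
    """
    if sys.platform != "win32":
        return None
    if not s:
        return s  # LCMapStringEx rejects a zero-length source string

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    fn = kernel32.LCMapStringEx
    fn.argtypes = [
        wintypes.LPCWSTR,  # lpLocaleName
        wintypes.DWORD,    # dwMapFlags
        wintypes.LPCWSTR,  # lpSrcStr
        ctypes.c_int,      # cchSrc
        wintypes.LPWSTR,   # lpDestStr
        ctypes.c_int,      # cchDest
        ctypes.c_void_p,   # lpVersionInformation (must be NULL)
        ctypes.c_void_p,   # lpReserved (must be NULL)
        wintypes.LPARAM,   # sortHandle (must be 0)
    ]
    fn.restype = ctypes.c_int

    # cchSrc counts UTF-16 code units, not Python code points.
    src_len = len(s.encode("utf-16-le", "surrogatepass")) // 2

    # First call with cchDest == 0 asks for the required buffer size.
    needed = fn(LOCALE_NAME_INVARIANT, LCMAP_LOWERCASE, s, src_len,
                None, 0, None, None, 0)
    if needed == 0:
        raise ctypes.WinError(ctypes.get_last_error())

    buf = ctypes.create_unicode_buffer(needed)
    written = fn(LOCALE_NAME_INVARIANT, LCMAP_LOWERCASE, s, src_len,
                 buf, needed, None, None, 0)
    if written == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return ctypes.wstring_at(buf, written)
```

On Windows, comparing lcmap_lower("\u03A9.txt") with lcmap_lower("\u2126.txt") is one way to check whether this mapping keeps the two names distinct.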
History
Date                 User   Action  Args
2020-12-16 12:44:26  sogom  set     recipients: + sogom, paul.moore, tim.golden, zach.ware, steve.dower
2020-12-16 12:44:26  sogom  set     messageid: <1608122666.52.0.137113409131.issue42658@roundup.psfhosted.org>
2020-12-16 12:44:26  sogom  link    issue42658 messages
2020-12-16 12:44:26  sogom  create