
Author benrg
Recipients benrg, paul.moore, steve.dower, tim.golden, zach.ware
Date 2018-01-21.21:04:40
Message-id <1516568681.86.0.467229070634.issue32612@psf.upfronthosting.co.za>
In-reply-to
Content
(Pure)WindowsPath uses str.lower to fold paths for comparison and hashing. This doesn't match the case folding of actual Windows file systems. There exist WindowsPath objects that compare and hash equal, but refer to different files. For example, the strings

  '\xdf' (sharp s) and '\u1e9e' (capital sharp S)
  '\u01c7' (LJ) and '\u01c8' (Lj)
  '\u0130' (I with dot) and 'i\u0307' (i followed by combining dot)
  'K' and '\u212a' (Kelvin sign)

are equal under str.lower folding but are distinct file names on NTFS volumes on my Windows 7 machine. There are hundreds of other such pairs.
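The collisions listed above can be checked directly with str.lower. This is a minimal sketch: PAIRS and lower_collides are illustrative names, not part of pathlib, and the final PureWindowsPath comparison is printed without an expected value because its result depends on how the running Python version folds paths.

```python
from pathlib import PureWindowsPath

# The four example pairs from the report above.
PAIRS = [
    ('\xdf', '\u1e9e'),      # sharp s / capital sharp S
    ('\u01c7', '\u01c8'),    # LJ / Lj
    ('\u0130', 'i\u0307'),   # I with dot / i + combining dot above
    ('K', '\u212a'),         # K / Kelvin sign
]


def lower_collides(a, b):
    """Return True if two distinct strings become equal under str.lower."""
    return a != b and a.lower() == b.lower()


for a, b in PAIRS:
    # ascii() keeps the output readable on consoles that cannot show the characters.
    print(ascii(a), ascii(b), 'collide under str.lower:', lower_collides(a, b))
    # Version-dependent: shows whether this interpreter's pathlib treats them as equal.
    print('  PureWindowsPath equal:', PureWindowsPath(a) == PureWindowsPath(b))
```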

I think this is very bad. The reverse (paths that compare unequal but refer to the same file) is probably unavoidable and is expected by programmers. But paths that compare equal should never refer to files that the OS treats as distinct.

How to fix this:

Unfortunately, there is no correct way to case fold Windows paths. The FAT, NTFS, and exFAT drivers on my machine all have different behavior. (The examples above work on all three, except for 'K' and '\u212a', which are equivalent on FAT volumes.) NTFS stores its case-folding map on each volume in the hidden $UpCase file, so even different NTFS volumes on the same machine can have different behavior. The contents of $UpCase have changed over time as Windows has been updated to support new Unicode versions. NTFS and NFS (and possibly WebDAV) also support full case sensitivity when used with Interix/SUA and Cygwin, though this requires disabling system-wide case insensitivity via the registry.

I think that pathlib should either give up on case folding entirely, or should fold very conservatively, treating WCHARs as equivalent only if they're equivalent on all standard file systems on all supported Windows versions.
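As a rough illustration of what "very conservative" folding could look like, the sketch below folds only the ASCII letters A-Z and leaves every other character untouched. Restricting the fold to ASCII is an assumption made for illustration; the actual set of characters that are equivalent on every file system would have to be determined empirically. conservative_fold is a hypothetical helper, not a pathlib API.

```python
import string

# Translation table that maps only ASCII 'A'-'Z' to 'a'-'z'.
# Assumption: ASCII letters are the safe subset that all file systems fold alike.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def conservative_fold(path_str):
    """Fold ASCII letters only; non-ASCII characters are compared exactly."""
    return path_str.translate(_ASCII_FOLD)


# ASCII case differences still fold together.
print(conservative_fold('C:\\Users\\README.TXT') == conservative_fold('c:\\users\\readme.txt'))
# 'K' and the Kelvin sign stay distinct, unlike under str.lower.
print(conservative_fold('K') == conservative_fold('\u212a'))
```

The trade-off is the one described above: more paths that name the same file compare unequal, but paths that compare equal are less likely to name different files.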

If pathlib folds case at all, there should be a solution for people who need to interoperate with Cygwin or SUA tools on a case-sensitive machine, but I suppose they can just use PosixPath.