classification
Title: pathlib glob case sensitivity issue on Windows
Type: behavior Stage: resolved
Components: Library (Lib), Windows Versions: Python 3.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: paul.moore, pitrou, serhiy.storchaka, steve.dower, tim.golden, udo.eberhardt, zach.ware
Priority: normal Keywords:

Created on 2016-03-28 13:24 by udo.eberhardt, last changed 2016-03-31 10:03 by SilentGhost. This issue is now closed.

Messages (5)
msg262570 - Author: Udo Eberhardt (udo.eberhardt) Date: 2016-03-28 13:24
On Windows, Path.glob does not always return the file name with the correct case.

If the current directory contains a file named MixedCase.txt then the following script:

import pathlib
p = pathlib.Path('.')
print(list(p.glob('*.txt')))
print(list(p.glob('Mixedcase.txt')))

yields:
[WindowsPath('MixedCase.txt')]
[WindowsPath('mixedcase.txt')]

Problem: The result of the second call to glob should be 'MixedCase.txt' as well. I would expect that glob returns a file name exactly as it is spelled in the file system.
msg262574 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-03-28 16:19
The problem is that there is no way to just read a file name exactly as it is spelled in the file system. Iterating over all names in the directory and finding the one that matches the specified name, ignoring case, is not as efficient as checking that the specified file name exists.
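
The trade-off can be illustrated with a minimal sketch. The helper name find_actual_name is hypothetical and is not part of pathlib, and casefold() only approximates the case-insensitive comparison that Windows performs. The point is that recovering the on-disk spelling requires listing the whole directory, whereas an existence check is a single lookup.

```python
import os
import tempfile


def find_actual_name(directory, name):
    """Return the on-disk spelling of `name` inside `directory`, or None.

    Hypothetical helper: it lists every entry in the directory, so its cost
    grows with the number of entries, unlike a single existence check.
    """
    wanted = name.casefold()  # approximation of a case-insensitive compare
    for entry_name in os.listdir(directory):
        if entry_name.casefold() == wanted:
            return entry_name
    return None


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "MixedCase.txt"), "w").close()
    print(find_actual_name(tmp, "mixedcase.txt"))  # MixedCase.txt
```

On a directory with many entries, the scan above does far more work than a single os.path.exists() call on the same name.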
msg262593 - Author: Udo Eberhardt (udo.eberhardt) Date: 2016-03-29 08:02
So this is a trade-off between consistent behavior and efficiency. My point of view is that glob is for enumerating matching files and it should consistently return the real file names. Typically glob will be called with a pattern like '*.txt' and it will have to iterate over names anyway, right? In the special case where it is called with a literal name, it could do the same to produce consistent results. A user who wants to check (more efficiently) whether a literal name exists can use Path.exists().

The statement in the doc could be: 
Note: To find the literal names in the file system, glob always enumerates files and directories. To check more efficiently whether a specific file exists, use exists().
msg262664 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2016-03-30 13:05
For now, glob(r'c:\very\long\case\insensitive\path\*.txt') has to iterate over names in only one directory. To restore the actual case of the whole path, it would have to iterate over all parent directories as well: r'c:\very\long\case\insensitive\path', r'c:\very\long\case\insensitive', r'c:\very\long\case', r'c:\very\long', r'c:\very', and 'c:\\'.
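
The cost described above can be sketched as follows. The helper restore_case is hypothetical and is not part of pathlib. It assumes the spelling of the base directory is already trusted and that the relative path contains no '..' components. For an absolute path the base would be the drive root, so one directory listing is needed for every component of the path.

```python
import os
import tempfile
from pathlib import Path


def restore_case(base, relative):
    """Rebuild `relative` under `base` using the on-disk spelling.

    Hypothetical helper: returns the corrected path and the number of
    directory listings that were needed (one per path component).
    """
    current = Path(base)
    listings = 0
    for part in Path(relative).parts:
        wanted = part.casefold()  # approximation of a case-insensitive compare
        listings += 1
        match = next(
            (n for n in os.listdir(current) if n.casefold() == wanted), None
        )
        if match is None:
            raise FileNotFoundError(part)
        current = current / match
    return current, listings


# Small demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, "Very", "Long", "Path")
    target.mkdir(parents=True)
    (target / "MixedCase.txt").touch()
    restored, listings = restore_case(tmp, "very/long/path/mixedcase.txt")
    print(restored.relative_to(tmp).as_posix())  # Very/Long/Path/MixedCase.txt
    print(listings)  # 4
```

The number of listings grows with the depth of the path, which is the extra work a case-restoring glob would have to pay on every call.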
msg262668 - Author: Udo Eberhardt (udo.eberhardt) Date: 2016-03-30 14:19
In the meantime I have realized this problem as well. There is no easy way to determine the exact spelling of the entire path, so it seems there is no simple solution to my problem. The concept of treating file system paths case-insensitively (as Windows does) seems to be a bad idea.
History
Date                 User              Action  Args
2016-03-31 10:03:41  SilentGhost       set     status: open -> closed; resolution: wont fix; stage: resolved
2016-03-30 14:19:56  udo.eberhardt     set     messages: + msg262668
2016-03-30 13:05:59  serhiy.storchaka  set     messages: + msg262664
2016-03-29 08:02:24  udo.eberhardt     set     messages: + msg262593
2016-03-28 16:19:01  serhiy.storchaka  set     nosy: + pitrou, serhiy.storchaka; messages: + msg262574
2016-03-28 13:24:02  udo.eberhardt     create