This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: Pathlib PureWindowsPath sorting incorrect (is not natural sort)
Type: behavior
Components: Library (Lib), Windows
Versions: Python 3.9

Process
Status: open
Nosy List: eryksun, paul.moore, steve.dower, tegavu, tim.golden, zach.ware
Priority: normal

Created on 2020-02-03 17:53 by tegavu, last changed 2022-04-11 14:59 by admin.

Messages (2)
msg361315 - Author: (tegavu) Date: 2020-02-03 17:53
Wrong behavior in pathlib.PureWindowsPath: sorting does not use natural sort order.

Everything below is based on Windows 7 x64 and Python 3.8.1 (tags/v3.8.1:1b293b6, Dec 18 2019, 23:11:46) [MSC v.1916 64 bit (AMD64)] on win32.

The documentation (https://docs.python.org/3/library/pathlib.html#general-properties) states: "Paths of a same flavour are comparable and orderable."

For example:

    from pathlib import PureWindowsPath
    print(PureWindowsPath('C:\\1') < PureWindowsPath('C:\\a'))

This prints True, as expected: '1' sorts before 'a' on Windows.

This ordering also works for harder cases where other sorting functions fail: '!1' should come before '1', and '!a' before 'a'.

But it does not follow natural sort order:

    from pathlib import PureWindowsPath
    print(PureWindowsPath('C:\\15') < PureWindowsPath('C:\\100'))

This returns False.
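The False result follows from how strings compare: one character at a time, left to right, with no notion of numbers. A minimal sketch (the variable names are only for illustration):

```python
from pathlib import PureWindowsPath

a = PureWindowsPath(r"C:\15")
b = PureWindowsPath(r"C:\100")

# Both paths split into a drive/root part and a name part.
print(a.parts)  # ('C:\\', '15')
print(b.parts)  # ('C:\\', '100')

# The first parts are equal, so the names decide. As strings, '15' and '100'
# share the leading '1', then '5' > '0', so '15' sorts after '100'.
print("15" < "100")  # False
print(a < b)         # False
```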

This is a bug in my opinion, since PureWindowsPath should sort the way Windows (Explorer) sorts.

Right now PureWindowsPath does probably something like NTFS sorting, but NTFS is not Windows and from a function called 'WindowsPath' I expect a path that would be given in Windows Explorer.
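Independently of what `<` does, a natural order can already be obtained by passing a sort key. The sketch below is an approximation under stated assumptions, not a reproduction of Explorer: `natural_key` is a hypothetical helper that lower-cases each part (as PureWindowsPath does) and treats each run of digits as one number that sorts where the character '0' would.

```python
import re
from pathlib import PureWindowsPath

# Hypothetical tokenizer (not part of pathlib): either a run of digits or a
# single non-digit character.
_TOKEN = re.compile(r"(\d+)|(\D)")


def _natural_part_key(part):
    key = []
    for digits, char in _TOKEN.findall(part.lower()):
        if digits:
            # A whole digit run sorts where '0' would, then by numeric value.
            key.append((ord("0"), int(digits)))
        else:
            # Any other character sorts by its (lower-cased) code point.
            key.append((ord(char), 0))
    return tuple(key)


def natural_key(path):
    # Compare part by part, like pathlib, but with the natural part key.
    return tuple(_natural_part_key(p) for p in PureWindowsPath(path).parts)


paths = [PureWindowsPath(p) for p in (r"C:\100", r"C:\15", r"C:\a", r"C:\!1", r"C:\1")]
print(sorted(paths))                   # pathlib's own order
print(sorted(paths, key=natural_key))  # natural order: !1, 1, 15, 100, a
```

Every token is a pair of integers, so digit runs and other characters never have to be compared across types. How punctuation, underscores and non-ASCII characters should rank against digits and letters is exactly what this issue leaves open; this key makes one arbitrary choice.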

If a simple `dir` on Windows sorts by NTFS order (I am not sure!), PureWindowsPath fails there too, since, for example, "[" < "a" should then be False.
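The "[" example comes down to which direction case is folded. PureWindowsPath lower-cases, so '[' (U+005B) is compared with 'a' (U+0061). Folding to upper case instead, which is assumed here to be roughly what an NTFS upcase table does for ASCII letters, compares '[' with 'A' (U+0041) and reverses the result:

```python
from pathlib import PureWindowsPath

# pathlib's Windows flavour compares lower-cased parts: '[' < 'a'.
print(PureWindowsPath("[") < PureWindowsPath("a"))  # True

# Assumed upper-case folding: '[' is compared with 'A' and sorts after it.
print("[".upper() < "a".upper())  # False
print("[".lower() < "a".lower())  # True
```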

See this image for comparison:
https://i.imgur.com/GjBhWsS.png

Here is a list that can be used directly to check the sorting:

test_list = ['15', '100', '11111', '!', '#', '$', '%', '&', "'", '(', ')', '+', '+11111', '+aaaaa', ',', '-', ';', '=', '@', '[', ']', '^', '_', '`', 'aaaaa', 'foo0', 'foo_0', '{', '}', '~', '§', '°', '´', 'µ', '€']
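To put pathlib's current order next to the screenshot, PureWindowsPath itself can serve as the sort key, with each string becoming a one-part path:

```python
from pathlib import PureWindowsPath

test_list = ['15', '100', '11111', '!', '#', '$', '%', '&', "'", '(', ')', '+',
             '+11111', '+aaaaa', ',', '-', ';', '=', '@', '[', ']', '^', '_',
             '`', 'aaaaa', 'foo0', 'foo_0', '{', '}', '~', '§', '°', '´', 'µ', '€']

# Order as defined by PureWindowsPath comparison (case-folded parts).
current = sorted(test_list, key=PureWindowsPath)
print(current)
```

Since every entry is already lower case (or has no case), this gives the same result as a plain `sorted(test_list)`, that is, code point order.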
msg361320 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2020-02-03 18:45
> Right now PureWindowsPath does probably something like NTFS sorting, 
> but NTFS is not Windows and from a function called 'WindowsPath' I 
> expect a path that would be given in Windows Explorer.

NTFS stores the names in a directory as a B-tree that's sorted case-insensitively according to the filesystem's casing table. Other filesystems such as FAT32 store names in arbitrary order, maybe FIFO order with reuse of slot indexes when files are deleted, or maybe based on hashing the filename.

The Windows file API does not sort the results of a directory listing. It's up to applications to decide how a listing will be presented. You cite what Explorer does as a standard for "Windows", but there is no such standard that I know of. Maybe implementing a natural sort for Path instances is desirable, but I disagree that appealing to what Explorer does should be the sole basis for this decision. Anyway, it would be a breaking change, which certainly cannot be implemented in 3.8.
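The same applies in Python: `os.scandir()` (like `Path.iterdir()`) yields entries in arbitrary order, so any ordering is the caller's choice. A sketch, where `sorted_listing` is a hypothetical helper and pathlib's Windows comparison is just one possible key:

```python
import os
from pathlib import PureWindowsPath


def sorted_listing(directory="."):
    # os.scandir() yields entries in whatever order the OS reports them.
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries]
    # The application decides the presentation order; here, pathlib's
    # case-insensitive Windows comparison.
    return sorted(names, key=PureWindowsPath)


print(sorted_listing())
```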

Currently sorting is based on the case-folded parts:

    def casefold_parts(self, parts):
        # Compare path components case-insensitively by lower-casing each one.
        return [p.lower() for p in parts]
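A standalone re-creation of that comparison shows the effect of the case folding: plain strings put 'B' before 'a', pathlib does not. Here `casefold_parts` is a hypothetical free-function version of the method quoted, not pathlib's API, and pathlib's internals differ between Python versions.

```python
from pathlib import PureWindowsPath


def casefold_parts(path):
    # Hypothetical free-function version: lower-case every path component.
    return [p.lower() for p in path.parts]


upper_b, lower_a = PureWindowsPath("B"), PureWindowsPath("a")

print("B" < "a")  # True: 'B' is U+0042, 'a' is U+0061
print(casefold_parts(upper_b) < casefold_parts(lower_a))  # False: 'b' > 'a'
print(upper_b < lower_a)  # False, same as the case-folded comparison
print(PureWindowsPath(r"C:\FOO") == PureWindowsPath(r"c:\foo"))  # True
```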
History
Date                 User     Action  Args
2022-04-11 14:59:26  admin    set     github: 83725
2020-02-03 18:45:29  eryksun  set     nosy: + paul.moore, tim.golden, zach.ware, steve.dower
                                      components: + Library (Lib), Windows
2020-02-03 18:45:11  eryksun  set     nosy: + eryksun
                                      messages: + msg361320
                                      versions: + Python 3.9, - Python 3.8
2020-02-03 17:53:35  tegavu   create