This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: Sorting pathlib.Paths does not give the same order as sorting the (string) filenames of those pathlib.Paths
Type: behavior
Stage: resolved
Components: Extension Modules
Versions: Python 3.6
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: QbLearningPython, r.david.murray, serhiy.storchaka
Priority: normal
Keywords:

Created on 2017-11-15 19:48 by QbLearningPython, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (5)
msg306304 - (view) Author: (QbLearningPython) Date: 2017-11-15 19:48
While testing a module, I found a weird behaviour in the pathlib package. I have a list of pathlib.Paths and I sorted() it. I assumed that the order obtained by sorting a list of Paths would be the same as the order obtained by sorting the list of their corresponding (string) filenames. But that is not the case.

I ran the following example:


==========================================================================


from pathlib import Path

# order string filenames

filenames_for_testing = (
    '/spam/spams.txt',
    '/spam/spam.txt',
    '/spam/another.txt',
    '/spam/binary.bin',
    '/spam/spams/spam.ttt',
    '/spam/spams/spam01.txt',
    '/spam/spams/spam02.txt',
    '/spam/spams/spam03.ppp',
    '/spam/spams/spam04.doc',
)

sorted_filenames = sorted(filenames_for_testing)

# output ordered list of string filenames

print()
print("Ordered list of string filenames:")
print()
for element in sorted_filenames:
    print(f'\t{element}')
print()

# order paths (built from the same string filenames)

paths_for_testing = [
    Path(filename)
    for filename in filenames_for_testing
]
sorted_paths = sorted(paths_for_testing)

# output ordered list of pathlib.Paths

print()
print("Ordered list of pathlib.Paths:")
print()
for element in sorted_paths:
    print(f'\t{element}')
print()

# compare

print()

if sorted_filenames == [str(path) for path in sorted_paths]:
    print('Ordered lists of string filenames and pathlib.Paths are EQUAL.')

else:
    print('Ordered lists of string filenames and pathlib.Paths are DIFFERENT.')

    # walk both sorted lists in parallel and report the first mismatch
    for index, (filename, path) in enumerate(zip(sorted_filenames, sorted_paths)):

        if filename != str(path):

            print()
            print('First different element:')
            print(f'\tElement #{index}')
            print(f'\t{filename} != {path}')
            break

print()



==========================================================================


The output of this script was:


==========================================================================

Ordered list of string filenames:

	/spam/another.txt
	/spam/binary.bin
	/spam/spam.txt
	/spam/spams.txt
	/spam/spams/spam.ttt
	/spam/spams/spam01.txt
	/spam/spams/spam02.txt
	/spam/spams/spam03.ppp
	/spam/spams/spam04.doc


Ordered list of pathlib.Paths:

	/spam/another.txt
	/spam/binary.bin
	/spam/spam.txt
	/spam/spams/spam.ttt
	/spam/spams/spam01.txt
	/spam/spams/spam02.txt
	/spam/spams/spam03.ppp
	/spam/spams/spam04.doc
	/spam/spams.txt


Ordered lists of string filenames and pathlib.Paths are DIFFERENT.

First different element:
	Element #3
	/spam/spams.txt != /spam/spams/spam.ttt


==========================================================================


As you can see, '/spam/spams.txt' ends up in a different position when sorting pathlib.Paths than when sorting the string filenames.

I think that it is weird that sorting pathlib.Paths yields a different result than sorting their string filenames. I think that pathlib.Paths should be ordered by alphabetical order of their corresponding filenames.

Thank you.
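
For readers who do want string order, one possible workaround is to sort with `key=str`. This is a minimal sketch, not something proposed in the report itself; `PurePosixPath` is used only as an assumption so that the result does not depend on the platform, and the shortened `filenames` list is illustrative.

```python
from pathlib import PurePosixPath

filenames = [
    '/spam/spams.txt',
    '/spam/spam.txt',
    '/spam/spams/spam.ttt',
]
paths = [PurePosixPath(name) for name in filenames]

# Sort the paths by their string form instead of by their components.
sorted_by_string = sorted(paths, key=str)

# The result now matches the order of the sorted plain strings.
strings_match = [str(p) for p in sorted_by_string] == sorted(filenames)
print(strings_match)  # True
```

Sorting the same list without a key still uses the component-wise order shown in the output above.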
msg306306 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-11-15 20:09
Paths are ordered by lexicographical order of their corresponding components. Paths are not strings, and this order is more natural for them.

The alphabetical order of Path strings:

    SPAMS.txt
    SPAM\file.txt

    spam\file.txt
    spams.txt

    spam-file.txt
    spam/file.txt
    spam_file.txt

The lexicographical order of Path components:

    SPAM\file.txt
    SPAMS.txt

    spam\file.txt
    spams.txt

    spam/file.txt
    spam-file.txt
    spam_file.txt
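
The difference between the two orders can be checked on the two paths from the original report. This is only an illustrative sketch: it uses `PurePosixPath` so that it behaves the same on every platform, and it shows the observable comparison results rather than how pathlib implements them internally.

```python
from pathlib import PurePosixPath

file_path = PurePosixPath('/spam/spams.txt')
nested_path = PurePosixPath('/spam/spams/spam.ttt')

# Each path is split into components.
print(file_path.parts)    # ('/', 'spam', 'spams.txt')
print(nested_path.parts)  # ('/', 'spam', 'spams', 'spam.ttt')

# Component order: 'spams' is a prefix of 'spams.txt', so it sorts first.
print(nested_path < file_path)            # True

# String order: '.' (0x2E) sorts before '/' (0x2F), so the file sorts first.
print(str(file_path) < str(nested_path))  # True
```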
msg306367 - (view) Author: (QbLearningPython) Date: 2017-11-16 16:12
Thanks, serhiy.storchaka, for your answer.

I am not fully convinced.

You have described the current behaviour of the pathlib package.

But let me ask: should this be the desired behaviour?

Since string filenames and pathlib.Paths are different ways to refer to the same object (a path in a filesystem), should they not behave in the same way when sorted?

You pointed out that the current behaviour is "more natural order" for pathlib.Paths. I am not truly sure about that. Can you please provide any citation or additional information about that?

Thank you.
msg306394 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-11-16 18:40
It is "obvious by inspection". Paths are paths instead of strings because they are formed out of discrete path components instead of strings. If you sorted each directory in the paths from the top down, and then sorted the subdirectories, and then sorted the filenames, you would get the same order as sorting by component. It's the same order you would get out of an ls -R.
msg306395 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-11-16 18:41
To put it another way, think about sorting a list of tuples.  Same behavior.
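
A small sketch of the tuple analogy, assuming `PurePosixPath` for platform independence and a shortened, illustrative list of names: sorting the paths and sorting their `parts` tuples give the same order.

```python
from pathlib import PurePosixPath

names = ['/spam/spams.txt', '/spam/spam.txt', '/spam/spams/spam.ttt']

# Sort the paths themselves.
by_path = sorted(PurePosixPath(name) for name in names)

# Sort plain tuples of the same path components.
by_tuple = sorted(PurePosixPath(name).parts for name in names)

# Both sorts agree: tuples compare element by element, like path components.
same_order = [path.parts for path in by_path] == by_tuple
print(same_order)  # True
```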
History
Date                 User              Action  Args
2022-04-11 14:58:54  admin             set     github: 76221
2017-11-16 18:41:02  r.david.murray    set     messages: + msg306395
2017-11-16 18:40:03  r.david.murray    set     status: open -> closed; nosy: + r.david.murray; messages: + msg306394; stage: resolved
2017-11-16 16:12:43  QbLearningPython  set     messages: + msg306367
2017-11-15 20:09:37  serhiy.storchaka  set     resolution: not a bug; messages: + msg306306; nosy: + serhiy.storchaka
2017-11-15 19:48:55  QbLearningPython  create