classification
Title: glob.glob should explicitly note that results aren't sorted
Type: enhancement
Stage: patch review
Components: Documentation
Versions: Python 3.8, Python 3.7, Python 3.6, Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: Ben FrantzDale, csabella, docs@python, eryksun, rhettinger, serhiy.storchaka, terry.reedy
Priority: normal
Keywords: easy, patch

Created on 2018-04-13 18:38 by Ben FrantzDale, last changed 2018-04-25 15:47 by eryksun.

Pull Requests
URL Status Linked Edit
PR 6587 open Elena.Oat, 2018-04-24 13:34
Messages (9)
msg315254 - (view) Author: Ben FrantzDale (Ben FrantzDale) Date: 2018-04-13 18:38
The sortedness of glob.glob's output is platform-dependent. While the docs do not mention sorting, and so are strictly correct, if you are on a platform where its output is sorted, it's easy to believe that the output is always sorted.

I propose we add a Note, maybe next to "Note: Using the “**” pattern in large directory trees may consume an inordinate amount of time.", that says "Note: While the output of glob.glob may be sorted on some architectures, ordering is not guaranteed. Use `sorted(glob.glob(...))` if ordering is important."

This wrong assumption burned us when scripts inexplicably stopped working on OSX High Sierra.
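
A minimal sketch of the workaround suggested above, assuming plain ordinal (code point) order is what the script needs. The helper name `sorted_glob` is hypothetical and not part of the glob module; it only wraps `glob.glob` in the built-in `sorted`.

```python
import glob
import os
import tempfile


def sorted_glob(pattern, recursive=False):
    """Return glob matches in ordinal order, regardless of the file system.

    Hypothetical helper: the glob module itself offers no sorting option.
    """
    return sorted(glob.glob(pattern, recursive=recursive))


with tempfile.TemporaryDirectory() as tmp:
    # Create the files out of order on purpose.
    for name in ("b.txt", "a.txt", "c.txt"):
        open(os.path.join(tmp, name), "w").close()

    # glob.escape guards against special characters in the temp path.
    pattern = os.path.join(glob.escape(tmp), "*.txt")
    matches = [os.path.basename(p) for p in sorted_glob(pattern)]
    print(matches)  # ['a.txt', 'b.txt', 'c.txt']
```

With this in place the script no longer depends on whatever order the underlying file system happens to return.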
msg315259 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-13 19:34
This seems reasonable.  I would like it to be part of the regular text rather than appearing as a big ..note entry, which can be visually distracting from the core functionality.
msg315273 - (view) Author: Eryk Sun (eryksun) * Date: 2018-04-13 23:55
> The sortedness of glob.glob's output is platform-dependent.

It's typically file-system dependent (e.g. NTFS, FAT, ISO9660, UDF) -- at least on Windows. NTFS and ISO9660 store directories in sorted order based on the filename (Unicode or ASCII ordinal sort).
msg315275 - (view) Author: Ben FrantzDale (Ben FrantzDale) Date: 2018-04-14 00:09
Fascinating. That seems like an even wilder gotcha: It sounds like a script assuming sorted results would work in one directory (on one filesystem) but not on another. Or even weirder, if I had a mounted scratch partition, the script could work until I (or a sys admin) mounts a larger drive with a different filesystem on the same mountpoint. Yikes! Either way, this gotcha seems worth mentioning explicitly.
msg315545 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-04-21 00:36
How about adding a sentence to the end of the first paragraph?

 glob.glob(pathname, *, recursive=False)

    Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).  Whether or not the results are sorted depends on the file system.
msg315701 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-04-24 13:41
Are there such notes in the descriptions of os.listdir(), os.scandir(), os.walk(), os.fwalk() and the corresponding Path methods? If we explicitly document the sorting, this should be done for all file-enumerating functions.
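
For illustration, here is a rough sketch of how a caller could impose an order on each of these functions instead of relying on the file system. The helper names (`list_sorted`, `scan_sorted`, `walk_sorted`) are hypothetical; only `os.listdir`, `os.scandir`, `os.walk` and `Path.iterdir` are real library calls. Sorting `dirnames` in place is what makes a top-down `os.walk` descend in sorted order.

```python
import os
import tempfile
from pathlib import Path


def list_sorted(directory):
    """os.listdir() returns names in arbitrary order; sort them explicitly."""
    return sorted(os.listdir(directory))


def scan_sorted(directory):
    """Entry names from os.scandir(), sorted after the scan completes."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries)


def walk_sorted(top):
    """Top-down os.walk() that visits directories and files in sorted order."""
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()   # in place, so os.walk descends in this order
        filenames.sort()
        yield dirpath, dirnames, filenames


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("b.txt", "a.txt",
                os.path.join("sub", "z.txt"), os.path.join("sub", "y.txt")):
        open(os.path.join(tmp, rel), "w").close()

    listed = list_sorted(tmp)
    scanned = scan_sorted(tmp)
    iterated = [p.name for p in sorted(Path(tmp).iterdir())]
    walked = [(os.path.relpath(d, tmp), files)
              for d, _, files in walk_sorted(tmp)]

print(listed)
print(walked)
```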
msg315702 - (view) Author: Ben FrantzDale (Ben FrantzDale) Date: 2018-04-24 14:15
Great point. Looks like the phrase is "in arbitrary order" in the docs for those (both 2.7 and 3), which is better than saying nothing. I'd still prefer a bit more specificity about the potential gotcha since "arbitrary" seems a lot less deterministic than "some file systems will give you sorted order, some won't".

msg315710 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-04-24 17:35
I agree that every function whose result order is determined by the file system should get the same note, for the same reason.  Ben, can you test?  Eryk, can you enlighten us further?

PS: Ben, when responding by email, please delete the quote, as it is duplicate noise on the web page.
msg315748 - (view) Author: Eryk Sun (eryksun) * Date: 2018-04-25 15:47
As I said, some file systems such as NTFS and ISO 9660 (or Joliet) store directories in lexicographically sorted order. NTFS does this using a b-tree and case-insensitive comparison, which helps the driver efficiently implement filtering a directory listing using a pattern such as "spam*eggs?.txt". (Filtering of a directory listing at the syscall level is peculiar to Windows and not supported by Python.)

I like the phrase "arbitrary order". I don't think it's wise for an application to ever depend on the order. Also, we usually want natural-language collation for display purposes (e.g. spam2.txt should come before spam10.txt), so we have to sort the result regardless of the file system.
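
To make the spam2.txt versus spam10.txt point concrete, here is a small sketch of a natural-order sort key. `natural_key` is a hypothetical helper, and it is only an approximation: it compares digit runs numerically and other text case-insensitively, which is not full locale-aware collation.

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits as integers.

    re.split with a capturing group alternates text, digits, text, ...,
    so odd positions are always digit runs.
    """
    parts = re.split(r"(\d+)", name)
    return [int(part) if index % 2 else part.casefold()
            for index, part in enumerate(parts)]


names = ["spam10.txt", "spam2.txt", "spam1.txt"]

ordinal = sorted(names)
natural = sorted(names, key=natural_key)

print(ordinal)  # ['spam1.txt', 'spam10.txt', 'spam2.txt']
print(natural)  # ['spam1.txt', 'spam2.txt', 'spam10.txt']
```

The same key can be passed to `sorted()` around the result of glob.glob() or os.listdir().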
History
Date User Action Args
2018-04-25 15:47:39 eryksun set messages: + msg315748
2018-04-24 17:35:31 terry.reedy set messages: + msg315710
2018-04-24 14:15:41 Ben FrantzDale set messages: + msg315702
2018-04-24 13:41:32 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg315701
2018-04-24 13:34:46 Elena.Oat set keywords: + patch
    stage: needs patch -> patch review
    pull_requests: + pull_request6287
2018-04-21 00:36:55 terry.reedy set nosy: + terry.reedy
    messages: + msg315545
2018-04-14 00:09:48 Ben FrantzDale set messages: + msg315275
2018-04-13 23:55:52 eryksun set nosy: + eryksun
    messages: + msg315273
2018-04-13 19:35:33 rhettinger set nosy: + csabella
2018-04-13 19:34:47 rhettinger set assignee: docs@python
    components: + Documentation, - Library (Lib)
    versions: - Python 3.4, Python 3.5
    keywords: + easy
    nosy: + rhettinger, docs@python
    messages: + msg315259
    stage: needs patch
2018-04-13 18:38:46 Ben FrantzDale create