classification
Title: glob returns results in undeterministic order
Type:
Stage: resolved
Components: Library (Lib)
Versions: Python 3.7, Python 3.6, Python 3.3, Python 3.4, Python 3.5, Python 2.7
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: bmwiedemann, rhettinger, serhiy.storchaka
Priority: normal
Keywords:

Created on 2017-05-24 19:29 by bmwiedemann, last changed 2017-06-04 17:23 by rhettinger. This issue is now closed.

Pull Requests
PR 1794: closed (linked by bmwiedemann, 2017-05-24 19:30)
Messages (4)
msg294381 - Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2017-05-24 19:29
Because POSIX readdir() does not guarantee any order,
glob often returns its results in an unexpected, effectively random order.

Some background:
for openSUSE Linux we build packages in the Open Build Service (OBS)
which tracks dependencies, so when e.g. a new glibc is submitted,
all packages depending on glibc are rebuilt
and if the resulting binaries changed,
the new version is pushed to the mirrors.

Many Python modules build their .so files from the list returned by glob.glob(os.path.join(path, "*.cpp")).
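
A minimal sketch of the usual workaround in such a setup.py, assuming the sources are the .cpp files of a single directory. The helper name `cpp_sources` is hypothetical; it simply sorts the glob result so the source list no longer depends on directory-listing order:

```python
import glob
import os


def cpp_sources(src_dir):
    """Return the .cpp files in src_dir in a stable, sorted order.

    glob.glob() returns names in the order the OS lists the directory,
    which can differ between machines and filesystems; sorting the list
    removes that variation.
    """
    return sorted(glob.glob(os.path.join(src_dir, "*.cpp")))
```

The result can then be passed wherever the unsorted glob list was used before, e.g. as the `sources` list of an extension module (the directory name is up to the project).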

The old glob behaviour would often lead to the linker
ordering functions randomly in the resulting object files,
so we were not able to auto-detect
that the package had not actually changed,
which wastes the bandwidth of distribution mirrors and users.
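
To illustrate why input order alone defeats "did the package change?" detection, here is a simplified sketch. The `build_fingerprint` helper is hypothetical: it stands in for an order-sensitive build step (such as a link in command-line order) by hashing the source names in the order given:

```python
import hashlib


def build_fingerprint(sources):
    """Hypothetical stand-in for an order-sensitive build step:
    hashes the source names in exactly the order they are given."""
    h = hashlib.sha256()
    for name in sources:
        h.update(name.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] differ
    return h.hexdigest()


scan_order_1 = ["b.cpp", "a.cpp", "c.cpp"]  # same files listed in one order...
scan_order_2 = ["c.cpp", "a.cpp", "b.cpp"]  # ...and in another order

# Unsorted inputs: identical files, different fingerprint ("changed").
print(build_fingerprint(scan_order_1) == build_fingerprint(scan_order_2))  # False
# Sorted inputs: identical fingerprint, so "unchanged" is detectable.
print(build_fingerprint(sorted(scan_order_1)) == build_fingerprint(sorted(scan_order_2)))  # True
```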

See also https://reproducible-builds.org/ on that topic.

There are plenty of affected packages out there:
https://github.com/pytries/datrie/blob/master/setup.py#L10
https://github.com/jonashaag/bjoern/blob/master/setup.py#L6
https://github.com/scipy/scipy/blob/master/scipy/sparse/linalg/dsolve/setup.py#L28
msg294391 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-24 20:53
This looks like a duplicate of issue21748. That behavior was explicitly documented in issue25615.
msg295117 - Author: Bernhard M. Wiedemann (bmwiedemann) * Date: 2017-06-04 09:02
From my performance measurements, the overhead of sorting was negligible (not even counting the processing done on the files returned by glob).
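
The measurements themselves are not included in the thread. A minimal way to run this kind of comparison yourself is sketched below; the helper name `glob_sort_overhead`, the number of files and the repeat count are arbitrary assumptions, and timings vary by machine and filesystem:

```python
import glob
import os
import tempfile
import timeit


def glob_sort_overhead(n_files=1000, repeat=100):
    """Time glob.glob() with and without sorted() on a temporary directory
    containing n_files empty .cpp files. Returns (unsorted_s, sorted_s)."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_files):
            open(os.path.join(d, f"file{i}.cpp"), "w").close()
        pattern = os.path.join(d, "*.cpp")
        unsorted_s = timeit.timeit(lambda: glob.glob(pattern), number=repeat)
        sorted_s = timeit.timeit(lambda: sorted(glob.glob(pattern)), number=repeat)
    return unsorted_s, sorted_s


if __name__ == "__main__":
    u, s = glob_sort_overhead()
    print(f"glob: {u:.4f}s  sorted(glob): {s:.4f}s")
```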

Also, glob in C, bash, and Perl all sort by default. These are generally fast languages, yet they still chose consistency over performance.

I updated my PR to also update the documentation accordingly.
msg295133 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-06-04 17:23
Sorry, we're going to reject this patch for the reasons discussed in the two other referenced patches.

If a user wants sorted order, they can effortlessly specify that with sorted(glob('*.cpp')).
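
The one-liner above assumes `from glob import glob`. A sketch of the same idea wrapped in two small hypothetical helpers, one for the `glob` module (including recursive `**` patterns) and one for `pathlib`, whose `Path.glob()` likewise yields paths in directory-listing order rather than sorted order:

```python
from glob import glob
from pathlib import Path


def sorted_glob(pattern, recursive=False):
    """glob() with a deterministic result order.

    sorted() compares str values by Unicode code point, so the order does
    not depend on the filesystem or on locale settings.
    """
    return sorted(glob(pattern, recursive=recursive))


def sorted_path_glob(root, pattern):
    """Same idea for pathlib: Path objects are orderable, so sort them too."""
    return sorted(Path(root).glob(pattern))
```
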
History
Date User Action Args
2017-06-04 17:23:33 rhettinger set status: open -> closed
    nosy: + rhettinger
    messages: + msg295133
    resolution: not a bug -> rejected
2017-06-04 09:02:38 bmwiedemann set status: closed -> open
    messages: + msg295117
2017-05-25 00:56:05 rhettinger set status: open -> closed
    resolution: not a bug
    stage: resolved
2017-05-24 20:53:07 serhiy.storchaka set nosy: + serhiy.storchaka
    messages: + msg294391
2017-05-24 19:30:16 bmwiedemann set pull_requests: + pull_request1877
2017-05-24 19:29:31 bmwiedemann create