classification
Title: Add a PYTHONREVERSEDICTKEYORDER environment variable
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.7
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: haypo, inada.naoki, lamby, r.david.murray, rhettinger
Priority: normal
Keywords: patch

Created on 2017-02-03 09:37 by lamby, last changed 2017-02-05 23:58 by haypo. This issue is now closed.

Files
File name Uploaded Description
0001-Add-a-PYTHONREVERSEDICTKEYORDER-environment-variable.patch lamby, 2017-02-03 09:37
hack_dict.py haypo, 2017-02-05 23:58
Messages (16)
msg286849 - (view) Author: (lamby) Date: 2017-02-03 09:37
Due to implementation changes, since CPython 3.6 dict keys are returned
in insertion order. However, in order to test for reproducible builds [0],
it would be convenient to be able to reverse this ordering; we would then
run a build of an arbitrary package both with and without this flag and
compare the resulting binary.

(We already run such a testing framework, so specifying this environment
variable would be trivial. Note that this "reverse" would actually find
more issues than simply relying on the pre-3.6 non-deterministic
behaviour.)

This patch changes the behaviour of:

  * for x in d:
  * d.popitem()
  * d.items()
  * _PyDict_Next

 [0] https://reproducible-builds.org/
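To illustrate the class of problem such a flag is meant to surface, here is a minimal sketch. The function names `render_manifest` and `render_manifest_sorted` are hypothetical and not part of the patch, and the flag is only simulated by rebuilding the same mapping in reverse insertion order. The sketch assumes CPython 3.6 or later.

```python
import hashlib


def render_manifest(entries):
    """Serialize a mapping by iterating it directly (order-dependent)."""
    return "\n".join(f"{name}={value}" for name, value in entries.items()).encode()


def render_manifest_sorted(entries):
    """Serialize a mapping with sorted keys (order-independent)."""
    return "\n".join(f"{name}={entries[name]}" for name in sorted(entries)).encode()


forward = {"a.py": 1, "b.py": 2, "c.py": 3}
# Simulate the proposed flag: same items, reversed insertion order.
backward = dict(reversed(list(forward.items())))

# The order-dependent output changes when the iteration order is reversed...
fragile_differs = (
    hashlib.sha256(render_manifest(forward)).hexdigest()
    != hashlib.sha256(render_manifest(backward)).hexdigest()
)
# ...while the sorted output stays byte-for-byte identical.
robust_same = render_manifest_sorted(forward) == render_manifest_sorted(backward)
print(fragile_differs, robust_same)
```

A build step shaped like `render_manifest` would produce different artifacts in the two runs, which is what the comparison of the two builds is intended to detect.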
msg286850 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-03 09:40
Why don't you use OrderedDict and reversed()?
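For reference, a minimal sketch of what that suggestion looks like in one's own code; the variable names are illustrative only. `collections.OrderedDict` supports `reversed()` directly.

```python
from collections import OrderedDict

d = OrderedDict([("first", 1), ("second", 2), ("third", 3)])

# reversed() walks the keys from most recently inserted to oldest.
reversed_keys = list(reversed(d))
reversed_items = [(key, d[key]) for key in reversed(d)]
print(reversed_keys)
```

This only helps for code that opts in explicitly; it does not change how dictionaries behave in code written by others.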
msg286851 - (view) Author: (lamby) Date: 2017-02-03 09:48
> Why don't you use OrderedDict and reversed()?

This isn't for my own code; I want to change the behaviour of CPython itself so it affects arbitrary third-party code - this is what we are testing when we are testing for reproducibility :)
msg286852 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-03 09:58
I can't understand what the problem is.
If the package produces the same binary when dict keeps insertion order,
isn't it a "reproducible build"?
msg286853 - (view) Author: (lamby) Date: 2017-02-03 10:02
> If the package produces the same binary when dict keeps insertion order,
> isn't it a "reproducible build"?

No, as that's a CPython-specific (and 3.6+) implementation detail. Hence "forcing" a test for it :)
msg286855 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-03 10:20
For checking compatibility with other implementations, I want to wait
until there is another 3.6+ compatible implementation that doesn't
keep the insertion order of dicts.
For now, there is no 3.6+ compatible Python implementation except CPython.

For checking compatibility with Python 3.5 and earlier, I am -1 on adding such a flag.
Python 3.6 has many new features.  You should use 3.5 instead.
msg286872 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-02-03 14:50
Inada: we haven't 100% decided that this is going to become a language feature.  However it is likely to become so, so adding such a flag is probably wasted effort.  Further, if the goal is to test compatibility with other python implementations, shouldn't you actually be testing against those other implementations?  You are likely to catch more problems than just dict order that way.  So I vote -1 on this.
msg286879 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-02-03 17:01
I concur with David and Inada on this one (it is likely to become a wasted effort and it impacts maintainability to try to support this even for the short run).
msg286886 - (view) Author: (lamby) Date: 2017-02-03 20:16
I think we are misunderstanding each other regarding our goals here :)

I'm not trying to test against other Python implementations or versions of CPython itself but rather "flush out" reproducibility issues in third-party Python code that (incorrectly) relies on dict ordering being relatively stable and/or in insertion order, etc. etc.

(The only reason I mention 3.6 is because the insertion-order behaviour there simply makes it easier to have a 'reverse' order)
msg286888 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-02-03 20:50
But that reliance/reproducibility-error would be an issue only on interpreters that don't preserve insertion order, and we're expecting we'll make that a language requirement.  So for now, or for as long as you think it is warranted, just test against interpreters that randomize the order.

Note that this is different from the pre-randomization dict behavior, where lots of programs depended on the accident-of-the-implementation order in which keys were returned.  What we think is coming is a guaranteed ordering, which is, thus, reproducible.
msg286938 - (view) Author: (lamby) Date: 2017-02-04 09:37
> we're expecting we'll make that a language requirement

Mmm, but only for (at least) 3.7+. It would still be very useful to find software that is relying on (currently) undefined behaviour, no?
msg286958 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-04 11:33
At least the ordering of the namespace dict and the kwargs dict is part of the language spec for 3.6.
This option breaks it.  When this option is set, CPython 3.6 is not Python 3.6.
msg286988 - (view) Author: (lamby) Date: 2017-02-04 20:46
> ordering of namespace dict and kwargs dict are language spec for 3.6

Are they really _specced_ for 3.6? I was under the impression that it was just an implementation detail.
msg287034 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-05 02:51
see https://mail.python.org/pipermail/python-dev/2016-September/146348.html

kwargs, __dict__, and the namespace passed to a metaclass are ordered by language design.
The order of other dicts is an implementation detail.
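A minimal sketch of the ordering guarantees mentioned here, as specified for Python 3.6 by PEP 468 (keyword arguments) and PEP 520 (class definition order). The names `collect` and `Record` are illustrative only.

```python
def collect(**kwargs):
    # PEP 468: **kwargs preserves the order in which keywords were passed.
    return list(kwargs)


class Record:
    # PEP 520: the class namespace preserves attribute definition order.
    zeta = 1
    alpha = 2
    mid = 3


kwarg_order = collect(zeta=1, alpha=2, mid=3)
# Skip the dunder entries that the interpreter adds to the class namespace.
attr_order = [name for name in vars(Record) if not name.startswith("__")]
print(kwarg_order, attr_order)
```

A flag that reversed the order of every dict would also reverse these two orders, which is the conflict with the 3.6 language definition raised above.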
msg287066 - (view) Author: (lamby) Date: 2017-02-05 23:38
> order of other dicts are implementation detail.

Right, exactly :)
msg287067 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-05 23:58
While the use case makes sense (testing whether an application relies on dictionary iteration order), I'm not sure about adding an option to change the order.

For me, it's a rare and very specific use case, whereas your option is public and "too easy" to find and use. For example, what if a developer decides that their application now requires this option to run?

Moreover, your patch changes performance-critical code. I don't want a slowdown here for a rare use case, since we have spent a lot of time optimizing these functions!

I suggest you try to implement your feature in a dict subtype in a third-party module, and try to monkey-patch applications to use your type. The attached hack_dict.py is an example, but it only handles code explicitly calling the "dict()" type to create a dictionary.
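The following is a rough sketch of that approach. It is not the contents of the attached hack_dict.py. The class name `ReversedDict` is hypothetical, and the sketch assumes CPython 3.6 or later, where the underlying dict iterates in insertion order.

```python
import builtins


class ReversedDict(dict):
    """dict subtype that iterates in reverse insertion order."""

    def __iter__(self):
        # Materialize the normal key order, then walk it backwards.
        return iter(list(super().__iter__())[::-1])

    def keys(self):
        return list(self)

    def values(self):
        return [self[key] for key in self]

    def items(self):
        return [(key, self[key]) for key in self]


original_dict = builtins.dict
builtins.dict = ReversedDict  # monkey-patch the name looked up by dict(...)
try:
    patched = dict(a=1, b=2, c=3)       # resolved via builtins: a ReversedDict
    literal = {"a": 1, "b": 2, "c": 3}  # dict literals bypass the patch
    patched_keys = list(patched)
    literal_keys = list(literal)
finally:
    builtins.dict = original_dict       # always restore the real type

print(patched_keys, literal_keys)
```

As noted above, only code that calls `dict()` by name is affected. Dict literals, comprehensions, and C code that creates or iterates dictionaries directly keep the normal order, so this finds fewer issues than an interpreter-level switch would.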

Another option for you is to maintain your own downstream CPython patch, sorry.
History
Date User Action Args
2017-02-05 23:58:10 haypo set files: + hack_dict.py; nosy: + haypo; messages: + msg287067
2017-02-05 23:38:17 lamby set messages: + msg287066
2017-02-05 02:51:53 inada.naoki set messages: + msg287034
2017-02-04 20:46:49 lamby set messages: + msg286988
2017-02-04 11:33:44 inada.naoki set messages: + msg286958
2017-02-04 09:37:50 lamby set messages: + msg286938
2017-02-03 20:50:27 r.david.murray set messages: + msg286888
2017-02-03 20:16:57 lamby set messages: + msg286886
2017-02-03 17:01:32 rhettinger set status: open -> closed; nosy: + rhettinger; messages: + msg286879; resolution: rejected; stage: resolved
2017-02-03 14:50:36 r.david.murray set nosy: + r.david.murray; messages: + msg286872
2017-02-03 10:20:15 inada.naoki set messages: + msg286855
2017-02-03 10:02:18 lamby set messages: + msg286853
2017-02-03 09:58:46 inada.naoki set messages: + msg286852
2017-02-03 09:48:39 lamby set messages: + msg286851
2017-02-03 09:40:41 inada.naoki set nosy: + inada.naoki; messages: + msg286850
2017-02-03 09:37:45 lamby create