This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Sort dict items in urlencode()
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: chris.jerdonek, eric.araujo, gregory.p.smith, gvanrossum, jcea, pitrou
Priority: normal Keywords: easy

Created on 2012-08-17 20:44 by gvanrossum, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (6)
msg168477 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012-08-17 20:44
From http://mail.python.org/pipermail/python-dev/2012-August/121364.html :

"""
I just fixed a unittest for some code used at Google that was
comparing a url generated by urllib.encode() to a fixed string. The
problem was caused by turning on PYTHONHASHSEED=1. Because of this,
the code under test would generate a textually different URL each time
the test was run, but the intention of the test was just to check that
all the query parameters were present and equal to the expected
values.

The solution was somewhat painful, I had to parse the url, split the
query parameters, and compare them to a known dict.

I wonder if it wouldn't make sense to change urlencode() to generate
URLs that don't depend on the hash order, for all versions of Python
that support PYTHONHASHSEED? It seems a one-line fix:

        query = query.items()

with this:

        query = sorted(query.items())

This would not prevent breakage of unit tests, but it would make a
much simpler fix possible: simply sort the parameters in the URL.
"""

MvL's response (http://mail.python.org/pipermail/python-dev/2012-August/121366.html) seems to suggest that this can go in as a bugfix in 3.2 and later if augmented with a check for type(query) is dict and a check for whether dict hashing is enabled.

Does anyone want to turn this idea into a patch?
msg168521 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012-08-18 17:35
Note, the sort may fail if e.g. bytes and str are mixed in the keys.  So this should be caught and ignored.

An alternative would just be to fix the call site to pass in sorted(query.items()) instead of sorted(query).

Still, I think that more predictable output would be better.
msg168525 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-18 19:57
Honestly, this looks much more like an enhancement than a bugfix to me, so I think it should only target 3.4.
As for the idea proper, I sounds ok to me, as long as only exact dict instances are affected (i.e. OrderedDict and friends keep using their native order).
msg168544 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2012-08-19 03:34
-1 on doing this from me.

While I don't see it hurting anything when "type(query) is dict" I'd much rather encourage people to write better tests that do not take the lazy way out.  Tests that get by comparing a generated string to a "golden" string are soo often wrong because the golden string is not what they actually wanted to test. Instead the author meant to test the meaning of the result rather than the specific incantation of it that they pasted in their code for an assertEqual check.

(This does suggest that some additional unittest methods to make comparing common things like this in a safe manner would be useful to people. urls, html and xml with attributes, json, etc)

Ex: We've run into many tests that were failing due to hash randomization because they compared json strings... rather than suggest that json.dumps sort dictionaries before generating output, having the tests use json.loads and compare the actual data structure rather than a golden value string is much better.

If a change like this is made, it needs to be conditional in 2.7 and 3.2 to only sort when hash randomization is enabled to avoid altering the existing behavior of urlencode for the default PYTHONHASHSEED=0 case on 2.7 and 3.2.

As far as this being an enhancement rather than a bugfix... I sort of agree that this is an enhancement.  And if this is an enhancement, we don't need to do it at all: People will already have done the right thing making their code work with 3.3's hash randomization by default before long before 3.4 is out.
msg168545 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2012-08-19 04:14
I've come to see more downsides than upsides to this idea.  I'm withdrawing it.  Good points, Greg!  (And others.)
msg170302 - (view) Author: Chris Jerdonek (chris.jerdonek) * (Python committer) Date: 2012-09-11 13:14
As an aside, I noticed a doctest affected by this in the urllib HOWTO:

>>> url_values = urllib.parse.urlencode(data)
>>> print(url_values)
name=Somebody+Here&language=Python&location=Northampton

http://docs.python.org/dev/howto/urllib2.html#data

(search for the string "Somebody").  This is merely a curiosity though (i.e. don't construe this as an argument on the issue :)).
History
Date User Action Args
2022-04-11 14:57:34adminsetgithub: 59924
2012-09-11 13:14:20chris.jerdoneksetnosy: + chris.jerdonek
messages: + msg170302
2012-09-10 01:06:31jceasetnosy: + jcea
2012-08-19 04:14:50gvanrossumsetstatus: open -> closed
resolution: rejected
messages: + msg168545
2012-08-19 03:34:42gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg168544
2012-08-18 19:57:27pitrousetversions: - Python 3.2, Python 3.3
nosy: + pitrou

messages: + msg168525

components: + Library (Lib)
type: enhancement
2012-08-18 17:35:19gvanrossumsetmessages: + msg168521
2012-08-17 21:07:25eric.araujosetnosy: + eric.araujo
2012-08-17 20:44:52gvanrossumcreate