classification
Title: urlencode() of dictionary not as expected
Type: behavior Stage:
Components: Documentation, Library (Lib) Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, drueter@assyst.com, gdr@garethrees.org, martin.panter, r.david.murray
Priority: normal Keywords:

Created on 2015-06-17 06:48 by drueter@assyst.com, last changed 2015-06-20 01:29 by martin.panter.

Messages (5)
msg245431 - (view) Author: David Rueter (drueter@assyst.com) Date: 2015-06-17 06:48
In Python 3.4 I would like to serialize a dictionary into a URL-encoded string.

Given a dictionary like this: 

>>> thisDict = {'SomeVar1': [b'abc'], 'SomeVar2': [b'def'], 'SomeVar3': [b'ghi']}

I would like to be able to return this string:

	SomeVar1=abc&SomeVar2=def&SomeVar3=ghi

I thought that urllib.parse.urlencode would work for me, but it does not:

>>> print(urllib.parse.urlencode(thisDict))
	SomeVar1=%5Bb%27abc%27%5D&SomeVar2=%5Bb%27def%27%5D&SomeVar3=%5Bb%27ghi%27%5D

In other words, urlencode appears to URL-encode the string form of each value in the dictionary (here, a one-element list), so the output includes the square brackets (escaped) and the byte literal "b" indicator.

{'SomeVar1': [b'abc'], 'SomeVar2': [b'def'], 'SomeVar3': [b'ghi']}

I can obtain the desired string with this:

>>> '&'.join("{!s}={!s}".format(key,urllib.parse.quote_plus(str(val[0],'utf-8'))) for (key,val) in thisDict.items())

Is the behavior of urllib.parse.urlencode() on a dictionary intentional?  When would the current behavior ever be useful?

Would it make sense to change the behavior of urllib.parse.urlencode such that it works as described above?
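
For illustration, a minimal sketch of where the unexpected output seems to come from, assuming the default doseq=False path converts a list value with str() and then quotes the result (the names this_dict, as_text and quoted are only for this example):

```python
import urllib.parse

this_dict = {'SomeVar1': [b'abc']}

# With the default doseq=False, a list value is treated as one value:
# it is turned into text with str() and that text is percent-encoded.
as_text = str(this_dict['SomeVar1'])        # "[b'abc']"
quoted = urllib.parse.quote_plus(as_text)   # brackets and quotes get escaped

encoded = urllib.parse.urlencode(this_dict)

print(quoted)
print(encoded)
```

The second line printed is the key, an equals sign, and the same escaped text as the first line, which matches the shape of the output reported above.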
msg245432 - (view) Author: Gareth Rees (gdr@garethrees.org) * (Python triager) Date: 2015-06-17 08:53
If you read the documentation for urllib.parse.urlencode [1], you'll
see that it says:

    The value element in itself can be a sequence and in that case, if
    the optional parameter doseq is evaluates to True, individual
    key=value pairs separated by '&' are generated for each element of
    the value sequence for the key.

So you need to write:

    >>> urllib.parse.urlencode(thisDict, doseq=True)
    'SomeVar3=ghi&SomeVar1=abc&SomeVar2=def'

[1]: https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode
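
A slightly fuller sketch of the same call, using a list of pairs instead of a dictionary so that the output order is fixed (the extra value b'xyz' is added here only to show what happens with more than one element):

```python
import urllib.parse

# A sequence of (key, values) pairs; each value is itself a sequence.
pairs = [
    ('SomeVar1', [b'abc']),
    ('SomeVar2', [b'def', b'xyz']),
]

# doseq=True emits one key=value pair per element of each value sequence.
result = urllib.parse.urlencode(pairs, doseq=True)
print(result)  # SomeVar1=abc&SomeVar2=def&SomeVar2=xyz
```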
msg245435 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-06-17 14:16
That behavior is complex enough that I think it would be worth adding an example of it to the examples section (and maybe linking directly from the doseq explanation to that specific example).
msg245437 - (view) Author: David Rueter (drueter@assyst.com) Date: 2015-06-17 14:27
Ah hah! Indeed, urlencode() does work on dictionaries as expected when doseq=True. Thank you for clarifying.

FWIW I had read the documentation and the referenced examples multiple times. I would like to make a few documentation suggestions for clarity.

1) Update https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlencode

Where documentation currently says: "When a sequence of two-element tuples is used as the query argument, the first element of each tuple is a key and the second is a value. The value element in itself can be a sequence and in that case, if the optional parameter doseq is evaluates to True, individual key=value pairs separated by '&' are generated for each element of the value sequence for the key. The order of parameters in the encoded string will match the order of parameter tuples in the sequence."

Perhaps instead the following would be more clear:  "The query argument may be a sequence of two-element tuples where the first element of each tuple is a key and the second is a value.  However the optional parameter doseq must then be set to True in order to reliably generate individual key=value pairs separated by '&' for each element of the value sequence for the key, and to preserve the sequence of the elements in the query parameter."

2) Update https://docs.python.org/3/library/urllib.request.html#urllib-examples

The examples are referenced from the documentation: "Refer to urllib examples to find out how urlencode method can be used for generating query string for a URL or data for POST."  However the example page provides contradictory information and examples for this specific use case.

Currently the examples page says:  "The urllib.parse.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format. It should be encoded to bytes before being used as the data parameter. The charset parameter in Content-Type header may be used to specify the encoding. If charset parameter is not sent with the Content-Type header, the server following the HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1 encoding. It is advisable to use charset parameter with encoding used in Content-Type header with the Request."

Perhaps instead the following would be more clear:  "The urllib.parse.urlencode() query parameter can accept a mapping or sequence of 2-tuples and return a string in this format if the optional parameter doseq is set to True. It should be encoded to bytes before being used as the query parameter. The charset parameter in Content-Type header may be used to specify the encoding. If charset parameter is not sent with the Content-Type header, the server following the HTTP 1.1 recommendation may assume that the data is encoded in ISO-8859-1 encoding. It is advisable to use charset parameter with encoding used in Content-Type header with the Request."

3) Also on the example page, there are examples of urlencode operating on dictionaries where doseq is not provided. This is confusing.  It would be better to show doseq = True:

Here is an example session that uses the GET method to retrieve a URL containing parameters:
...
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
...
The following example uses the POST method instead. Note that params output from urlencode is encoded to bytes before it is sent to urlopen as data:
...
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

I suggest that these examples read:
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}, doseq=True)
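
For reference, a rough sketch of the POST flow those examples describe, without actually opening a connection. The URL and the Content-Type header value are assumptions for illustration, not taken from the documentation; the comparison at the end only checks what doseq does for values that are plain scalars:

```python
import urllib.parse
import urllib.request

params = {'spam': 1, 'eggs': 2, 'bacon': 0}

# urlencode() returns a str; it must be encoded to bytes before use as data.
query = urllib.parse.urlencode(params)
data = query.encode('ascii')  # percent-encoded output is plain ASCII

# Hypothetical endpoint; building the Request does not touch the network.
req = urllib.request.Request(
    'http://www.example.com/cgi-bin/query',
    data=data,
    headers={'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'},
)
print(req.get_method())  # POST, because data is not None

# For scalar values, passing doseq=True produces the same string.
same = urllib.parse.urlencode(params, doseq=True) == query
print(same)
```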
msg245537 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-06-20 01:29
1. urlencode(): I agree the documentation is unclear. But David Rueter’s suggestion does not help much. I think doseq=True is meant to also work for a mapping query (as in original post), and is not required in the sequence-of-tuples mode if each tuple has a single parameter value. Perhaps something like this could work instead:

“When a sequence of two-element tuples is used as *query*, the first element of each tuple is a key and the second specifies one or more values. If *doseq* is true, each *query* (mapping or sequence) item can specify a sequence of values; if *doseq* is false (the default), each item specifies a single value. The order of parameters in the encoded string will match the order of items in *query* and the order of values in an item.”

2. urlopen(data=...) and Request(data=...): I don’t see the contradiction. It looks like David Rueter’s suggestion only changes the first sentence, to say doseq=True is required to get the urlencoded format, but this is not required. See also Issue 23360 about my own problems with this bit of the documentation.

3. Examples: Again, I do not see why doseq=True should be shown when it is simpler without. But an example of when it is useful would be good, as R David Murray suggested.
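
As a possible starting point for such an example, here is a short sketch of the behavior described in point 1, assuming the current urlencode() semantics; the keys and values are made up for illustration:

```python
import urllib.parse

# doseq false (the default): each item specifies a single value.
single = [('a', 1), ('b', 'x y')]
out_single = urllib.parse.urlencode(single)
print(out_single)  # a=1&b=x+y

# doseq true: each item can specify a sequence of values, and the output
# follows the order of the items and of the values within each item.
multi = [('a', [1, 2]), ('b', ['x', 'y'])]
out_multi = urllib.parse.urlencode(multi, doseq=True)
print(out_multi)  # a=1&a=2&b=x&b=y

# The same applies when the query is a mapping rather than a sequence.
mapping = {'a': [1, 2]}
out_map_seq = urllib.parse.urlencode(mapping, doseq=True)
print(out_map_seq)  # a=1&a=2

# Without doseq, the list is encoded as the text "[1, 2]".
out_map_plain = urllib.parse.urlencode(mapping)
print(out_map_plain)  # a=%5B1%2C+2%5D
```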
History
Date | User | Action | Args
2015-06-20 01:29:19 | martin.panter | set | nosy: + martin.panter; messages: + msg245537
2015-06-17 14:27:39 | drueter@assyst.com | set | messages: + msg245437
2015-06-17 14:16:02 | r.david.murray | set | versions: + Python 3.5, Python 3.6; nosy: + docs@python, r.david.murray; messages: + msg245435; assignee: docs@python; components: + Documentation
2015-06-17 08:53:38 | gdr@garethrees.org | set | nosy: + gdr@garethrees.org; messages: + msg245432
2015-06-17 06:48:42 | drueter@assyst.com | create |