classification
Title: Remove “Content-Type: application/x-www-form-urlencoded; charset” advice
Type:
Stage: resolved
Components: Documentation
Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: docs@python, martin.panter, orsenthil, python-dev, r.david.murray
Priority: normal
Keywords: patch

Created on 2015-11-07 08:43 by martin.panter, last changed 2015-11-24 23:38 by martin.panter. This issue is now closed.

Files
File name Uploaded Description
urlencoded-charset.patch martin.panter, 2015-11-07 08:43
urlencoded-charset.2.patch martin.panter, 2015-11-08 23:25
Messages (6)
msg254263 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-07 08:43
I understand that using a “charset” parameter with “Content-Type: application/x-www-form-urlencoded” is not standardized. Since Issue 11082, the documentation has advised using it, but I propose to remove this advice.

HTML 5 mentions setting a _charset_ parameter, and mentions decoding with a default of UTF-8 (not Latin-1!), but does not mention any Content-Type parameters.

There seems to be confusion about what encoding it actually represents. According to <https://bugzilla.mozilla.org/show_bug.cgi?id=7533>, Mozilla briefly set this “charset” parameter a long time ago, but it would have corresponded to the urlencode(encoding=...) argument. The Python documentation currently suggests calling data.encode("utf-8"), which is misleading, because the urlencode() output is already guaranteed to be ASCII text. Any non-ASCII characters and bytes will already be character-encoded and percent-encoded by urlencode(). So I also propose to change the examples to data.encode("ascii").
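
A minimal sketch of the point above, using a hypothetical form field that is not taken from the patch. It shows that the character encoding is chosen by the urlencode(encoding=...) argument, and that the result is plain ASCII either way, so encoding it with "ascii" always succeeds:

```python
from urllib.parse import urlencode

# Hypothetical form field containing a non-ASCII character.
fields = {"name": "Zoë"}

# urlencode() encodes the text (UTF-8 by default) and then
# percent-encodes the resulting bytes.
query_utf8 = urlencode(fields)
query_latin1 = urlencode(fields, encoding="latin-1")

print(query_utf8)    # name=Zo%C3%AB
print(query_latin1)  # name=Zo%EB

# Both results contain only ASCII characters, so this cannot fail.
data = query_utf8.encode("ascii")
```

The bytes in data are what would be passed as the request body. The choice between UTF-8 and Latin-1 was already made inside urlencode().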
msg254316 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-08 01:16
Although I didn't read through the whole thing, the Mozilla bug discussion indicates this is the correct way to specify the charset; it's just that there was lots of buggy software that didn't handle setting it to latin-1.  Is the same true for setting it to utf-8?

Agreed about the encode call.
msg254332 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-08 10:56
I think the server bugs referenced by the Mozilla bug are mainly about servers that do not recognize the content type at all, due to the presence of any charset parameter. They probably do something like “if headers['Content-Type'] == 'application/x-www-form-urlencoded' ” without checking for parameters first. So it wouldn’t matter if it was charset=latin-1 or charset=utf-8.
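
A rough sketch of the failure mode described above. The function names are hypothetical and this is not code from any actual server; it only contrasts an exact string comparison with a check that separates the media type from its parameters, here using the standard library's email.message.Message:

```python
from email.message import Message

FORM_TYPE = "application/x-www-form-urlencoded"

def is_form_naive(header_value):
    # Exact comparison: fails as soon as any parameter is present.
    return header_value == FORM_TYPE

def is_form(header_value):
    # Parse the header so that parameters such as charset are set aside
    # before comparing the media type.
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type() == FORM_TYPE

with_charset = "application/x-www-form-urlencoded; charset=utf-8"
print(is_form_naive(with_charset))  # False
print(is_form(with_charset))        # True
```

With the naive check, the value of the charset parameter is irrelevant: any parameter at all makes the comparison fail.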

A couple of comments in the Mozilla bug say that including “charset” is specified by an HTTP standard, but I suspect this may be a mistake. Perhaps this is the best evidence for my argument, from <http://www.w3.org/TR/html/forms.html#url-encoded-form-data>:

'''
Parameters on the “application/x-www-form-urlencoded” MIME type are ignored. In particular, this MIME type does not support the “charset” parameter.
'''
msg254347 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-08 17:25
OK, I'll accept that as authoritative :)

One very minor comment in the review, otherwise looks good to me.
msg254361 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-08 23:25
The second version of the patch changes some more examples in the how-to to data.encode("ascii"). I’ll leave this open for a bit in case Senthil is around and wants to comment (seeing as he added the text I am removing).
msg255302 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-11-24 23:07
New changeset 16fec577fd8b by Martin Panter in branch '3.4':
Issue #25576: Remove application/x-www-form-urlencoded charset advice
https://hg.python.org/cpython/rev/16fec577fd8b

New changeset 95ae5262d27c by Martin Panter in branch '3.5':
Issue #25576: Merge www-form-urlencoded doc from 3.4 into 3.5
https://hg.python.org/cpython/rev/95ae5262d27c

New changeset d52521d13a64 by Martin Panter in branch 'default':
Issue #25576: Merge www-form-urlencoded doc from 3.5
https://hg.python.org/cpython/rev/d52521d13a64

New changeset 671429cc1d96 by Martin Panter in branch 'default':
Issue #25576: Apply fix to new urlopen() doc string
https://hg.python.org/cpython/rev/671429cc1d96
History
Date User Action Args
2015-11-24 23:38:26 martin.panter set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2015-11-24 23:07:24 python-dev set nosy: + python-dev; messages: + msg255302
2015-11-08 23:25:05 martin.panter set files: + urlencoded-charset.2.patch; messages: + msg254361
2015-11-08 17:25:27 r.david.murray set messages: + msg254347
2015-11-08 10:56:36 martin.panter set messages: + msg254332
2015-11-08 01:16:54 r.david.murray set nosy: + r.david.murray; messages: + msg254316
2015-11-07 08:43:40 martin.panter create