classification
Title: urllib.request > Request.add_header("abcd","efgh") fails with character ":" in first parameter string
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: berker.peksag, crickert, docs@python, martin.panter, python-dev, r.david.murray
Priority: normal Keywords: easy, patch

Created on 2015-11-06 20:39 by crickert, last changed 2016-06-01 09:26 by martin.panter. This issue is now closed.

Files
File name Uploaded Description Edit
add_header.patch martin.panter, 2015-11-09 01:04
Messages (15)
msg254214 - (view) Author: Christian Rickert (crickert) Date: 2015-11-06 20:39
A piece of software stopped working after the update from Python 3.4.x to 3.5 (both Linux and Windows).

I found the cause in the  Request.add_header("abcd","efgh")  function:
If the first parameter string "abcd" contains a colon (":") character, a ValueError is raised and the execution stops.

This is an example code:

>>> import urllib.request
>>> import urllib.parse
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> data = data.encode('utf-8')
>>> request = urllib.request.Request("http://requestb.in/xrbl82xr")
>>> # adding charset parameter to the Content-Type header.
>>> request.add_header("Content-Type:","application/x-www-form-urlencoded;charset=utf-8")
>>> with urllib.request.urlopen(request, data) as f:
...     print(f.read().decode('utf-8'))
...

This is the error:

Traceback (most recent call last):
  File "C:\Users\user\Desktop\example.py", line 9, in <module>
    with urllib.request.urlopen(request, data) as f:
  File "C:\python\lib\urllib\request.py", line 162, in urlopen
    return opener.open(url, data, timeout)
  File "C:\python\lib\urllib\request.py", line 465, in open
    response = self._open(req, data)
  File "C:\python\lib\urllib\request.py", line 483, in _open
    '_open', req)
  File "C:\python\lib\urllib\request.py", line 443, in _call_chain
    result = func(*args)
  File "C:\python\lib\urllib\request.py", line 1268, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\python\lib\urllib\request.py", line 1240, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "C:\python\lib\http\client.py", line 1083, in request
    self._send_request(method, url, body, headers)
  File "C:\python\lib\http\client.py", line 1123, in _send_request
    self.putheader(hdr, value)
  File "C:\python\lib\http\client.py", line 1050, in putheader
    raise ValueError('Invalid header name %r' % (header,))
ValueError: Invalid header name b'Content-Type:'

Steps to reproduce:

Add a colon character anywhere into the first parameter of the  Request.add_header()  function and execute the code using Python 3.5.x.
msg254216 - (view) Author: Christian Rickert (crickert) Date: 2015-11-06 20:49
The URL "http://requestb.in/xrbl82xr" is invalid, you'll get a "urllib.error.HTTPError: HTTP Error 404: NOT FOUND" error."

Instead, please use "http://www.example.com" to confirm.
msg254217 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-06 21:08
This behavior change was part of a security fix, and will appear in the next version of 3.4 as well.  See issue 22928.  Header names may not contain colons, the colon separator is added when the header is rendered.  Detecting and rejecting them guards against header injection attacks.

However, that fix was done in httplib.  I think it would also be worthwhile to fix add_header so that it rejects invalid header components when called, instead of only having the check done later in httplib, at a point distant from where the problem occurred.
msg254221 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-06 21:26
By the way, that your application worked before was pure luck, here is what httplib sends in 3.4 given your test program:

POST / HTTP/1.1
Accept-Encoding: identity
User-Agent: Python-urllib/3.4
Connection: close
Host: xxxx.xxxx.com:9999
Content-Length: 21
Content-Type:: application/x-www-form-urlencoded;charset=utf-8
Content-Type: application/x-www-form-urlencoded
msg254227 - (view) Author: Christian Rickert (crickert) Date: 2015-11-06 21:43
Hi David,

Thanks for looking into this.

IMHO it would be very helpful for code debugging, if "add_header" rejects invalid header arguments ad hoc.
As an alternative, the documentation I used for the implementation [https://docs.python.org/3.6/library/urllib.request.html] could be updated.


"Content-Type:: application/x-www-form-urlencoded;charset=utf-8"
A scope resolution operator thingy in the header? - What could possibly go wrong?  ^^
msg254231 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-06 21:50
I don't think the docs need updating if the ValueError is raised by add_header.  The 'key' in the docs is the header name, as examples in other sections of that doc make clear.  Granted you won't necessarily see that if you just read the add_header entry, but we can't explain every detail in every entry.

Hmm.  Perhaps an example could be added, though.
msg254235 - (view) Author: Christian Rickert (crickert) Date: 2015-11-06 22:14
I would suggest a red warning box that informs the reader about her/his responsibility to comply with HTTP specifications - i.e. ensuring compatibility and security.
msg254238 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-06 23:15
This change is a result of parameter checking added in Issue 22928. The colon is not meant to be part of a field name. This is what the request being sent by your example looks like in the unpatched 3.4. (I changed to localhost.) Your attempt to override the default Content-Type is not working due to the colon:

POST /xrbl82xr HTTP/1.1
Accept-Encoding: identity
Content-Type: application/x-www-form-urlencoded
Host: localhost
User-Agent: Python-urllib/3.4
Content-Length: 21
Content-Type:: application/x-www-form-urlencoded;charset=utf-8
Connection: close

spam=1&eggs=2&bacon=0

So I don’t think this is a valid bug or regression. What gave you the idea to include the colon? Does the documentation need clarifying?

Also, I would recommend not trying to set a “charset” attribute with the form-urlencoded content type in general. It is not standardized, and I proposed to remove the recommendation from the documentation in Issue 23360 (feedback welcome). If you really need to send non-ASCII data, you might be able to get away with UTF-8 by default. Also, HTML 5 mentions a _charset_ field, and there is the multipart/form-data content type.
msg254239 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-06 23:27
Sorry, it seems I missed some updates to this bug.

I think the docs have enough big red warning boxes. But an explanation or short example might be good.
msg254241 - (view) Author: Christian Rickert (crickert) Date: 2015-11-06 23:45
>>So I don’t think this is a valid bug or regression. What gave you the idea to include the colon?

"We ask that if you are going to consume the API that you pass a custom User-Agent header along with your requests. The User-Agent should primarily include information on how we can contact you, but it is also a good idea to include your application name and version."
[https://eveonline-third-party-documentation.readthedocs.org/en/latest/xmlapi/intro/#other-information]

This is why I created custom User-Agent headers:
.add_header("Contact: ","me")
.add_header("Software: ","sw")
.add_header("Version: ","0.1")
I didn't really need the colons, it just worked.


>>Does the documentation need clarifying?

In the description for "headers" in the "class urllib.request.Request" section, Mozilla's string is given as an example:
"Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"
Including slashes, semicolons, and braces, I didn't think I would run into problems with using a colon (in the first parameter string).

>>Also, I would recommend not trying to set a “charset” attribute with the form-urlencoded content type in general. It is not standardized, and I proposed to remove the recommendation from the documentation in Issue 23360 (feedback welcome).

I agree. The examples provided should be good (standard-conform) examples.


>>I think the docs have enough big red warning boxes. But an explanation or short example might be good.
Maybe a comment line in the examples?
msg254243 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-07 00:24
Ah, I think I see where you are coming from. I guess you aren’t intimately familiar with the HTTP protocol. There are various types of “headers” aka header fields, not only User-Agent, but Content-Type and others. When they suggest passing a User-Agent header, that actually means a call like

request.add_header("User-Agent", "custom text here")

using the exact string "User-Agent" as the field name (key).

If you can given a specific comment line to add to a specific example, that might be useful. Even if you aren’t 100% sure of the details :)
msg254338 - (view) Author: Christian Rickert (crickert) Date: 2015-11-08 12:40
>>Ah, I think I see where you are coming from. I guess you aren’t intimately familiar with the HTTP protocol.

Exactly. :)


>>If you can given a specific comment line to add to a specific example, that might be useful. Even if you aren’t 100% sure of the details :)

[OpenerDirector example]
import urllib.request
opener = urllib.request.build_opener()
# adding custom header value to the 'User-agent' key
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')

"This is often used to “spoof” the User-Agent header VALUE, which is used by a browser to identify itself ..."
msg254362 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-09 01:04
Hmm, I wonder if that OpenerDirector example is a bit obscure. Normally you would use the default urlopen() to set User-Agent, without resorting to a custom OpenerDirector.

In my patch:

* Included “User-Agent header _value_” in the “headers” parameter description
* Linked to <https://docs.python.org/dev/howto/urllib2.html> from examples section
* Added add_header("User-Agent", ...) example with made-up custom value

Let me know what you think (if anything is unnecessary, other suggestions?)
msg266793 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-06-01 00:20
add_header.patch looks good to me. Thanks Martin.
msg266808 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-06-01 08:29
New changeset 320b9a65ac07 by Martin Panter in branch '3.5':
Issue #25570: Add example of customizing User-Agent via add_header()
https://hg.python.org/cpython/rev/320b9a65ac07

New changeset 75dc64c8c22b by Martin Panter in branch 'default':
Issue #25570: Merge add_header() example from 3.5
https://hg.python.org/cpython/rev/75dc64c8c22b

New changeset 3aa49900869b by Martin Panter in branch '2.7':
Issue #25570: Add example of customizing User-Agent via add_header()
https://hg.python.org/cpython/rev/3aa49900869b
History
Date User Action Args
2016-06-01 09:26:13martin.pantersetstatus: open -> closed
resolution: fixed
stage: commit review -> resolved
2016-06-01 08:29:46python-devsetnosy: + python-dev
messages: + msg266808
2016-06-01 00:20:45berker.peksagsetversions: - Python 3.4
nosy: + berker.peksag

messages: + msg266793

stage: patch review -> commit review
2015-11-09 01:04:16martin.pantersetfiles: + add_header.patch


assignee: docs@python
keywords: + patch
stage: needs patch -> patch review
versions: + Python 2.7, Python 3.4, Python 3.6
nosy: + docs@python
messages: + msg254362
components: + Documentation, - Library (Lib)
type: behavior -> enhancement
2015-11-08 12:40:21crickertsetmessages: + msg254338
2015-11-07 00:24:38martin.pantersetmessages: + msg254243
2015-11-06 23:45:29crickertsetmessages: + msg254241
2015-11-06 23:27:23martin.pantersetmessages: + msg254239
stage: needs patch
2015-11-06 23:15:25martin.pantersetnosy: + martin.panter
messages: + msg254238

type: crash -> behavior
stage: needs patch -> (no value)
2015-11-06 22:14:28crickertsetmessages: + msg254235
2015-11-06 21:50:46r.david.murraysetkeywords: + easy

messages: + msg254231
stage: needs patch
2015-11-06 21:43:01crickertsetmessages: + msg254227
2015-11-06 21:26:06r.david.murraysetmessages: + msg254221
2015-11-06 21:08:46r.david.murraysetnosy: + r.david.murray
messages: + msg254217
2015-11-06 20:49:06crickertsetmessages: + msg254216
2015-11-06 20:39:12crickertcreate