msg254214 - (view) |
Author: Christian Rickert (crickert) |
Date: 2015-11-06 20:39 |
A piece of software stopped working after the update from Python 3.4.x to 3.5 (both Linux and Windows).
I found the cause in the Request.add_header("abcd","efgh") function:
If the first parameter string "abcd" contains a colon (":") character, a ValueError is raised and the execution stops.
This is an example code:
>>> import urllib.request
>>> import urllib.parse
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> data = data.encode('utf-8')
>>> request = urllib.request.Request("http://requestb.in/xrbl82xr")
>>> # adding charset parameter to the Content-Type header.
>>> request.add_header("Content-Type:","application/x-www-form-urlencoded;charset=utf-8")
>>> with urllib.request.urlopen(request, data) as f:
... print(f.read().decode('utf-8'))
...
This is the error:
Traceback (most recent call last):
File "C:\Users\user\Desktop\example.py", line 9, in <module>
with urllib.request.urlopen(request, data) as f:
File "C:\python\lib\urllib\request.py", line 162, in urlopen
return opener.open(url, data, timeout)
File "C:\python\lib\urllib\request.py", line 465, in open
response = self._open(req, data)
File "C:\python\lib\urllib\request.py", line 483, in _open
'_open', req)
File "C:\python\lib\urllib\request.py", line 443, in _call_chain
result = func(*args)
File "C:\python\lib\urllib\request.py", line 1268, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "C:\python\lib\urllib\request.py", line 1240, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "C:\python\lib\http\client.py", line 1083, in request
self._send_request(method, url, body, headers)
File "C:\python\lib\http\client.py", line 1123, in _send_request
self.putheader(hdr, value)
File "C:\python\lib\http\client.py", line 1050, in putheader
raise ValueError('Invalid header name %r' % (header,))
ValueError: Invalid header name b'Content-Type:'
Steps to reproduce:
Add a colon character anywhere into the first parameter of the Request.add_header() function and execute the code using Python 3.5.x.
|
msg254216 - (view) |
Author: Christian Rickert (crickert) |
Date: 2015-11-06 20:49 |
The URL "http://requestb.in/xrbl82xr" is invalid, you'll get a "urllib.error.HTTPError: HTTP Error 404: NOT FOUND" error."
Instead, please use "http://www.example.com" to confirm.
|
msg254217 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2015-11-06 21:08 |
This behavior change was part of a security fix, and will appear in the next version of 3.4 as well. See issue 22928. Header names may not contain colons, the colon separator is added when the header is rendered. Detecting and rejecting them guards against header injection attacks.
However, that fix was done in httplib. I think it would also be worthwhile to fix add_header so that it rejects invalid header components when called, instead of only having the check done later in httplib, at a point distant from where the problem occurred.
|
msg254221 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2015-11-06 21:26 |
By the way, that your application worked before was pure luck, here is what httplib sends in 3.4 given your test program:
POST / HTTP/1.1
Accept-Encoding: identity
User-Agent: Python-urllib/3.4
Connection: close
Host: xxxx.xxxx.com:9999
Content-Length: 21
Content-Type:: application/x-www-form-urlencoded;charset=utf-8
Content-Type: application/x-www-form-urlencoded
|
msg254227 - (view) |
Author: Christian Rickert (crickert) |
Date: 2015-11-06 21:43 |
Hi David,
Thanks for looking into this.
IMHO it would be very helpful for code debugging, if "add_header" rejects invalid header arguments ad hoc.
As an alternative, the documentation I used for the implementation [https://docs.python.org/3.6/library/urllib.request.html] could be updated.
"Content-Type:: application/x-www-form-urlencoded;charset=utf-8"
A scope resolution operator thingy in the header? - What could possibly go wrong? ^^
|
msg254231 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2015-11-06 21:50 |
I don't think the docs need updating if the ValueError is raised by add_header. The 'key' in the docs is the header name, as examples in other sections of that doc make clear. Granted you won't necessarily see that if you just read the add_header entry, but we can't explain every detail in every entry.
Hmm. Perhaps an example could be added, though.
|
msg254235 - (view) |
Author: Christian Rickert (crickert) |
Date: 2015-11-06 22:14 |
I would suggest a red warning box that informs the reader about her/his responsibility to comply with HTTP specifications - i.e. ensuring compatibility and security.
|
msg254238 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-11-06 23:15 |
This change is a result of parameter checking added in Issue 22928. The colon is not meant to be part of a field name. This is what the request being sent by your example looks like in the unpatched 3.4. (I changed to localhost.) Your attempt to override the default Content-Type is not working due to the colon:
POST /xrbl82xr HTTP/1.1
Accept-Encoding: identity
Content-Type: application/x-www-form-urlencoded
Host: localhost
User-Agent: Python-urllib/3.4
Content-Length: 21
Content-Type:: application/x-www-form-urlencoded;charset=utf-8
Connection: close
spam=1&eggs=2&bacon=0
So I don’t think this is a valid bug or regression. What gave you the idea to include the colon? Does the documentation need clarifying?
Also, I would recommend not trying to set a “charset” attribute with the form-urlencoded content type in general. It is not standardized, and I proposed to remove the recommendation from the documentation in Issue 23360 (feedback welcome). If you really need to send non-ASCII data, you might be able to get away with UTF-8 by default. Also, HTML 5 mentions a _charset_ field, and there is the multipart/form-data content type.
|
msg254239 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-11-06 23:27 |
Sorry, it seems I missed some updates to this bug.
I think the docs have enough big red warning boxes. But an explanation or short example might be good.
|
msg254241 - (view) |
Author: Christian Rickert (crickert) |
Date: 2015-11-06 23:45 |
>>So I don’t think this is a valid bug or regression. What gave you the idea to include the colon?
"We ask that if you are going to consume the API that you pass a custom User-Agent header along with your requests. The User-Agent should primarily include information on how we can contact you, but it is also a good idea to include your application name and version."
[https://eveonline-third-party-documentation.readthedocs.org/en/latest/xmlapi/intro/#other-information]
This is why I created custom User-Agent headers:
.add_header("Contact: ","me")
.add_header("Software: ","sw")
.add_header("Version: ","0.1")
I didn't really need the colons, it just worked.
>>Does the documentation need clarifying?
In the description for "headers" in the "class urllib.request.Request" section, Mozilla's string is given as an example:
"Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"
Including slashes, semicolons, and braces, I didn't think I would run into problems with using a colon (in the first parameter string).
>>Also, I would recommend not trying to set a “charset” attribute with the form-urlencoded content type in general. It is not standardized, and I proposed to remove the recommendation from the documentation in Issue 23360 (feedback welcome).
I agree. The examples provided should be good (standard-conform) examples.
>>I think the docs have enough big red warning boxes. But an explanation or short example might be good.
Maybe a comment line in the examples?
|
msg254243 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-11-07 00:24 |
Ah, I think I see where you are coming from. I guess you aren’t intimately familiar with the HTTP protocol. There are various types of “headers” aka header fields, not only User-Agent, but Content-Type and others. When they suggest passing a User-Agent header, that actually means a call like
request.add_header("User-Agent", "custom text here")
using the exact string "User-Agent" as the field name (key).
If you can given a specific comment line to add to a specific example, that might be useful. Even if you aren’t 100% sure of the details :)
|
msg254338 - (view) |
Author: Christian Rickert (crickert) |
Date: 2015-11-08 12:40 |
>>Ah, I think I see where you are coming from. I guess you aren’t intimately familiar with the HTTP protocol.
Exactly. :)
>>If you can given a specific comment line to add to a specific example, that might be useful. Even if you aren’t 100% sure of the details :)
[OpenerDirector example]
import urllib.request
opener = urllib.request.build_opener()
# adding custom header value to the 'User-agent' key
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')
"This is often used to “spoof” the User-Agent header VALUE, which is used by a browser to identify itself ..."
|
msg254362 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-11-09 01:04 |
Hmm, I wonder if that OpenerDirector example is a bit obscure. Normally you would use the default urlopen() to set User-Agent, without resorting to a custom OpenerDirector.
In my patch:
* Included “User-Agent header _value_” in the “headers” parameter description
* Linked to <https://docs.python.org/dev/howto/urllib2.html> from examples section
* Added add_header("User-Agent", ...) example with made-up custom value
Let me know what you think (if anything is unnecessary, other suggestions?)
|
msg266793 - (view) |
Author: Berker Peksag (berker.peksag) * |
Date: 2016-06-01 00:20 |
add_header.patch looks good to me. Thanks Martin.
|
msg266808 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2016-06-01 08:29 |
New changeset 320b9a65ac07 by Martin Panter in branch '3.5':
Issue #25570: Add example of customizing User-Agent via add_header()
https://hg.python.org/cpython/rev/320b9a65ac07
New changeset 75dc64c8c22b by Martin Panter in branch 'default':
Issue #25570: Merge add_header() example from 3.5
https://hg.python.org/cpython/rev/75dc64c8c22b
New changeset 3aa49900869b by Martin Panter in branch '2.7':
Issue #25570: Add example of customizing User-Agent via add_header()
https://hg.python.org/cpython/rev/3aa49900869b
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:23 | admin | set | github: 69756 |
2016-06-01 09:26:13 | martin.panter | set | status: open -> closed resolution: fixed stage: commit review -> resolved |
2016-06-01 08:29:46 | python-dev | set | nosy:
+ python-dev messages:
+ msg266808
|
2016-06-01 00:20:45 | berker.peksag | set | versions:
- Python 3.4 nosy:
+ berker.peksag
messages:
+ msg266793
stage: patch review -> commit review |
2015-11-09 01:04:16 | martin.panter | set | files:
+ add_header.patch
assignee: docs@python keywords:
+ patch stage: needs patch -> patch review versions:
+ Python 2.7, Python 3.4, Python 3.6 nosy:
+ docs@python messages:
+ msg254362 components:
+ Documentation, - Library (Lib) type: behavior -> enhancement |
2015-11-08 12:40:21 | crickert | set | messages:
+ msg254338 |
2015-11-07 00:24:38 | martin.panter | set | messages:
+ msg254243 |
2015-11-06 23:45:29 | crickert | set | messages:
+ msg254241 |
2015-11-06 23:27:23 | martin.panter | set | messages:
+ msg254239 stage: needs patch |
2015-11-06 23:15:25 | martin.panter | set | nosy:
+ martin.panter messages:
+ msg254238
type: crash -> behavior stage: needs patch -> (no value) |
2015-11-06 22:14:28 | crickert | set | messages:
+ msg254235 |
2015-11-06 21:50:46 | r.david.murray | set | keywords:
+ easy
messages:
+ msg254231 stage: needs patch |
2015-11-06 21:43:01 | crickert | set | messages:
+ msg254227 |
2015-11-06 21:26:06 | r.david.murray | set | messages:
+ msg254221 |
2015-11-06 21:08:46 | r.david.murray | set | nosy:
+ r.david.murray messages:
+ msg254217
|
2015-11-06 20:49:06 | crickert | set | messages:
+ msg254216 |
2015-11-06 20:39:12 | crickert | create | |