Issue12455
Created on 2011-06-30 19:23 by Cal.Leeming, last changed 2022-04-11 14:57 by admin.
Messages (20) | |||
---|---|---|---|
msg139512 - (view) | Author: Cal Leeming (Cal.Leeming) | Date: 2011-06-30 19:23 | |
I came up against a problem today whilst trying to submit a request to a remote API. The header needed to contain:

    'Content-MD5' : "md5here"

But the urllib2 Request() forces capitalize() on all header names, and transformed it into "Content-Md5", which in turn made the remote web server ignore the header and break the request (the remote side is case sensitive, and we have no control over it).

I attempted to get smart by using the following patch:

    class _str(str):
        def capitalize(s):
            print s
            return s

    _headers = {_str("Content-MD5") : 'md5here'}

But this failed to work:

    ---HEADERS---
    {'Content-MD5': 'nts0yj7AdzJALyNOxafDyA=='}
    ---URLLIB2 DEBUG---
    send: 'POST /api/v1 m HTTP/1.1\r\nContent-Md5: nts0yj7AdzJALyNOxafDyA==\r\n\r\n\r\n'

Upon inspecting the urllib2.py source, I found 3 references to capitalize() which seem to cause this problem, but it seems impossible to monkey patch or to fix without forking.

Therefore, I'd like to +1 a feature request for an extra option, given when the request is opened, to bypass the capitalize() on header names (maybe header_keep_original = True or something).

And, if anyone could suggest a possible monkey patch (which doesn't involve forking huge chunks of code), that'd be good too :)

Thanks
Cal |
|||
msg139514 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2011-06-30 19:33 | |
Well, three occurrences means you only have three methods to patch (and two of them are trivial). But I agree that copying the non-trivial method doesn't look fun from a maintenance perspective. You could also try using an object that is not a subclass of str. The problem with subclassing str is that some (most?) string methods do not do a subclass check but directly call the C implementation of the method. I think there's an issue in the tracker somewhere about that. The problem with not subclassing string, of course, is that you may end up implementing a lot of methods on your object to get it to play nicely with urllib2's assumption that it *is* a string. |
|||
msg139515 - (view) | Author: Cal Leeming (Cal.Leeming) | Date: 2011-06-30 19:39 | |
Sorry, I should clarify.. The str() patch worked, but it failed to work within the realm of urllib2:

    s = _str("Content-MD5")
    print "Builtin:"
    print "plain: %s" % ( s )
    print "capitalized: %s" % ( s.capitalize() )

    s = str("Content-MD5")
    print "Builtin:"
    print "plain: %s" % ( s )
    print "capitalized: %s" % ( s.capitalize() )

    Builtin:
    plain: Content-MD5
    capitalized: Content-MD5
    Builtin:
    plain: Content-MD5
    capitalized: Content-md5

Why it works in the unit test, and not within urllib2, is totally beyond me. Especially since I put a debug call on the method, and it does get called.. yet urllib2 debug still shows it sending the wrong value.

    ---
    capitalize() bypassed: sending value: Content-MD5
    send: 'POST /api/url\r\nContent-Md5: nts0yj7AdzJALyNOxafDyA==\r\n\r\n'
    ---

I have a feeling that the problem may lie somewhere after the opener (like HTTPConnection or AbstractHTTPHandler), rather than the urllib2 calls to capitalize(), but not having much luck monkey patching those :X |
|||
msg139516 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2011-06-30 19:50 | |
Well, judging by your test it isn't capitalize that's the issue. capitalize produces Content-md5, whereas debug is showing urllib2 sending Content-Md5. So something else is massaging the header name on send. |
|||
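The difference between the two normalizations discussed above can be checked on plain strings. This is a minimal sketch in Python 3 (the thread itself uses Python 2, where the two `str` methods behave the same way for this input); `normalized_forms` is a hypothetical helper name used only for illustration.

```python
def normalized_forms(header_name):
    """Return the (capitalize(), title()) forms of a header name."""
    return header_name.capitalize(), header_name.title()


capitalized, titled = normalized_forms("Content-MD5")
print(capitalized)  # Content-md5 (only the first character is upper-cased)
print(titled)       # Content-Md5 (each word is title-cased)
```

Since the debug output shows `Content-Md5`, the form seen on the wire matches `title()`, not `capitalize()`.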
msg139519 - (view) | Author: Cal Leeming (Cal.Leeming) | Date: 2011-06-30 20:17 | |
(short answer, I found the cause, and a suitable monkey patch) - below are details of how I did it and steps I took.
-----
Okay so I forked AbstractHTTPHandler() then patched do_request_(), at which point "request.headers" and request.header_items() have the correct header name (Content-MD5). So I tried this:

    opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
    opener.addheaders = [("Content-TE5", 'test'), ]

However the headers came back capitalized, so the problem is happening somewhere after addheaders.

    > grep -R "addheaders" *.py
    urllib.py: self.addheaders = [('User-Agent', self.version)]
    urllib.py: self.addheaders.append(args)
    urllib.py: for args in self.addheaders: h.putheader(*args)
    urllib.py: for args in self.addheaders: h.putheader(*args)
    urllib2.py: self.addheaders = [('User-agent', client_version)]
    urllib2.py: for name, value in self.parent.addheaders:

    > grep -R "def putheader" *.py
    httplib.py: def putheader(self, header, value):
    httplib.py: def putheader(self, header, *values):

I also then found: http://stackoverflow.com/questions/3278418/testing-urllib2-application-http-responses-loaded-from-files

I then patched this:

    class HTTPConnection(httplib.HTTPConnection):
        def putheader(self, header, value):
            print [header, value]

This in turn brought back:

    ['Content-Md5', 'nts0yj7AdzJALyNOxafDyA==']

Which means it's happening before putheader(). So I patched _send_request() on HTTPConnection(), and that also brought back 'Content-Md5'.

Exception trace shows:

    File "/ddcms/dev/webapp/../webapp/sites/ma/management/commands/ddcms.py", line 147, in _send_request
      _res = opener.open(req)
    -- CORRECT --
    File "/usr/local/lib/python2.6/urllib2.py", line 391, in open
      response = self._open(req, data)
    -- CORRECT --
    File "/usr/local/lib/python2.6/urllib2.py", line 409, in _open
      '_open', req)
    -- CORRECT --
    File "/usr/local/lib/python2.6/urllib2.py", line 369, in _call_chain
      result = func(*args)
    -- CORRECT --
    File "/ddcms/dev/webapp/../webapp/sites/ma/management/commands/ddcms.py", line 126, in http_open
      return self.do_open(HTTPConnection, req)
    -- CORRECT --
    File "/usr/local/lib/python2.6/urllib2.py", line 1142, in do_open
      h.request(req.get_method(), req.get_selector(), req.data, headers)
    -- INVALID --
    File "/usr/local/lib/python2.6/httplib.py", line 914, in request
      self._send_request(method, url, body, headers)
    File "/ddcms/dev/webapp/../webapp/sites/ma/management/commands/ddcms.py", line 122, in _send_request
      raise

The line that causes it?

    headers = dict( (name.title(), val) for name, val in headers.items())

So it would appear that title() also needs monkey patching.. Patched to use:

    # Patch case sensitive headers (due to reflected API being non RFC compliant, and
    # urllib2 not giving the option to choose between the two)
    class _str(str):
        def capitalize(s):
            print "capitalize() bypassed: sending value: %s" % ( s )
            return s
        def title(s):
            print "title() bypassed: sending value: %s" % ( s )
            return s

    _headers = {_str('Content-MD5') : _md5_content}

    capitalize() bypassed: sending value: Content-MD5
    title() bypassed: sending value: Content-MD5
    send: 'POST /url/api HTTP/1.1\r\nContent-MD5: nts0yj7AdzJALyNOxafDyA==\r\n\r\n' |
|||
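For readers on Python 3, the same idea can be sketched against `urllib.request`. This is an illustration, not a supported API: `CaseKeptStr` is a hypothetical name, and the sketch assumes that `Request.add_header` stores the result of `key.capitalize()`, as described later in this thread. It makes no network call and only inspects what the `Request` object stores. Whether the name survives all the way to the wire also depends on the `.title()` call in `do_open` and is not verified here.

```python
import urllib.request


class CaseKeptStr(str):
    """Hypothetical str subclass whose case-normalizing methods are no-ops."""

    def capitalize(self):
        return self

    def title(self):
        return self


# Plain str key: the Request normalizes the name with capitalize().
plain = urllib.request.Request(
    "http://example.com/", headers={"Content-MD5": "md5here"})

# Subclass key: the overridden capitalize() hands back the original name.
kept = urllib.request.Request(
    "http://example.com/", headers={CaseKeptStr("Content-MD5"): "md5here"})

print(plain.header_items())
print(kept.header_items())
```

The trick relies on implementation details of the standard library, so it should be treated as a stopgap and re-checked after any Python upgrade.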
msg139520 - (view) | Author: Cal Leeming (Cal.Leeming) | Date: 2011-06-30 20:19 | |
So @r.david.murray, it would appear you were right :D Really, I should have looped through each method on str(), and wrapped them all to see which were being called, but lesson learned I guess. Sooo, I guess now the question is, can we possibly get a vote on having a feature which disables this functionality from the opener level. Something like: opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1, keep_original_header_case=True)) But obviously a less tedious attribute name :) In the mean times, if anyone else comes up against this problem, the code I pasted above will work fine for now. Cal |
|||
msg139523 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2011-06-30 20:53 | |
A feature request for a way to control this is reasonable. However, new features can only go into 3.3. |
|||
msg139524 - (view) | Author: Cal Leeming (Cal.Leeming) | Date: 2011-06-30 21:00 | |
Damn 3.3 huh? Ah well, at least it's in the pipeline ^_^ Thanks for your help on this @r.david.murray! |
|||
msg139546 - (view) | Author: Senthil Kumaran (orsenthil) * | Date: 2011-07-01 06:09 | |
AFAIR, the capitalize part is required somewhere in the RFC. If the server does not behave properly, it may not be a good idea for the client to change (or to be permissive via a flag). |
|||
msg139547 - (view) | Author: Senthil Kumaran (orsenthil) * | Date: 2011-07-01 06:12 | |
Sorry, not "Capitalize", but the "Title" part. One can find some bugs which led to this change in urllib2. |
|||
msg139551 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2011-07-01 07:45 | |
Quoting http://tools.ietf.org/html/rfc2068#section-4.2: Field names are case-insensitive. Which is only logical, since they are modeled on email headers, and email header names are case insensitive. So, the server in question is broken, yes, but that doesn't mean we can't provide a facility to allow Python to inter-operate with it. Email, for example, preserves the case of the field names it parses or receives from the application program, but otherwise treats them case-insensitively. However, since the current code coerces to title case, we have to provide this feature as a switchable facility defaulting to the current behavior, for backward compatibility reasons. And someone needs to write a patch.... |
|||
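The email behavior mentioned above (case preserved on storage, ignored on lookup) can be seen with the standard library's `email.message.Message`. A small sketch; the header name and value are taken from this thread and nothing is sent anywhere.

```python
from email.message import Message

msg = Message()
msg["Content-MD5"] = "md5here"

# The field name is stored exactly as the application wrote it...
print(msg.keys())

# ...but lookups and membership tests ignore case.
print(msg["content-md5"])
print("CONTENT-MD5" in msg)
```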
msg139578 - (view) | Author: Cal Leeming (sleepycal) | Date: 2011-07-01 13:21 | |
It's fully understandable that the default won't change. I'll put this on my todo list to write a patch in a week or two. |
|||
msg175484 - (view) | Author: Terry J. Reedy (terry.reedy) * | Date: 2012-11-13 01:40 | |
The comment about urllib.request forcing .title() is consistent with 'Content-Length' and 'Content-Type' in the docs, but puzzling and inconsistent given that in 3.3, header names are printed .capitalize()'ed and not .title()'ed, and that has_header and get_header *require* the .capitalize() form and reject the .title() form.

    import urllib.request
    opener = urllib.request.build_opener()
    request = urllib.request.Request("http://example.com/", headers = {"Content-Type": "application/x-www-form-urlencoded"})
    opener.open(request, "1".encode("us-ascii"))
    print(request.header_items(), request.has_header("Content-Type"), request.has_header("Content-type"), request.get_header("Content-Type"), request.get_header("Content-type"), sep='\n')

    >>>
    [('Content-type', 'application/x-www-form-urlencoded'), ('Content-length', '1'), ('User-agent', 'Python-urllib/3.3'), ('Host', 'example.com')]
    False
    True
    None
    application/x-www-form-urlencoded

Did .title in 2.7 urllib2 request get changed to .capitalize in 3.x urllib.request (without the examples in the doc being changed), or is request inconsistent within itself?

Cal did not post the 2.7 code exhibiting the problem, but when I add this code in 3.3, the output starts as shown.

    request.add_header('Content-MD5', 'xxx')
    print(request.header_items())
    # [('Content-md5', 'xxx'), ...

So is 3.3 sending 'Content-Md5' or 'Content-md5'? My guess is the former, as urllib.request has the same single use of .title in .do_open as Cal quoted. The two files also have the same three uses of .capitalize in .add_header, .add_unredirected_header, and .do_request. So it seems that header names are normalized to .capitalize on entry and .title on sending, or something like that. Ugh.

Is there any good justification for this? I do not see anything in the doc about header names being normalized either way, or about the requirements of has_/get_header.

If the behavior were consistent and the same since forever, then I would say the current docs should be improved and a change would be an enhancement request. Since the behavior seems inconsistent, I am more inclined to think there is a bug. I realize that this message expands the scope of the issue, but it is all about the handling of header names in requests. |
|||
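The lookup behavior described above can be reproduced without opening a connection. A minimal sketch, assuming a CPython version in which `Request.add_header` stores `key.capitalize()` and `has_header`/`get_header` do exact-match lookups, as reported in this message; other versions may differ.

```python
import urllib.request

request = urllib.request.Request(
    "http://example.com/",
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
request.add_header("Content-MD5", "xxx")

# Names are stored in capitalize() form.
print(request.header_items())

# Lookups only match that stored form.
print(request.has_header("Content-Type"))  # title-style spelling
print(request.has_header("Content-type"))  # capitalize-style spelling
print(request.get_header("Content-md5"))
```

Nothing is sent here, so the sketch says nothing about the name that finally appears on the wire.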
msg183233 - (view) | Author: karl (karlcow) * | Date: 2013-02-28 21:13 | |
Note that HTTP header fields are case-insensitive. See http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging#section-3.2

    Each HTTP header field consists of a case-insensitive field name followed by a colon (":"), optional whitespace, and the field value.

Basically the author of a request can set them to whatever he/she wants. But we should, IMHO, respect the author's intent. It might happen that someone will choose a specific combination of casing to deal with broken servers and/or proxies. So a cycle of set/get/send should not modify the headers at all. |
|||
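One way to picture a set/get cycle that leaves the author's casing untouched is a mapping keyed on the lower-cased name that also remembers the original spelling. This is a sketch of the idea only: `CasePreservingHeaders` and its methods are hypothetical and are not part of urllib or of any patch attached to this issue.

```python
class CasePreservingHeaders:
    """Hypothetical mapping: case-insensitive lookup, original case on output."""

    def __init__(self):
        # lower-cased name -> (name as written by the author, value)
        self._store = {}

    def set(self, name, value):
        self._store[name.lower()] = (name, value)

    def get(self, name, default=None):
        entry = self._store.get(name.lower())
        return default if entry is None else entry[1]

    def items(self):
        # What a sender would write out: the author's spelling, unchanged.
        return list(self._store.values())


headers = CasePreservingHeaders()
headers.set("Content-MD5", "md5here")
print(headers.get("content-md5"))
print(headers.items())
```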
msg183237 - (view) | Author: karl (karlcow) * | Date: 2013-02-28 21:47 | |
So looking at the casing of headers, I discovered other issues. I opened another bug. http://bugs.python.org/issue17322 |
|||
msg183362 - (view) | Author: karl (karlcow) * | Date: 2013-03-03 03:59 | |
Are there issues related to removing the places where capitalize() and title() appear?

# title()
* http://hg.python.org/cpython/file/886df716cd09/Lib/urllib/request.py#l1239

# capitalize()
* http://hg.python.org/cpython/file/886df716cd09/Lib/urllib/request.py#l359
* http://hg.python.org/cpython/file/886df716cd09/Lib/urllib/request.py#l363
* http://hg.python.org/cpython/file/886df716cd09/Lib/urllib/request.py#l1206

Because the behavior is inconsistent, I would love to propose a patch that removes them, so that urllib is completely neutral with regard to header case. |
|||
msg183364 - (view) | Author: karl (karlcow) * | Date: 2013-03-03 04:33 | |
The tests in http://hg.python.org/cpython/file/886df716cd09/Lib/test/test_wsgiref.py#l370 also check that everything is case insensitive. And the method that gets the headers in wsgiref makes sure they are lower-cased: http://hg.python.org/cpython/file/886df716cd09/Lib/wsgiref/headers.py#l82 |
|||
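The wsgiref behavior referenced above can be tried directly with `wsgiref.headers.Headers`. A short sketch using the header name from this thread; the value is a placeholder.

```python
from wsgiref.headers import Headers

headers = Headers([("Content-MD5", "md5here")])

# Lookups compare lower-cased names, so any spelling matches.
print(headers["content-md5"])
print("CONTENT-MD5" in headers)

# The stored name keeps the casing it was given.
print(headers.keys())
```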
msg184807 - (view) | Author: karl (karlcow) * | Date: 2013-03-20 21:49 | |
terry.reedy: You said: "and that has_header and get_header *require* the .capitalize() form and reject the .title() form." I made a patch for these two. See http://bugs.python.org/issue5550 |
|||
msg220886 - (view) | Author: Mark Lawrence (BreamoreBoy) * | Date: 2014-06-17 20:46 | |
@Karl do you intend following up on this issue? |
|||
msg221400 - (view) | Author: karl (karlcow) * | Date: 2014-06-24 06:39 | |
Mark, I'm happy to follow up. I would be in favor of removing any capitalization and not changing headers, whatever they are, because it doesn't matter per the spec. Browsers do not care about the capitalization, and I haven't identified Web Compatibility issues regarding the capitalization. That said, it seems that Cal (msg139512) had an issue; I would love to know which server/API had this behavior so I can file a bug at http://webcompat.com/ So… Where do we stand? Add a feature, or remove anything which modifies the capitalization of headers? |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:19 | admin | set | github: 56664 |
2019-11-17 22:33:13 | martin.panter | link | issue38831 superseder |
2019-03-15 23:06:09 | BreamoreBoy | set | nosy: - BreamoreBoy |
2015-02-13 01:22:50 | demian.brecht | set | nosy: - demian.brecht |
2014-09-23 15:22:40 | r.david.murray | link | issue22467 superseder |
2014-06-24 06:39:01 | karlcow | set | messages: + msg221400 |
2014-06-17 20:46:44 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg220886 |
2013-03-20 21:49:30 | karlcow | set | messages: + msg184807 |
2013-03-03 04:33:23 | karlcow | set | messages: + msg183364 |
2013-03-03 03:59:14 | karlcow | set | messages: + msg183362 |
2013-02-28 21:47:58 | karlcow | set | messages: + msg183237 |
2013-02-28 21:13:47 | karlcow | set | nosy: + karlcow; messages: + msg183233 |
2013-02-24 02:09:40 | demian.brecht | set | nosy: + demian.brecht |
2012-11-13 01:40:29 | terry.reedy | set | nosy: + terry.reedy; messages: + msg175484; versions: + Python 3.4, - Python 3.3 |
2011-07-19 14:59:19 | eric.araujo | set | nosy: + eric.araujo |
2011-07-19 14:59:13 | eric.araujo | set | files: - unnamed |
2011-07-01 13:21:37 | sleepycal | set | files: + unnamed; messages: + msg139578; nosy: + sleepycal |
2011-07-01 07:45:43 | r.david.murray | set | messages: + msg139551 |
2011-07-01 06:12:47 | orsenthil | set | messages: + msg139547 |
2011-07-01 06:09:18 | orsenthil | set | nosy: + orsenthil; messages: + msg139546 |
2011-06-30 22:54:29 | santoso.wijaya | set | nosy: + santoso.wijaya |
2011-06-30 21:00:40 | Cal.Leeming | set | messages: + msg139524 |
2011-06-30 20:53:45 | r.david.murray | set | versions: + Python 3.3, - Python 2.7; title: urllib2 Request() forces capitalize() on header names, breaking some requests -> urllib2 forces title() on header names, breaking some requests; messages: + msg139523; type: behavior -> enhancement; stage: needs patch |
2011-06-30 20:19:38 | Cal.Leeming | set | messages: + msg139520 |
2011-06-30 20:17:20 | Cal.Leeming | set | messages: + msg139519 |
2011-06-30 19:50:52 | r.david.murray | set | messages: + msg139516 |
2011-06-30 19:39:52 | Cal.Leeming | set | messages: + msg139515 |
2011-06-30 19:33:47 | r.david.murray | set | nosy: + r.david.murray; messages: + msg139514 |
2011-06-30 19:23:38 | Cal.Leeming | create |