Issue2275
Created on 2008-03-11 21:22 by frispete, last changed 2008-05-06 12:21 by BitTorment.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | Remove |
| urllib2-cap-headers.diff | frispete, 2008-03-11 21:22 | Patch proposal | ||
| issue2275.patch | orsenthil, 2008-03-30 20:09 | |||
| Messages | |||
|---|---|---|---|
| msg63466 (view) | Author: Hans-Peter Jansen (frispete) | Date: 2008-03-11 21:22 | |
The urllib2 behavior related to headers is - hmm - improvable.
It simply capitalize() the key, which leads to funny results like:
Accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
while this is seemingly conforming to the specs, it's simply different
to every other implementation of such things..
And we can do better. How about:
--- /usr/lib/python/urllib2.py 2008-01-10 19:03:55.000000000 +0100
+++ urllib2.py 2008-03-11 21:25:33.523890670 +0100
@@ -261,13 +261,16 @@ class Request:
def is_unverifiable(self):
return self.unverifiable
+ def _cap_header_key(self, key):
+ return '-'.join((ck.capitalize() for ck in key.split('-')))
+
def add_header(self, key, val):
# useful for something like authentication
- self.headers[key.capitalize()] = val
+ self.headers[self._cap_header_key(key)] = val
def add_unredirected_header(self, key, val):
# will not be added to a redirected request
- self.unredirected_hdrs[key.capitalize()] = val
+ self.unredirected_hdrs[self._cap_header_key(key)] = val
def has_header(self, header_name):
return (header_name in self.headers or
I'm not happy with the _cap_header_key name, but you get the idea.
The patch is optimized to operate with maximum locality. It's also
attached.
I would be very grateful, if something similar could be applied.
Opinions?
|
|||
| msg64740 (view) | Author: Senthil (orsenthil) | Date: 2008-03-30 04:43 | |
I understand your implementation of _cap_header_key function. But should not this patch be handled in a way wherein. key.capitalize() is just replaced by key.upper()? Should not that suffice? What is the difference between _cap_header_key and key.upper()? Thank you, Senthil |
|||
| msg64747 (view) | Author: Hans-Peter Jansen (frispete) | Date: 2008-03-30 12:12 | |
> But should not this patch be handled in a way wherein.
> key.capitalize() is just replaced by key.upper()?
Hmm, are you sure?
>>> "hello".upper()
'HELLO'
>>>
but the issue is with values containing dashes:
>>> 'accept-charset'.capitalize()
'Accept-charset'
whereas the rest of the world would expect:
'Accept-Charset'
^
This is especially useful, if you try to emulate the behavior of a
typical browser as close as possible.
|
|||
| msg64748 (view) | Author: John J Lee (jjlee) | Date: 2008-03-30 14:56 | |
urllib2.Request.headers is, in practice, an undocumented public
interface. Did you run the tests? There is room for improvement here,
but not in the way you suggest.
python[1]$ python2.6
iPython 2.6a1+ (trunk:62045M, Mar 30 2008, 03:07:23)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import test.test_urllib2
>>> print test.test_urllib2.test_request_headers_dict.__doc__
The Request.headers dictionary is not a documented interface. It should
stay that way, because the complete set of headers are only accessible
through the .get_header(), .has_header(), .header_items() interface.
However, .headers pre-dates those methods, and so real code will be
using
the dictionary.
The introduction in 2.4 of those methods was a mistake for the same
reason:
code that previously saw all (urllib2 user)-provided headers in .headers
now sees only a subset (and the function interface is ugly and
incomplete).
A better change would have been to replace .headers dict with a dict
subclass (or UserDict.DictMixin instance?) that preserved the .headers
interface and also provided access to the "unredirected" headers. It's
probably too late to fix that, though.
Check .capitalize() case normalization:
>>> url = "http://example.com"
>>> Request(url, headers={"Spam-eggs": "blah"}).headers["Spam-eggs"]
'blah'
>>> Request(url, headers={"spam-EggS": "blah"}).headers["Spam-eggs"]
'blah'
Currently, Request(url, "Spam-eggs").headers["Spam-Eggs"] raises
KeyError,
but that could be changed in future.
>>>
|
|||
| msg64749 (view) | Author: John J Lee (jjlee) | Date: 2008-03-30 15:00 | |
Specifically, these improvements could be made: * the headers actually sent to httplib could be normalized to Standard-Http-Case by urllib2 * the urllib2.Request.headers interface could support case-insensitive key lookup |
|||
| msg64760 (view) | Author: Senthil (orsenthil) | Date: 2008-03-30 20:06 | |
Hi John, Greetings! I agree with both of your suggestions. Attached is the patch which aims to implement both in one go. Please provide your comments on that. If this method is okay, I shall go ahead with patches for tests and attach it also. Thanks, Senthil |
|||
| msg64761 (view) | Author: Hans-Peter Jansen (frispete) | Date: 2008-03-30 20:28 | |
Hi Senthil, that looks promising, and the title() trick is nice, as it fixes my issue.. Thanks, Pete |
|||
| msg66311 (view) | Author: Martin McNickle (BitTorment) | Date: 2008-05-06 12:21 | |
This looks good. I would suggest that the unredirected_hdrs would use the CaseInsensitiveDict too. There is still the problem (from the test documentation) that accessing .headers directly will only show a subset of headers i.e. it won't show any headers from .unredirected_hdrs. Have you any suggestions as to how they both can be accessed from the same interface? The test documentation also says: "Note the case normalization of header names here, to .capitalize()-case. This should be preserved for backwards-compatibility. (In the HTTP case, normalization to .title()-case is done by urllib2 before sending headers to httplib)." It suggests that capitalize() should be kept for backwards compatibility. I have tested and the headers actually sent to the server are in title()-case even though they are stored in the Request object as captitalize()-case. This would initially suggest that the case-normalization should be removed from the patch. However, as .headers would now be using a case-insensitive dictionary, this would still ensure backwards compatibily. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2008-05-06 12:21:45 | BitTorment | set | nosy:
+ BitTorment messages: + msg66311 |
| 2008-03-30 20:28:36 | frispete | set | messages: + msg64761 |
| 2008-03-30 20:09:42 | orsenthil | set | files: + issue2275.patch |
| 2008-03-30 20:09:22 | orsenthil | set | files: - issue2275.patch |
| 2008-03-30 20:06:39 | orsenthil | set | files:
+ issue2275.patch messages: + msg64760 |
| 2008-03-30 15:00:25 | jjlee | set | messages: + msg64749 |
| 2008-03-30 14:56:50 | jjlee | set | nosy:
+ jjlee messages: + msg64748 |
| 2008-03-30 12:12:27 | frispete | set | messages: + msg64747 |
| 2008-03-30 04:43:46 | orsenthil | set | nosy:
+ orsenthil messages: + msg64740 |
| 2008-03-11 21:22:19 | frispete | create | |