classification
Title: [urllib.request]: Comparison of HTTP headers should be insensitive to the case
Type: behavior Stage: test needed
Components: Library (Lib) Versions: Python 3.2, Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies: 18986 Superseder: Make urllib.request.Request.has_header() etc case-insensitive
View: 2275
Assigned To: orsenthil Nosy List: BreamoreBoy, cocobear, karlcow, martin.panter, orsenthil, yselivanov
Priority: normal Keywords: patch

Created on 2009-03-24 04:48 by cocobear, last changed 2015-11-05 06:51 by martin.panter. This issue is now closed.

Files
File name Uploaded Description Edit
issue-5550-test-1.patch karlcow, 2013-03-20 15:03 review
issue-5550-2.patch karlcow, 2013-03-20 20:38 review
issue5550-5.patch karlcow, 2014-09-25 02:02 review
Messages (19)
msg84058 - (view) Author: cocobear (cocobear) Date: 2009-03-24 04:48
take a look at following code:
<code>
import urllib2

headers = [("Content-Type","application/oct-stream"),]
opener = urllib2.build_opener()
opener.addheaders = headers
urllib2.install_opener(opener)
print "after install_opener"

ret = opener.open('http://www.google.com',data="word=ss")
print ret.read()
</code>

I got real send data by wireshark:

POST / HTTP/1.1

Accept-Encoding: identity

Content-Length: 7

Host: www.dict.cn

Content-Type: application/x-www-form-urlencoded

Connection: close



word=ss


I had already set HTTP header of Content-Type, but actally urllib2
change this header.

I got this piece of code from urllib2.py:

       if request.has_data():  # POST
           data = request.get_data()
           if not request.has_header('Content-type'):
               request.add_unredirected_header(
                   'Content-type',
                   'application/x-www-form-urlencoded')
           if not request.has_header('Content-length'):
               request.add_unredirected_header(
                   'Content-length', '%d' % len(data))

       scheme, sel = splittype(request.get_selector())
       sel_host, sel_path = splithost(sel)
       if not request.has_header('Host'):
           request.add_unredirected_header('Host', sel_host or host)
       for name, value in self.parent.addheaders:
           name = name.capitalize()
           if not request.has_header(name):
               request.add_unredirected_header(name, value)


first,urllib2 will add Content-Type if it's POST method, then it try to
add headers which holded by opener, but there's already 'Content-Type',
so the attribute->addheaders of opener will not append.

I can use Request to add 'Content-Type' header:

 request = urllib2.Request(url,headers=headers,data=body)

The example code that I wrote doesn't correct? or it's a problem of urllib2?
msg184684 - (view) Author: karl (karlcow) * Date: 2013-03-19 21:34
The wireshark trace is a different domain than the code example. But let's see. cocobear added:

headers = [("Content-Type","application/oct-stream"),]

with a "Content-Type", not the capitalized "Type".


BUT in the source code or urllib/request.py there is
http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l1184

if not request.has_header('Content-type')

And if you check has_header at
http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l367

it compares exactly the string it has received, when HTTP headers are case insensitive. It reminds me of

http://bugs.python.org/issue12455
http://bugs.python.org/issue17322

with capitalization issues.

I'm changing the title to be more exact.
msg184768 - (view) Author: karl (karlcow) * Date: 2013-03-20 15:03
orsenthil,

Added test for python 3.3 for testing the case insensitivity for has_header and get_header. The tests fail as it should.
See issue-5550-test-1.patch

I will make a patch to urllib/request.py to make it work.
OK with you?
msg184790 - (view) Author: karl (karlcow) * Date: 2013-03-20 20:38
First, Sanity check for myself to be sure to understand.
==============================================
→ python3.3
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> req = urllib.request.Request('http://www.python.org/')
>>> req.headers
{}
>>> req.unredirected_hdrs
{}
>>> r = urllib.request.urlopen(req)
>>> req.headers
{}
>>> req.unredirected_hdrs
{'User-agent': 'Python-urllib/3.3', 'Host': 'www.python.org'}
>>> req.header_items()
[('User-agent', 'Python-urllib/3.3'), ('Host', 'www.python.org')]
>>> req.has_header('host')
False
>>> req.has_header('Host')
True
>>> req.get_header('host')
>>> req.get_header('Host')
'www.python.org'
>>> 'host' in req.unredirected_hdrs
False
>>> 'Host' in req.unredirected_hdrs
True
>>> 'host' in req.header_items()
False
>>> 'Host' in req.header_items()
False
>>> req.unredirected_hdrs.items()
dict_items([('User-agent', 'Python-urllib/3.3'), ('Host', 'www.python.org')])
>>> req.headers.get('Host', req.unredirected_hdrs.get('Host',None))
'www.python.org'
>>> req.headers.get('Host')
>>> req.headers.get('host'.capitalize(), req.unredirected_hdrs.get('host'.capitalize(),None))
'www.python.org'
>>> req.headers.get('HOST'.capitalize(), req.unredirected_hdrs.get('HOST'.capitalize(),None))
'www.python.org'
>>> req.headers.get('host'.title(), req.unredirected_hdrs.get('host'.title(),None))
'www.python.org'
>>> 'host'.capitalize() in req.unredirected_hdrs
True
==============================================

OK. The two add methods force the capitalization thought capitalize() (And not title() which is an issue by itself)

http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l359

    def add_header(self, key, val):
        # useful for something like authentication
        self.headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        # will not be added to a redirected request
        self.unredirected_hdrs[key.capitalize()] = val


HTTP headers are case insensitive. The way the methods get_header() and has_header() are currently designed. We could also capitalize the variable before requesting it.


So something like 

    def get_header(self, header_name, default=None):
        return self.headers.get(
            header_name.capitalize(),
            self.unredirected_hdrs.get(header_name.capitalize(), default))


    def has_header(self, header_name):
        return (header_name.capitalize() in self.headers or
                header_name.capitalize() in self.unredirected_hdrs)

The method to add headers on request is 

>>> req.add_header("foo-bar","booh")
>>> req.headers
{'Foo-bar': 'booh'}
>>> req.unredirected_hdrs
{'User-agent': 'Python-urllib/3.3', 'Host': 'www.python.org'}
>>> req.header_items()
[('Foo-bar', 'booh'), ('User-agent', 'Python-urllib/3.3'), ('Host', 'www.python.org')]


So if someone add an header it will be capitalized. And the query will be correct.

The issue is more with addheader which doesn't have the same constraint.
http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l1624

Personally I would have made everything case insensitive ;)

Also note that in this module the casing is not consistent when the values are hardcoded. Sometimes Content-Type, sometimes Content-type.

Anyway I submitted a patch with the code modification AND the test. And this is the result when running the test suite.

→ ./python.exe Lib/test/test_urllib2net.py 
test_sni (__main__.HTTPSTests) ... skipped 'test disabled - test server needed'
test_custom_headers (__main__.OtherNetworkTests) ... ok
test_file (__main__.OtherNetworkTests) ... ok
test_ftp (__main__.OtherNetworkTests) ... ok
test_headers_case_sensitivity (__main__.OtherNetworkTests) ... ok
test_sites_no_connection_close (__main__.OtherNetworkTests) ... /Users/karl/Documents/2011/cpython/Lib/socket.py:370: ResourceWarning: unclosed <socket.socket object, fd=5, family=2, type=1, proto=6>
  self._sock = None
ok
test_urlwithfrag (__main__.OtherNetworkTests) ... ok
test_close (__main__.CloseSocketTest) ... ok
test_ftp_basic (__main__.TimeoutTest) ... ok
test_ftp_default_timeout (__main__.TimeoutTest) ... ok
test_ftp_no_timeout (__main__.TimeoutTest) ... ok
test_ftp_timeout (__main__.TimeoutTest) ... ok
test_http_basic (__main__.TimeoutTest) ... ok
test_http_default_timeout (__main__.TimeoutTest) ... ok
test_http_no_timeout (__main__.TimeoutTest) ... ok
test_http_timeout (__main__.TimeoutTest) ... ok

----------------------------------------------------------------------
Ran 16 tests in 15.259s

OK (skipped=1)
[137983 refs]
msg184799 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-03-20 21:25
Karl - a patch is welcome. BTW, I wanted to search for an already existing and I think open issue with a patch ( probably written by me) somewhere. We have to see what's the diff of that issue with new one. I will share that issue in this id. But yeah, this is a good discussion and +1 for this fix.
msg185402 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2013-03-28 06:02
Case insensitive lookup will be more fool-proof than capitalized look up.
I would in favor of going with case insensitive look up than the capitalized look up one. Because the headers could be added in multiple ways and we cannot be assured that they are all stored as Capitalized header values.
msg185441 - (view) Author: karl (karlcow) * Date: 2013-03-28 12:24
Yes case insensitive make sense. It seems it will require a lot of modifications in the code. So I guess maybe the first step is to identify where it could break.
msg220885 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-06-17 20:45
@Karl do you intend following up on this issue?
msg221401 - (view) Author: karl (karlcow) * Date: 2014-06-24 06:46
@Mark,

yup, I can do that.
I just realized that since my contribution there was a PSF Contributor Agreement. This is signed.

I need to dive a bit again in the code to remember where things fail.
msg227348 - (view) Author: karl (karlcow) * Date: 2014-09-23 12:06
Ok this is an attempt at solving the issue with lowercase. I find my get_header a bit complicated, but if you have a better idea. :) I'll modify the patches.

I have try to run the tests on the mac here but I have an issue currently.

→ ./python.exe -V
Python 3.5.0a0

Traceback (most recent call last):
  File "./Tools/scripts/patchcheck.py", line 6, in <module>
    import subprocess
  File "/Users/karl/code/cpython/Lib/subprocess.py", line 353, in <module>
    import signal
ImportError: No module named 'signal'
make: *** [patchcheck] Error 1
msg227353 - (view) Author: karl (karlcow) * Date: 2014-09-23 12:47
And I had to do a typo in patch3. Submitting patch4. Sorry about that.
msg227411 - (view) Author: karl (karlcow) * Date: 2014-09-24 07:26
Ok my tests are ok.

→ ./python.exe -m unittest -v Lib/test/test_urllib2net.py 

test_close (Lib.test.test_urllib2net.CloseSocketTest) ... ok
test_custom_headers (Lib.test.test_urllib2net.OtherNetworkTests) ... FAIL
test_file (Lib.test.test_urllib2net.OtherNetworkTests) ... test_ftp (Lib.test.test_urllib2net.OtherNetworkTests) ... skipped "Resource 'ftp://gatekeeper.research.compaq.com/pub/DEC/SRC/research-reports/00README-Legal-Rules-Regs' is not available"
test_headers_case_sensitivity (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_redirect_url_withfrag (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_sites_no_connection_close (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_urlwithfrag (Lib.test.test_urllib2net.OtherNetworkTests) ... ok
test_ftp_basic (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_default_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_no_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_ftp_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_basic (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_default_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_no_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok
test_http_timeout (Lib.test.test_urllib2net.TimeoutTest) ... ok


I wonder if test_custom_headers fails because of my modifications.
msg227418 - (view) Author: karl (karlcow) * Date: 2014-09-24 07:52
Opened issue #22478 for the tests failing. Not related to my modification.
msg227501 - (view) Author: karl (karlcow) * Date: 2014-09-25 02:02
OK after fixing my repo (Thanks orsenthil) I got the tests running properly. The inspection order of the two dictionary was not right, so I had to modify a bit the patch.

→ ./python.exe -m unittest -v 
Lib.test.test_urllib2net.OtherNetworkTests.test_headers_case_sensitivity
test_headers_case_sensitivity (Lib.test.test_urllib2net.OtherNetworkTests) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.286s

OK

→ ./python.exe -m unittest -v 
Lib.test.test_urllib2net.OtherNetworkTests.test_custom_headers
test_custom_headers (Lib.test.test_urllib2net.OtherNetworkTests) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.575s

OK


New patch issue5550-5.patch
unlinking issue5550-4.patch
msg227646 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2014-09-26 20:31
I left some comments in the codereview.

I think that having some half-baked solution is great, but I really would like to see a proper fix, i.e. with remove_header and other methods fixed. Ideally, you should add a UserDict implementation for headers that implements proper __XXXitem__ methods.
msg227648 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2014-09-26 20:35
Ideally, we should just wait when PEP 455 lands, so we can use TransformDict for headers.

Also, I don't think we can land a fix for this in any pythons out there, I would focus on making this right in 3.5
msg239651 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-31 00:58
According to <https://bugs.python.org/issue18986#msg236161> that PEP isn’t going to be accepted, although no reason seems to have been given. So I wouldn’t hold back because of that.
msg254050 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-04 10:57
Regarding the original bug reported by Cocobear, I think this is by design. The “addheaders” attribute seems to serve as a set of default header fields, although this is hardly obvious from the documentation. In fact the current documentation suggests you have to use the application/x-www-form-urlencoded content type for request data. I think the documentation should be relaxed; I opened Issue 23360 about that.

The rest of the thread seems to have side-tracked into patches to make Request.has_header() and friends case insensitive. I’ll have to study the threads closer, but I think this side-track is a duplicate of Issue 2275.
msg254085 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-05 06:50
The patch here does the same thing as John’s issue2775.patch in Issue 2275. I am closing this thread, because I think John’s patch in the other one is superior (except for being way out of date).
History
Date User Action Args
2015-11-05 06:51:52martin.pantersetsuperseder: Implement PEP 3108 -> Make urllib.request.Request.has_header() etc case-insensitive
2015-11-05 06:50:43martin.pantersetstatus: open -> closed
superseder: Implement PEP 3108
resolution: duplicate
messages: + msg254085
2015-11-04 10:57:34martin.pantersetmessages: + msg254050
2015-03-31 00:58:26martin.pantersetnosy: + martin.panter
dependencies: + Add a case-insensitive case-preserving dict
messages: + msg239651
2014-09-26 20:35:44yselivanovsetmessages: + msg227648
2014-09-26 20:31:13yselivanovsetnosy: + yselivanov
messages: + msg227646
2014-09-25 02:02:28karlcowsetfiles: - issue-5550-4.patch
2014-09-25 02:02:14karlcowsetfiles: + issue5550-5.patch

messages: + msg227501
2014-09-24 07:52:18karlcowsetmessages: + msg227418
2014-09-24 07:26:54karlcowsetmessages: + msg227411
2014-09-23 12:47:41karlcowsetfiles: + issue-5550-4.patch

messages: + msg227353
2014-09-23 12:46:53karlcowsetfiles: - issue-5550-3.patch
2014-09-23 12:06:37karlcowsetfiles: + issue-5550-3.patch

messages: + msg227348
2014-06-24 06:46:10karlcowsetmessages: + msg221401
2014-06-17 20:45:38BreamoreBoysetnosy: + BreamoreBoy
messages: + msg220885
2013-03-28 12:24:05karlcowsetmessages: + msg185441
2013-03-28 06:02:45orsenthilsetmessages: + msg185402
2013-03-20 21:25:53orsenthilsetmessages: + msg184799
2013-03-20 20:38:47karlcowsetfiles: + issue-5550-2.patch

messages: + msg184790
2013-03-20 15:03:02karlcowsetfiles: + issue-5550-test-1.patch
keywords: + patch
messages: + msg184768
2013-03-20 01:17:13orsenthilsetassignee: orsenthil
2013-03-19 21:34:56karlcowsetnosy: + karlcow

messages: + msg184684
title: urllib2 use of opener.addheaders -> [urllib.request]: Comparison of HTTP headers should be insensitive to the case
2012-11-09 13:26:39ezio.melottisetversions: + Python 3.3, Python 3.4, - Python 3.1
2010-07-09 17:14:13BreamoreBoysetnosy: + orsenthil
stage: test needed

versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.5
2009-03-24 04:48:59cocobearcreate