classification
Title: urllib2 add_header fails with existing unredirected_header
Type: behavior Stage: test needed
Components: Library (Lib) Versions: Python 2.7, Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: BitTorment, facundobatista, jjlee, orsenthil, volodyaa, zathras
Priority: normal Keywords: easy, patch

Created on 2008-05-04 15:11 by zathras, last changed 2013-04-07 10:54 by volodyaa.

Files
File name Uploaded Description Edit
add_header_complete.patch BitTorment, 2008-05-11 07:51 Complete patch with documentation. Typo fixed. review
issue2756-facundo.diff facundobatista, 2008-08-16 16:47 Fixes the header grabbing in do_header() review
1.py volodyaa, 2013-04-07 10:41 Test code
s.py volodyaa, 2013-04-07 10:45 Test HTTP Server
2.py volodyaa, 2013-04-07 10:54 'Content-length' issue
Messages (13)
msg66213 - (view) Author: david reid (zathras) Date: 2008-05-04 15:11
In urllib2 when using reusing a Request calling add_header doesn't work
when an unredirected_header has been added. 

A good example (and the one that caught me out) is content-type. When
making a POST request with no content-type set the current code sets the
content-type as an unredirected_header to
'application/x-www-form-urlencoded' (line 1034 of urllib2.py) but in a
subsequent request, setting the content type via add_header will see
this ignored as unredirected_headers are appended after the headers.

A possible solution is to check whether the header being added already
exists in the requests undredirected_headers and remove it if it does.
msg66263 - (view) Author: Martin McNickle (BitTorment) Date: 2008-05-05 09:45
I can see that there will be a problem in this case.

However, it may be that the authors found it more intuitive to always
set the Content-Type to application/x-www-form-urlencoded when data is set.

Their implementation is inconsistent though:

#-------------------------------------------------------
request = urllib2.Request('http://www.somesite.com/')
request.add_data(data)
f = urllib2.urlopen(request)

# 'Content-Type: application/x-www-form-urlencoded' is sent in the header
#-------------------------------------------------------

whereas:

#-------------------------------------------------------
request = urllib2.Request('http://www.somesite.com/')
request.add_header('Content-Type', 'application/xml')
request.add_data(data)

# 'Content-Type: application/xml' is sent in the header
#-------------------------------------------------------

We need to decide what is the best way to handle this case.  Should the
normal headers be given preference, or should the unredirected headers
be given preference?

--- Martin
msg66264 - (view) Author: Martin McNickle (BitTorment) Date: 2008-05-05 09:50
Sorry, the first example should read:

#-------------------------------------------------------
request = urllib2.Request('http://www.whompbox.com/headertest.php')
request.add_data(data)
f = urllib2.urlopen(request)

request.add_header('Content-Type', 'application/xml')
f = urllib2.urlopen(request)

# 'Content-Type: application/x-www-form-urlencoded' is sent in both
headers.  Second header (should?) be 'application/xml'
#-------------------------------------------------------
msg66309 - (view) Author: Martin McNickle (BitTorment) Date: 2008-05-06 11:05
I decided that the user should have full control over the headers sent.
 Any call to add_header() or add_redirected_header() will now check if a
header with that same name has been set in the Request before (in both
the header and the unredirected header dictionaries), and if so
overwrite it.

The tests run fine with the patch installed, and also included is a
patch for the urllib2 documentation.
msg69212 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-07-03 17:39
BitTorment, what would increase the possibility of accepting this is a
patch also for the test suite (test_urllib2.py), checking this
behaviour, for us to be sure that does not break again in the future.

Could you please submit also this?
msg71111 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-08-14 07:29
The submitted patch has problems. It does not correctly solve this issue
(which I want to confirm, if there is issue at all after understanding
the logic behind unredirected_headers). My explanation of this issue and
comments on the patch is here:
http://urllib-gsoc.blogspot.com/2008/08/issue2756-urllib2-addheader-fails-with.html

Now, coming back to the current issue. We see that addition of
unredirected_hdrs takes place in the do_request_ call of
AbstractHTTPHandler and it adds the unredirected_hdrs based on certain
conditions, like when Content-Type is not there in header add the
unredirected header ('Content-Type','application/x-www-form-urlencoded')
The value of Content-Type is hardcoded here, but other header values are
not hardcoded and got from header request only.

Question here is: When the request contains the Content-Type header and
has a updated value, why is it not supposed to change the
unredirected_header to the updated value?  (Same for other request
header items).

John J Lee, can perhaps help us understand more.

If it is supposed to change, then the following snippet (rough) at
do_request_
for key, value in request.headers:
      request.add_unredirected_header(key,request.get_header(key))
should update it, there is no need to change the add_header and
add_unredirected_header method as proposed by the patch.

On our conclusion, I shall provide the updated patch (if required).

Thanks,
Senthil
msg71112 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-08-14 07:35
The problem with the patch was:

The attached patch modifies the add_header() and
add_unredirected_header() method to remove the existing headers of the
same name alternately in the headers and unredirected_hdrs.

What we observe is unredirected_hdrs item is removed during add_header()
calland it is never added back/updated in teh undirected_hdrs.

Let us discuss on the points mentioned in my previous post.
msg71227 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-08-16 16:47
What I think is that the AbstractHTTPHandler is grabbing the headers, in
the do_open() method, in an incorrect way.

Check the get_header() method from Request:

def get_header(self, header_name, default=None):
    return self.headers.get(
        header_name,
        self.unredirected_hdrs.get(header_name, default))

What it's doing there is to grab the header from self.header, and if not
there use the one in self.unredirected_hdrs, and if not there return the
default.

So, to emulate this behaviour, in do_open() I just grabbed first the
unredirected headers, and then updated it with the normal ones.

See my simple patch I attach here, which solves the issue, and passes
all the tests also.
msg75207 - (view) Author: John J Lee (jjlee) Date: 2008-10-25 12:40
I agree there is a bug here: clearly the methods on the Request class
are inconsistent with AbstractHTTPHandler.do_open() .  I think Facundo's
patch is good, though it needs a test.

The general principle when fixing earlier bugs has been that the
.add_*header() methods override default headers, though there are some
reasonable violations this where the implementation relies on specific
headers being sent -- e.g. "Connection: close".
msg132246 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2011-03-26 15:48
Senthil, I'm assigning this issue to you because you know more about this subject than me. Feel free to unassign if will not be working in it.

Thanks!
msg186188 - (view) Author: Volodymyr Antonevych (volodyaa) * Date: 2013-04-07 10:41
Test urllib2 file
msg186189 - (view) Author: Volodymyr Antonevych (volodyaa) * Date: 2013-04-07 10:45
Test HTTP server
msg186191 - (view) Author: Volodymyr Antonevych (volodyaa) * Date: 2013-04-07 10:54
Tested on 2.7. The same bug exists:

127.0.0.1 - - [07/Apr/2013 13:17:39] "POST / HTTP/1.1" 200 -
('content-length', '11')
('accept-encoding', 'identity')
('connection', 'close')
('user-agent', 'Python-urllib/2.7')
('host', '127.0.0.1:8000')
('content-type', 'application/x-www-form-urlencoded')

Also, the workaround exists for that issue:
instead of using request.add_header have to use request.add_unredirected_header for the second call.

BTW, the same issue with 'Content-length' header. If use request.add_data the second time when the content length value leaves the same as for first request.

See steps to reproduce in attachments.
History
Date User Action Args
2013-04-07 10:54:16volodyaasetfiles: + 2.py

messages: + msg186191
versions: + Python 2.7
2013-04-07 10:45:17volodyaasetfiles: + s.py

messages: + msg186189
2013-04-07 10:41:23volodyaasetfiles: + 1.py
nosy: + volodyaa
messages: + msg186188

2011-03-26 15:48:52facundobatistasetassignee: facundobatista -> orsenthil
messages: + msg132246
2009-04-22 17:27:17ajaksu2setpriority: normal
keywords: + easy
2009-02-13 01:19:48ajaksu2setstage: test needed
versions: + Python 2.6, - Python 2.5
2008-10-25 12:40:08jjleesetnosy: + jjlee
messages: + msg75207
2008-08-16 16:47:17facundobatistasetfiles: + issue2756-facundo.diff
messages: + msg71227
2008-08-14 07:35:29orsenthilsetmessages: + msg71112
2008-08-14 07:29:36orsenthilsetmessages: + msg71111
2008-07-03 17:39:55facundobatistasetassignee: facundobatista
messages: + msg69212
nosy: + orsenthil, facundobatista
2008-05-11 07:51:56BitTormentsetfiles: + add_header_complete.patch
2008-05-11 07:50:54BitTormentsetfiles: - add_header_complete.patch
2008-05-06 11:11:17BitTormentsetfiles: + add_header_complete.patch
2008-05-06 11:10:26BitTormentsetfiles: - add_header_complete.patch
2008-05-06 11:05:56BitTormentsetfiles: + add_header_complete.patch
keywords: + patch
messages: + msg66309
2008-05-05 09:50:57BitTormentsetmessages: + msg66264
2008-05-05 09:45:08BitTormentsetnosy: + BitTorment
messages: + msg66263
2008-05-04 15:11:37zathrascreate