classification
Title: urllib2 302 POST
Type: behavior Stage:
Components: Windows Versions: Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: andresriancho, facundobatista, jimjjewett, jjlee, jtonsing, orsenthil, paul.moore, pitrou, steve.dower, tim.golden, zach.ware, zhangruixia0108
Priority: normal Keywords:

Created on 2007-11-07 21:42 by andresriancho, last changed 2015-09-23 08:50 by zhangruixia0108. This issue is now closed.

Messages (15)
msg57223 - (view) Author: Andres Riancho (andresriancho) Date: 2007-11-07 21:42
There is an error in urllib2 when doing a POST request to a URI that
responds with a 302 redirection. The problem is in urllib2.py:536, where
the HTTPRedirectHandler creates the new Request based on the original one:

            newurl = newurl.replace(' ', '%20')
            return Request(newurl,
                           headers=req.headers,
                           origin_req_host=req.get_origin_req_host(),
                           unverifiable=True)


The issue is that when it creates the new request, it uses the old
headers (which contain a content-length header, remember that we
originally sent a POST!) but doesn't use the same post-data from the
original request (in fact it doesn't use any post-data). So, when the
new request is sent, urllib2 sends something like:

====START Request=====
GET http://f00/1.php HTTP/1.1
Content-length: 63
Accept-encoding: identity
Accept: */*
User-agent: w3af.sourceforge.net
Host: f00
Content-type: application/x-www-form-urlencoded


==== END REQUEST ===

The server waits some time for the post-data that is advertised in
"Content-length: 63" but it never arrives, so the connection is closed
and urllib2 timeouts.

There are two different solutions to this issue, implementing one is
enough to solve it:
1) when creating the new request, remove the content length header
2) when creating the new request, add the post-data of the old request

I think that the solution 1) is the most RFC-compliant solution. I coded
a small patch for urllib2.py of python2.5 that solves this issue, the
patch simply adds a line that removes the cl header:

            newurl = newurl.replace(' ', '%20')
            req.headers.pop('content-length')
            return Request(newurl,
                           headers=req.headers,
                           origin_req_host=req.get_origin_req_host(),
                           unverifiable=True)
msg57227 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2007-11-07 22:54
I don't understand why after receiving a redirection, and going to a new
URL, you say to NOT send the POST data. Shouldn't the HTTP request be
exactly like the original one, but to another URL destination?

But you said that #2 solution was more RFC compliant... Could you please
quote the RFC part that describes this behaviour?
msg57232 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2007-11-08 04:07
I agree with facundobatista here.
1) I am not sure, if what you mention that doing POST in the 302
redirect url does not carry forward the POST data. Some examples would
help to verify that and if this is case, IMO its a bug in urllib2.
2) You mention solution 1 as RFC compliant? Does RFC mention about
changes to request headers for a redirection? Please quote the relevant
section
also I find in RFC2616 under 302 Found that:
<quote>      Note: RFC 1945 and RFC 2068 specify that the client is not
allowedto change the method on the redirected request. However, most   
   existing user agent implementations treat 302 as if it were a 303   
   response, performing a GET on the Location field-value regardless   
   of the original request method.</quote>
But could not find any references for behaviour with respect to POST on
redirection.
msg57233 - (view) Author: Andres Riancho (andresriancho) Date: 2007-11-08 12:52
As mentioned in the RFC, and quoted by orsenthil, "however, most
existing user agent implementations treat 302 as if it were a 303
response", which is true for urllib2.py too ( see line 585 ):

    http_error_301 = http_error_303 = http_error_307 = http_error_302

Which means: "all redirections are treated the same way". So, if we want
to strictly respect the RFC, we should implement 4 different methods:

   - http_error_301
   - http_error_303
   - http_error_307
   - http_error_302

But urllib2 is implementing the RFC "the easy way", this is: "threat all
redirections the same, threat them as 303". 303 redirections perform a
GET on the URI, which urllib2 does here:

            return Request(newurl,
                           headers=req.headers,
                           origin_req_host=req.get_origin_req_host(),
                           unverifiable=True)

These line does not clone the old request completely, it creates a GET
request. If we create a GET request (handling 302 as 303) we should
remove the content length header!
msg57267 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2007-11-08 17:40
So, for 302 error we should resend the request as POST (header with
lenght and data), and for the others we need to keep current behaviour.

Actually, we just need to code a new handling function for 302, and
leave the existing one as is.

What do you think?
msg57268 - (view) Author: Andres Riancho (andresriancho) Date: 2007-11-08 17:56
According to the RFC:

If urllib2 gets a 302 in response to a request, it MUST send the *same*
request to the URI specified in the Location header, without modifying
the method, headers, or any data (urllib2 is not RFC compliant here)

In urllib2, a 301 and a 307 message should be handled just like a 302.

If urllib2 gets a 303 in response to a request, it MUST send a GET
request to the URI specified in the Location header (urllib2 is "half"
RFC compliant here, this only works if the original request WASN'T a
POST request)

I will code a complete patch to make urllib2 work as RFC Compliant as
possible. Whenever it's ready i'll send it.
msg57305 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2007-11-09 15:02
Hello Andres,
I think we are mixing up 2 or 3 things. Here are my comments.

1) urllib2 POST issue.
- Discussion broke on this. No conclusion.

2) GET Request.

>> If we create a GET request (handling 302 as 303) we should
>> remove the content length header!

I fail to find a reference for this statement in the RFC.

Currently urllib2, gets the headers from the original request and passes
the same headers to the redirected url. RFC mentions that we "should not
change the headers". I quoted the section previously.

3) Handling 30* headers the same way.
Yes, urllib2 handles them in the way stating most of the clients
apparently handle it the same way. Most of us are okay with it.
("Practicality beats Purity"). But when we find that something can be
improved, we just to add/extend the same. I am working on urilib
(sandbox/trunk/urilib) which has some extensions like cached redirection
for temporary redirections etc.

And for this issue, we just have to find the supporting evidence for:

>> If we create a GET request (handling 302 as 303) we should
>> remove the content length header!

If we don't find it,then it is not a bug.
msg57306 - (view) Author: Andres Riancho (andresriancho) Date: 2007-11-09 15:10
As I said in my original bug report, if you don't remove the
content-length header or add the data, you are sending an invalid request:

====START Request=====
GET http://f00/1.php HTTP/1.1
Content-length: 63
Accept-encoding: identity
Accept: */*
User-agent: w3af.sourceforge.net
Host: f00
Content-type: application/x-www-form-urlencoded


==== END REQUEST ===

I opened this bug report because this is a bug, not because i'm a RFC
purist. I I have time, I'll test the urllib2 patch I coded yesterday and
submit it for revision.
msg57497 - (view) Author: Jim Jewett (jimjjewett) Date: 2007-11-14 18:42
> But you said that #2 solution was more RFC compliant... 
> Could you please quote the RFC part that describes this behaviour?

RFD2616
http://www.faqs.org/rfcs/rfc2616.html

section 4.3 Message Body ...

   The presence of a message-body in a request is signaled by the
   inclusion of a Content-Length or Transfer-Encoding header field in
   the request's message-headers. A message-body MUST NOT be included in
   a request if the specification of the request method (section 5.1.1)
   does not allow sending an entity-body in requests.

[I couldn't actually find a quote saying that GET has no body, but ... it 
doesn't.]

Section 10.3 Redirection 3xx says 

   The action
   required MAY be carried out by the user agent without interaction
   with the user if and only if the method used in the second request is
   GET or HEAD.

In other words, changing it to GET may not be quite pure, but leaving it as 
POST would technically mean that the user MUST confirm that the redirect is 
OK.  This MUST NOT becomes more explicit later, such as in 10.3.2 (301 Moved 
Permanently).  Section 10.3.3 (302 Found) says that 307 was added 
specifically to insist on keeping it a POST, and even 307 says it MUST NOT 
automatically redirect unless it can be confirmed by the user.

Which is why user agents change redirects to a GET and try that...
msg60054 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-01-17 20:24
Hi,

Sending a 302 in response to a POST is a very common practice so that
the browser is redirected to a "normal", non state-changing page after
the POST request has been processed. It is useful in that it allows the
user to reload the resulting page (fetched with GET) without it popping
up a warning dialog, and without re-submitting the request if the user
validates the dialog.

So, a 302 response after a POST should generate a GET to the new URL.
And, of course, without a Content-Length (there's no content in a GET so
it can't have a length, does it?).
msg62147 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-02-07 12:52
So, the general agreement is that:

- After receiven the 302, it's ok to generate a GET request.

- There's a problem with the generated GET request (there should not be
a Content-Length field in the headers)

Right?

If this is ok, I'll fix it. But, what other fields should disappear?
What about Content-Type, for example?

Thank you!
msg62154 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-02-07 15:55
Content-Type is probably meaningless but harmless as well. I'd say the
priority is in getting rid of headers whose misinterpretation can be
annoying, such as Content-Length.
msg62171 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-02-07 19:10
Fixed in r60648, in the trunk.

Now the content-length and content-type headers are removed from from
the headers in the redirection.

Thank you all!
msg76797 - (view) Author: John J Lee (jjlee) Date: 2008-12-02 22:26
I think this was actually not a bug, and the fix should not have been
applied.  I guess this comment is just "for the record", as the fix is
probably cruft that can't be removed now, since people will have started
relying on it.

The only way that the Content-Length header could be in req.headers in
the first place is if the user had explicitly requested that it be added
(urllib2 adds that header to .unredirected_hdrs, but not to .headers). 
The user can no doubt provoke all kinds of other errors by adding random
HTTP headers -- what makes this particular one special?  .add_header()
has always been a "you need to know what you're doing, or you'll break
stuff" interface, and it's not really possible to "fix" that (at least
not without changing the meaning of .add_header()).
msg251417 - (view) Author: RuixiaZhang (zhangruixia0108) Date: 2015-09-23 08:50
I am using  python 2.7,the same ,come to 302 lost cookies,
I try add the code :
req.headers.pop('content-length')
to urllib2.py:536
but, have another error:
KeyError: 'content-length'
So, I want to ask is there any better method to solver this error?
History
Date User Action Args
2015-09-23 08:50:02zhangruixia0108setversions: + Python 2.7, - Python 2.5
nosy: + paul.moore, tim.golden, zhangruixia0108, zach.ware, steve.dower

messages: + msg251417

components: + Windows, - XML
2008-12-02 22:26:06jjleesetnosy: + jjlee
messages: + msg76797
2008-02-07 19:10:03facundobatistasetstatus: open -> closed
resolution: fixed
messages: + msg62171
2008-02-07 15:55:44pitrousetmessages: + msg62154
2008-02-07 12:52:20facundobatistasetmessages: + msg62147
2008-01-19 23:41:24jtonsingsetnosy: + jtonsing
2008-01-17 20:24:28pitrousetnosy: + pitrou
messages: + msg60054
2007-11-14 18:42:22jimjjewettsetnosy: + jimjjewett
messages: + msg57497
components: + XML, - None
2007-11-09 15:10:06andresrianchosetmessages: + msg57306
2007-11-09 15:02:11orsenthilsetmessages: + msg57305
2007-11-08 17:56:11andresrianchosetmessages: + msg57268
2007-11-08 17:40:40facundobatistasetmessages: + msg57267
2007-11-08 12:52:17andresrianchosetmessages: + msg57233
2007-11-08 04:07:01orsenthilsetnosy: + orsenthil
messages: + msg57232
2007-11-07 22:54:30facundobatistasetnosy: + facundobatista
messages: + msg57227
2007-11-07 21:44:35andresrianchosettitle: urllib 302 POST -> urllib2 302 POST
2007-11-07 21:42:59andresrianchocreate