Created on 2007-11-07 21:42 by andresriancho, last changed 2008-12-02 22:26 by jjlee. This issue is now closed.
|msg57223 - (view)||Author: Andres Riancho (andresriancho)||Date: 2007-11-07 21:42|
There is an error in urllib2 when doing a POST request to a URI that responds with a 302 redirection. The problem is in urllib2.py:536, where the HTTPRedirectHandler creates the new Request based on the original one: newurl = newurl.replace(' ', '%20') return Request(newurl, headers=req.headers, origin_req_host=req.get_origin_req_host(), unverifiable=True) The issue is that when it creates the new request, it uses the old headers (which contain a content-length header, remember that we originally sent a POST!) but doesn't use the same post-data from the original request (in fact it doesn't use any post-data). So, when the new request is sent, urllib2 sends something like: ====START Request===== GET http://f00/1.php HTTP/1.1 Content-length: 63 Accept-encoding: identity Accept: */* User-agent: w3af.sourceforge.net Host: f00 Content-type: application/x-www-form-urlencoded ==== END REQUEST === The server waits some time for the post-data that is advertised in "Content-length: 63" but it never arrives, so the connection is closed and urllib2 timeouts. There are two different solutions to this issue, implementing one is enough to solve it: 1) when creating the new request, remove the content length header 2) when creating the new request, add the post-data of the old request I think that the solution 1) is the most RFC-compliant solution. I coded a small patch for urllib2.py of python2.5 that solves this issue, the patch simply adds a line that removes the cl header: newurl = newurl.replace(' ', '%20') req.headers.pop('content-length') return Request(newurl, headers=req.headers, origin_req_host=req.get_origin_req_host(), unverifiable=True)
|msg57227 - (view)||Author: Facundo Batista (facundobatista) *||Date: 2007-11-07 22:54|
I don't understand why after receiving a redirection, and going to a new URL, you say to NOT send the POST data. Shouldn't the HTTP request be exactly like the original one, but to another URL destination? But you said that #2 solution was more RFC compliant... Could you please quote the RFC part that describes this behaviour?
|msg57232 - (view)||Author: Senthil Kumaran (orsenthil) *||Date: 2007-11-08 04:07|
I agree with facundobatista here. 1) I am not sure, if what you mention that doing POST in the 302 redirect url does not carry forward the POST data. Some examples would help to verify that and if this is case, IMO its a bug in urllib2. 2) You mention solution 1 as RFC compliant? Does RFC mention about changes to request headers for a redirection? Please quote the relevant section also I find in RFC2616 under 302 Found that: <quote> Note: RFC 1945 and RFC 2068 specify that the client is not allowedto change the method on the redirected request. However, most existing user agent implementations treat 302 as if it were a 303 response, performing a GET on the Location field-value regardless of the original request method.</quote> But could not find any references for behaviour with respect to POST on redirection.
|msg57233 - (view)||Author: Andres Riancho (andresriancho)||Date: 2007-11-08 12:52|
As mentioned in the RFC, and quoted by orsenthil, "however, most existing user agent implementations treat 302 as if it were a 303 response", which is true for urllib2.py too ( see line 585 ): http_error_301 = http_error_303 = http_error_307 = http_error_302 Which means: "all redirections are treated the same way". So, if we want to strictly respect the RFC, we should implement 4 different methods: - http_error_301 - http_error_303 - http_error_307 - http_error_302 But urllib2 is implementing the RFC "the easy way", this is: "threat all redirections the same, threat them as 303". 303 redirections perform a GET on the URI, which urllib2 does here: return Request(newurl, headers=req.headers, origin_req_host=req.get_origin_req_host(), unverifiable=True) These line does not clone the old request completely, it creates a GET request. If we create a GET request (handling 302 as 303) we should remove the content length header!
|msg57267 - (view)||Author: Facundo Batista (facundobatista) *||Date: 2007-11-08 17:40|
So, for 302 error we should resend the request as POST (header with lenght and data), and for the others we need to keep current behaviour. Actually, we just need to code a new handling function for 302, and leave the existing one as is. What do you think?
|msg57268 - (view)||Author: Andres Riancho (andresriancho)||Date: 2007-11-08 17:56|
According to the RFC: If urllib2 gets a 302 in response to a request, it MUST send the *same* request to the URI specified in the Location header, without modifying the method, headers, or any data (urllib2 is not RFC compliant here) In urllib2, a 301 and a 307 message should be handled just like a 302. If urllib2 gets a 303 in response to a request, it MUST send a GET request to the URI specified in the Location header (urllib2 is "half" RFC compliant here, this only works if the original request WASN'T a POST request) I will code a complete patch to make urllib2 work as RFC Compliant as possible. Whenever it's ready i'll send it.
|msg57305 - (view)||Author: Senthil Kumaran (orsenthil) *||Date: 2007-11-09 15:02|
Hello Andres, I think we are mixing up 2 or 3 things. Here are my comments. 1) urllib2 POST issue. - Discussion broke on this. No conclusion. 2) GET Request. >> If we create a GET request (handling 302 as 303) we should >> remove the content length header! I fail to find a reference for this statement in the RFC. Currently urllib2, gets the headers from the original request and passes the same headers to the redirected url. RFC mentions that we "should not change the headers". I quoted the section previously. 3) Handling 30* headers the same way. Yes, urllib2 handles them in the way stating most of the clients apparently handle it the same way. Most of us are okay with it. ("Practicality beats Purity"). But when we find that something can be improved, we just to add/extend the same. I am working on urilib (sandbox/trunk/urilib) which has some extensions like cached redirection for temporary redirections etc. And for this issue, we just have to find the supporting evidence for: >> If we create a GET request (handling 302 as 303) we should >> remove the content length header! If we don't find it,then it is not a bug.
|msg57306 - (view)||Author: Andres Riancho (andresriancho)||Date: 2007-11-09 15:10|
As I said in my original bug report, if you don't remove the content-length header or add the data, you are sending an invalid request: ====START Request===== GET http://f00/1.php HTTP/1.1 Content-length: 63 Accept-encoding: identity Accept: */* User-agent: w3af.sourceforge.net Host: f00 Content-type: application/x-www-form-urlencoded ==== END REQUEST === I opened this bug report because this is a bug, not because i'm a RFC purist. I I have time, I'll test the urllib2 patch I coded yesterday and submit it for revision.
|msg57497 - (view)||Author: Jim Jewett (jimjjewett)||Date: 2007-11-14 18:42|
> But you said that #2 solution was more RFC compliant... > Could you please quote the RFC part that describes this behaviour? RFD2616 http://www.faqs.org/rfcs/rfc2616.html section 4.3 Message Body ... The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers. A message-body MUST NOT be included in a request if the specification of the request method (section 5.1.1) does not allow sending an entity-body in requests. [I couldn't actually find a quote saying that GET has no body, but ... it doesn't.] Section 10.3 Redirection 3xx says The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. In other words, changing it to GET may not be quite pure, but leaving it as POST would technically mean that the user MUST confirm that the redirect is OK. This MUST NOT becomes more explicit later, such as in 10.3.2 (301 Moved Permanently). Section 10.3.3 (302 Found) says that 307 was added specifically to insist on keeping it a POST, and even 307 says it MUST NOT automatically redirect unless it can be confirmed by the user. Which is why user agents change redirects to a GET and try that...
|msg60054 - (view)||Author: Antoine Pitrou (pitrou) *||Date: 2008-01-17 20:24|
Hi, Sending a 302 in response to a POST is a very common practice so that the browser is redirected to a "normal", non state-changing page after the POST request has been processed. It is useful in that it allows the user to reload the resulting page (fetched with GET) without it popping up a warning dialog, and without re-submitting the request if the user validates the dialog. So, a 302 response after a POST should generate a GET to the new URL. And, of course, without a Content-Length (there's no content in a GET so it can't have a length, does it?).
|msg62147 - (view)||Author: Facundo Batista (facundobatista) *||Date: 2008-02-07 12:52|
So, the general agreement is that: - After receiven the 302, it's ok to generate a GET request. - There's a problem with the generated GET request (there should not be a Content-Length field in the headers) Right? If this is ok, I'll fix it. But, what other fields should disappear? What about Content-Type, for example? Thank you!
|msg62154 - (view)||Author: Antoine Pitrou (pitrou) *||Date: 2008-02-07 15:55|
Content-Type is probably meaningless but harmless as well. I'd say the priority is in getting rid of headers whose misinterpretation can be annoying, such as Content-Length.
|msg62171 - (view)||Author: Facundo Batista (facundobatista) *||Date: 2008-02-07 19:10|
Fixed in r60648, in the trunk. Now the content-length and content-type headers are removed from from the headers in the redirection. Thank you all!
|msg76797 - (view)||Author: John J Lee (jjlee)||Date: 2008-12-02 22:26|
I think this was actually not a bug, and the fix should not have been applied. I guess this comment is just "for the record", as the fix is probably cruft that can't be removed now, since people will have started relying on it. The only way that the Content-Length header could be in req.headers in the first place is if the user had explicitly requested that it be added (urllib2 adds that header to .unredirected_hdrs, but not to .headers). The user can no doubt provoke all kinds of other errors by adding random HTTP headers -- what makes this particular one special? .add_header() has always been a "you need to know what you're doing, or you'll break stuff" interface, and it's not really possible to "fix" that (at least not without changing the meaning of .add_header()).
messages: + msg76797
|2008-02-07 19:10:03||facundobatista||set||status: open -> closed|
messages: + msg62171
|2008-02-07 15:55:44||pitrou||set||messages: + msg62154|
|2008-02-07 12:52:20||facundobatista||set||messages: + msg62147|
|2008-01-19 23:41:24||jtonsing||set||nosy: + jtonsing|
messages: + msg60054
messages: + msg57497
components: + XML, - None
|2007-11-09 15:10:06||andresriancho||set||messages: + msg57306|
|2007-11-09 15:02:11||orsenthil||set||messages: + msg57305|
|2007-11-08 17:56:11||andresriancho||set||messages: + msg57268|
|2007-11-08 17:40:40||facundobatista||set||messages: + msg57267|
|2007-11-08 12:52:17||andresriancho||set||messages: + msg57233|
messages: + msg57232
messages: + msg57227
|2007-11-07 21:44:35||andresriancho||set||title: urllib 302 POST -> urllib2 302 POST|