Issue 1401: urllib2 302 POST

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/45742

classification

Title:	urllib2 302 POST
Type:	behavior	Stage:
Components:	Windows	Versions:	Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	andresriancho, facundobatista, jimjjewett, jjlee, jtonsing, orsenthil, paul.moore, pitrou, steve.dower, tim.golden, zach.ware, zhangruixia0108
Priority:	normal	Keywords:

Created on 2007-11-07 21:42 by andresriancho, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (15)
msg57223 - (view)	Author: Andres Riancho (andresriancho)	Date: 2007-11-07 21:42
There is an error in urllib2 when doing a POST request to a URI that responds with a 302 redirection. The problem is in urllib2.py:536, where the HTTPRedirectHandler creates the new Request based on the original one: newurl = newurl.replace(' ', '%20') return Request(newurl, headers=req.headers, origin_req_host=req.get_origin_req_host(), unverifiable=True) The issue is that when it creates the new request, it uses the old headers (which contain a content-length header, remember that we originally sent a POST!) but doesn't use the same post-data from the original request (in fact it doesn't use any post-data). So, when the new request is sent, urllib2 sends something like: ====START Request===== GET http://f00/1.php HTTP/1.1 Content-length: 63 Accept-encoding: identity Accept: / User-agent: w3af.sourceforge.net Host: f00 Content-type: application/x-www-form-urlencoded ==== END REQUEST === The server waits some time for the post-data that is advertised in "Content-length: 63" but it never arrives, so the connection is closed and urllib2 timeouts. There are two different solutions to this issue, implementing one is enough to solve it: 1) when creating the new request, remove the content length header 2) when creating the new request, add the post-data of the old request I think that the solution 1) is the most RFC-compliant solution. I coded a small patch for urllib2.py of python2.5 that solves this issue, the patch simply adds a line that removes the cl header: newurl = newurl.replace(' ', '%20') req.headers.pop('content-length') return Request(newurl, headers=req.headers, origin_req_host=req.get_origin_req_host(), unverifiable=True)
msg57227 - (view)	Author: Facundo Batista (facundobatista) *	Date: 2007-11-07 22:54
I don't understand why after receiving a redirection, and going to a new URL, you say to NOT send the POST data. Shouldn't the HTTP request be exactly like the original one, but to another URL destination? But you said that #2 solution was more RFC compliant... Could you please quote the RFC part that describes this behaviour?
msg57232 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2007-11-08 04:07
I agree with facundobatista here. 1) I am not sure, if what you mention that doing POST in the 302 redirect url does not carry forward the POST data. Some examples would help to verify that and if this is case, IMO its a bug in urllib2. 2) You mention solution 1 as RFC compliant? Does RFC mention about changes to request headers for a redirection? Please quote the relevant section also I find in RFC2616 under 302 Found that: <quote> Note: RFC 1945 and RFC 2068 specify that the client is not allowedto change the method on the redirected request. However, most existing user agent implementations treat 302 as if it were a 303 response, performing a GET on the Location field-value regardless of the original request method.</quote> But could not find any references for behaviour with respect to POST on redirection.
msg57233 - (view)	Author: Andres Riancho (andresriancho)	Date: 2007-11-08 12:52
As mentioned in the RFC, and quoted by orsenthil, "however, most existing user agent implementations treat 302 as if it were a 303 response", which is true for urllib2.py too ( see line 585 ): http_error_301 = http_error_303 = http_error_307 = http_error_302 Which means: "all redirections are treated the same way". So, if we want to strictly respect the RFC, we should implement 4 different methods: - http_error_301 - http_error_303 - http_error_307 - http_error_302 But urllib2 is implementing the RFC "the easy way", this is: "threat all redirections the same, threat them as 303". 303 redirections perform a GET on the URI, which urllib2 does here: return Request(newurl, headers=req.headers, origin_req_host=req.get_origin_req_host(), unverifiable=True) These line does not clone the old request completely, it creates a GET request. If we create a GET request (handling 302 as 303) we should remove the content length header!
msg57267 - (view)	Author: Facundo Batista (facundobatista) *	Date: 2007-11-08 17:40
So, for 302 error we should resend the request as POST (header with lenght and data), and for the others we need to keep current behaviour. Actually, we just need to code a new handling function for 302, and leave the existing one as is. What do you think?
msg57268 - (view)	Author: Andres Riancho (andresriancho)	Date: 2007-11-08 17:56
According to the RFC: If urllib2 gets a 302 in response to a request, it MUST send the same request to the URI specified in the Location header, without modifying the method, headers, or any data (urllib2 is not RFC compliant here) In urllib2, a 301 and a 307 message should be handled just like a 302. If urllib2 gets a 303 in response to a request, it MUST send a GET request to the URI specified in the Location header (urllib2 is "half" RFC compliant here, this only works if the original request WASN'T a POST request) I will code a complete patch to make urllib2 work as RFC Compliant as possible. Whenever it's ready i'll send it.
msg57305 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2007-11-09 15:02
Hello Andres, I think we are mixing up 2 or 3 things. Here are my comments. 1) urllib2 POST issue. - Discussion broke on this. No conclusion. 2) GET Request. >> If we create a GET request (handling 302 as 303) we should >> remove the content length header! I fail to find a reference for this statement in the RFC. Currently urllib2, gets the headers from the original request and passes the same headers to the redirected url. RFC mentions that we "should not change the headers". I quoted the section previously. 3) Handling 30* headers the same way. Yes, urllib2 handles them in the way stating most of the clients apparently handle it the same way. Most of us are okay with it. ("Practicality beats Purity"). But when we find that something can be improved, we just to add/extend the same. I am working on urilib (sandbox/trunk/urilib) which has some extensions like cached redirection for temporary redirections etc. And for this issue, we just have to find the supporting evidence for: >> If we create a GET request (handling 302 as 303) we should >> remove the content length header! If we don't find it,then it is not a bug.
msg57306 - (view)	Author: Andres Riancho (andresriancho)	Date: 2007-11-09 15:10
As I said in my original bug report, if you don't remove the content-length header or add the data, you are sending an invalid request: ====START Request===== GET http://f00/1.php HTTP/1.1 Content-length: 63 Accept-encoding: identity Accept: / User-agent: w3af.sourceforge.net Host: f00 Content-type: application/x-www-form-urlencoded ==== END REQUEST === I opened this bug report because this is a bug, not because i'm a RFC purist. I I have time, I'll test the urllib2 patch I coded yesterday and submit it for revision.
msg57497 - (view)	Author: Jim Jewett (jimjjewett)	Date: 2007-11-14 18:42
> But you said that #2 solution was more RFC compliant... > Could you please quote the RFC part that describes this behaviour? RFD2616 http://www.faqs.org/rfcs/rfc2616.html section 4.3 Message Body ... The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers. A message-body MUST NOT be included in a request if the specification of the request method (section 5.1.1) does not allow sending an entity-body in requests. [I couldn't actually find a quote saying that GET has no body, but ... it doesn't.] Section 10.3 Redirection 3xx says The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. In other words, changing it to GET may not be quite pure, but leaving it as POST would technically mean that the user MUST confirm that the redirect is OK. This MUST NOT becomes more explicit later, such as in 10.3.2 (301 Moved Permanently). Section 10.3.3 (302 Found) says that 307 was added specifically to insist on keeping it a POST, and even 307 says it MUST NOT automatically redirect unless it can be confirmed by the user. Which is why user agents change redirects to a GET and try that...
msg60054 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-01-17 20:24
Hi, Sending a 302 in response to a POST is a very common practice so that the browser is redirected to a "normal", non state-changing page after the POST request has been processed. It is useful in that it allows the user to reload the resulting page (fetched with GET) without it popping up a warning dialog, and without re-submitting the request if the user validates the dialog. So, a 302 response after a POST should generate a GET to the new URL. And, of course, without a Content-Length (there's no content in a GET so it can't have a length, does it?).
msg62147 - (view)	Author: Facundo Batista (facundobatista) *	Date: 2008-02-07 12:52
So, the general agreement is that: - After receiven the 302, it's ok to generate a GET request. - There's a problem with the generated GET request (there should not be a Content-Length field in the headers) Right? If this is ok, I'll fix it. But, what other fields should disappear? What about Content-Type, for example? Thank you!
msg62154 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-02-07 15:55
Content-Type is probably meaningless but harmless as well. I'd say the priority is in getting rid of headers whose misinterpretation can be annoying, such as Content-Length.
msg62171 - (view)	Author: Facundo Batista (facundobatista) *	Date: 2008-02-07 19:10
Fixed in r60648, in the trunk. Now the content-length and content-type headers are removed from from the headers in the redirection. Thank you all!
msg76797 - (view)	Author: John J Lee (jjlee)	Date: 2008-12-02 22:26
I think this was actually not a bug, and the fix should not have been applied. I guess this comment is just "for the record", as the fix is probably cruft that can't be removed now, since people will have started relying on it. The only way that the Content-Length header could be in req.headers in the first place is if the user had explicitly requested that it be added (urllib2 adds that header to .unredirected_hdrs, but not to .headers). The user can no doubt provoke all kinds of other errors by adding random HTTP headers -- what makes this particular one special? .add_header() has always been a "you need to know what you're doing, or you'll break stuff" interface, and it's not really possible to "fix" that (at least not without changing the meaning of .add_header()).
msg251417 - (view)	Author: RuixiaZhang (zhangruixia0108)	Date: 2015-09-23 08:50
I am using python 2.7,the same ,come to 302 lost cookies, I try add the code : req.headers.pop('content-length') to urllib2.py:536 but, have another error: KeyError: 'content-length' So, I want to ask is there any better method to solver this error?

History
Date	User	Action	Args
2022-04-11 14:56:28	admin	set	github: 45742
2015-09-23 08:50:02	zhangruixia0108	set	versions: + Python 2.7, - Python 2.5 nosy: + paul.moore, tim.golden, zhangruixia0108, zach.ware, steve.dower messages: + msg251417 components: + Windows, - XML
2008-12-02 22:26:06	jjlee	set	nosy: + jjlee messages: + msg76797
2008-02-07 19:10:03	facundobatista	set	status: open -> closed resolution: fixed messages: + msg62171
2008-02-07 15:55:44	pitrou	set	messages: + msg62154
2008-02-07 12:52:20	facundobatista	set	messages: + msg62147
2008-01-19 23:41:24	jtonsing	set	nosy: + jtonsing
2008-01-17 20:24:28	pitrou	set	nosy: + pitrou messages: + msg60054
2007-11-14 18:42:22	jimjjewett	set	nosy: + jimjjewett messages: + msg57497 components: + XML, - None
2007-11-09 15:10:06	andresriancho	set	messages: + msg57306
2007-11-09 15:02:11	orsenthil	set	messages: + msg57305
2007-11-08 17:56:11	andresriancho	set	messages: + msg57268
2007-11-08 17:40:40	facundobatista	set	messages: + msg57267
2007-11-08 12:52:17	andresriancho	set	messages: + msg57233
2007-11-08 04:07:01	orsenthil	set	nosy: + orsenthil messages: + msg57232
2007-11-07 22:54:30	facundobatista	set	nosy: + facundobatista messages: + msg57227
2007-11-07 21:44:35	andresriancho	set	title: urllib 302 POST -> urllib2 302 POST
2007-11-07 21:42:59	andresriancho	create