Issue 1424148: urllib.FancyURLopener.redirect_internal looses data on POST!

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/42869

classification

Title:	urllib.FancyURLopener.redirect_internal looses data on POST!
Type:	behavior	Stage:	test needed
Components:	Library (Lib)	Versions:	Python 2.6, Python 2.5

process

Status:	closed	Resolution:	not a bug
Dependencies:	549151	Superseder:
Assigned To:	orsenthil	Nosy List:	ajaksu2, akuchling, crocowhile, jimjjewett, jjlee, kxroberto, orsenthil
Priority:	low	Keywords:	easy

Created on 2006-02-04 17:35 by kxroberto, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (16)
msg27426 - (view)	Author: kxroberto (kxroberto)	Date: 2006-02-04 17:35
def redirect_internal(self, url, fp, errcode, errmsg, headers, data): if 'location' in headers: newurl = headers['location'] elif 'uri' in headers: newurl = headers['uri'] else: return void = fp.read() fp.close() # In case the server sent a relative URL, join with original: newurl = basejoin(self.type + ":" + url, newurl) return self.open(newurl) ... has to become ... def redirect_internal(self, url, fp, errcode, errmsg, headers, data): if 'location' in headers: newurl = headers['location'] elif 'uri' in headers: newurl = headers['uri'] else: return void = fp.read() fp.close() # In case the server sent a relative URL, join with original: newurl = basejoin(self.type + ":" + url, newurl) return self.open(newurl,data) ... i guess? ( ",data" added ) Robert
msg27427 - (view)	Author: kxroberto (kxroberto)	Date: 2006-02-04 20:10
Logged In: YES user_id=972995 Found http://www.faqs.org/rfcs/rfc2616.html (below). But the behaviour is still strange, and the bug even more serious: a silent redirection of a POST as GET without data is obscure for a Python language. Leads to unpredictable results. The cut half execution is not stopable and all is left to a good reaction of the server, and complex reinterpreation of the client. Python urllibX should by default yield the 30X code for a POST redirection and provide the first HTML: usually a redirection HTML stub with < a href=... That would be consistent with the RFC: the User (=Application! not Python!) can redirect under full control without generating a wrong call! In my application, a bug was long unseen because of this wrong behaviour. with 30X-stub it would have been easy to discover and understand ... urllib2 has the same bug with POST redirection. ======= 10.3.2 301 Moved Permanently The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise. The new permanent URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s). If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued. Note: When automatically redirecting a POST request after receiving a 301 status code, some existing HTTP/1.0 user agents will erroneously change it into a GET request.
msg27428 - (view)	Author: John J Lee (jjlee)	Date: 2006-02-06 00:54
Logged In: YES user_id=261020 This is not a bug. See the long discussion here: http://python.org/sf/549151
msg27429 - (view)	Author: kxroberto (kxroberto)	Date: 2006-02-06 10:29
Logged In: YES user_id=972995 > http://python.org/sf/549151 the analyzation of the browsers is right. lynx is best ok to ask. But urllibX is not a browser (application) but a lib: As of now with standard urllibX error handling you cannot code a lynx. gvr's initial suggestion to raise a clear error (with redirection-link as attribute of the exception value) is best ok. Another option would be to simly yield the undirected stub HTML and leave the 30X-code (and redirection LOCATION in header). To redirect POST as GET _while_ simply loosing (!) the data (and not appending it to the GET-URL) is most bad for a lib. Transcribing smart a short formlike POST to a GET w QUERY would be so la la. Don't know if the MS & netscape's also transpose to GET with long data? ... The current behaviour is most worst of all 4. All other methods whould at least have raisen an early hint/error in my case.
msg27430 - (view)	Author: Jim Jewett (jimjjewett)	Date: 2006-02-06 17:57
Logged In: YES user_id=764593 In theory, a GET may be automatic, but a POST requires user interaction, so the user can be held accountable for the results of a POST, but not of a GET. Often, the page will respond to either; not sending the queries protects privacy in case of problems, and works more often than not. (That said, I too would prefer a raised error or a transparent repost, at least as options.)
msg27431 - (view)	Author: John J Lee (jjlee)	Date: 2006-02-06 20:24
Logged In: YES user_id=261020 First, anyone replying to this, please read this page (and the whole of this tracker note!) first: http://ppewww.ph.gla.ac.uk/~flavell/www/post-redirect.html kxroberto: you say that with standard urllibX error handling you cannot get an exception on redirected 301/302/307 POST. That's not true of urllib2, since you may override HTTPRedirectHandler.redirect_request(), which method was designed and documented for precisely that purpose. It seems sensible to have a default that does what virtually all browsers do (speaking as a long-time lynx user!). I don't know about the urllib case. It's perfectly reasonable to extend urllib (if necessary) to allow the option of raising an exception. Note that (IIRC!) urllib's exceptions do not contain the response body data, however (urllib2's HTTPErrors do contain the response body data). It would of course break backwards compatibility to start raising exceptions by default here. I don't think it's reasonable to break old code on the basis of a notional security issue when the de-facto standard web client behaviour is to do the redirect. In reality, the the only "security" value of the original prescriptive rule was as a convention to be followed by white-hat web programmers and web client implementors to help users avoid unintentionally re-submitting non-idempotent requests. Since that convention is NOT followed in the real world (lynx doesn't count as the real world ;-), I see no value in sticking rigidly to the original RFC spec -- especially when 2616 even provides 307 precisely in response to this problem. Other web client libraries, for example libwww-perl and Java HTTPClient, do the same as Python here IIRC. RFC 2616 section 10.3.4 even suggests web programmers use 302 to get the behaviour you complain about! The only doubtful case here is 301. A decision was made on the default behaviour in that case back when the tracker item I pointed you to was resolved. I think it's a mistake to change our minds again on that default behaviour. kxroberto.seek(nrBytes) assert kxroberto.readline() == """\ To redirect POST as GET _while_ simply loosing (!) the data (and not appending it to the GET-URL) is most bad for a lib.""" No. There is no value in supporting behaviour which is simply contrary to both de-facto and prescriptive standards (see final paragraph of RFC 2616 section 10.3.3: if we accept the "GET on POST redirect" rule, we must accept that the Location header is exactly the URL that should be followed). FYI, many servers return a redirect URL containing the urlencoded POST data from the original request. kxroberto: """Don't know if the MS & netscape's also transpose to GET with long data? ...""" urllib2's behaviour (and urllib's, I believe) on these issues is identical to that of IE and Firefox. jimjewett: """In theory, a GET may be automatic, but a POST requires user interaction, so the user can be held accountable for the results of a POST, but not of a GET.""" That theory has been experimentally falsified ;-)
msg27432 - (view)	Author: Jim Jewett (jimjjewett)	Date: 2006-02-06 20:52
Logged In: YES user_id=764593 Sorry, I was trying to provide a quick explanation of why we couldn't just "do the obvious thing" and repost with data. Yes, I realize that in practice, GET is used for non- idempotent actions, and POST is (though less often) done automatically. But since that is the official policy, I wouldn't want to bet too heavily against it in a courtroom -- so python defaults should be at least as conservative as both the spec and the common practice.
msg27433 - (view)	Author: John J Lee (jjlee)	Date: 2006-02-06 21:19
Logged In: YES user_id=261020 Conservative or not, I see no utility in changing the default, and several major harmful effects: old code breaks, and people have to pore over the specs to figure out why "urlopen() doesn't work".
msg62579 - (view)	Author: A.M. Kuchling (akuchling) *	Date: 2008-02-20 00:13
Can this item be closed, given jjlee's argument against changing the behaviour?
msg86314 - (view)	Author: Daniel Diniz (ajaksu2) *	Date: 2009-04-22 18:48
I agree that changing the default isn't an option. However, IMHO, having to override HTTPRedirectHandler.redirect_request or FancyURLopener.redirect_internal to get RFC compliant (albeit non-useful in 99.99% of use cases) is a bit weird. Maybe the docs should contain an example of how to be compliant?
msg91560 - (view)	Author: Giorgio (crocowhile)	Date: 2009-08-14 16:41
I am not sure where we stand with this issue. It seems to be an old one. urllib2 still claim (as of python 2.6) the following; # Strictly (according to RFC 2616), 301 or 302 in response # to a POST MUST NOT cause a redirection without confirmation # from the user (of urllib2, in this case). In practice, # essentially all clients do redirect in this case, so we # do the same. # be conciliant with URIs containing a space This is just not true, we don't do the same at all. redirect_request does not pass data along and it even changes the headers to reflect content-size, thus behaving perfectly in accordance with RFC. For those who stumbled upon this page looking for a workaround, this is how to do: create a new class inheriting from HTTPRedirectHandler and use this one instead: class AutomaticHTTPRedirectHandler(urllib2.HTTPRedirectHandler): def redirect_request(self, req, fp, code, msg, headers, newurl): """Return a Request or None in response to a redirect. The default response in redirect_request claims not to follow directives in RFC 2616 but in fact it does This class does not and makes handling 302 with POST possible """ m = req.get_method() if (code in (301, 302, 303, 307) and m in ("GET", "HEAD") or code in (301, 302, 303) and m == "POST"): newurl = newurl.replace(' ', '%20') return urllib2.Request(newurl, data=req.get_data(), headers=req.headers, origin_req_host=req.get_origin_req_host(), unverifiable=True) else: raise urllib2.HTTPError(req.get_full_url(), code, msg, headers, fp)
msg91570 - (view)	Author: John J Lee (jjlee)	Date: 2009-08-14 20:21
This issue is not a bug, and should be closed. It was discussed at length many years ago (different bug tracker ticket), and resolved. Since then the same issue seems to come up every year or so, apparently raised by people who haven't checked the issue tracker for previous discussion. Please, somebody close this issue! > It seems to be an old one. > urllib2 still claim (as of python 2.6) the following; > > # Strictly (according to RFC 2616), 301 or 302 in response > # to a POST MUST NOT cause a redirection without confirmation > # from the user (of urllib2, in this case). In practice, > # essentially all clients do redirect in this case, so we > # do the same. Note that this is NOT a statement about whether the request sent as a result of the redirect response contains the original POST data. > This is just not true, we don't do the same at all. redirect_request > does not pass data along and it even changes the headers to reflect > content-size, thus behaving perfectly in accordance with RFC. This appears to be a statement about (amongst other things) whether the request sent as a result of the redirect response contains the original POST data. So where's the connection between the comment you quote and your response to it, Giorgio? Actually, I hope you don't mind if I ask you not to answer that question, but instead to go and read, very carefully, the tracker discussion for the original fix that introduced the comment you posted (you should be able to find it by svn annotating the source, finding the appropriate commit, then looking for a reference in the commit message to a bug tracker issue ID). Once you've done that, please stop posting on this issue <0.2 wink> Sorry, I'm not normally this grumpy, but this issue just seems to keep coming back forever, because people haven't spent the time to test browser behaviour, carefully read the RFC, tracker discussion, commit messages, etc. If you have done all that and thought carefully and still think there's a bug, by all means come back, but please make sure you're extremely clear about exactly what you think is wrong, and why. Write a test case, and cite specific RFC wording. If what you think is wrong is not the same as the original issue described in the opening comment of this bug tracker ticket, please raise a new ticket rather than commenting on this one. > For those who stumbled upon this page looking for a workaround, this is > how to do: create a new class inheriting from HTTPRedirectHandler and > use this one instead: I don't know what this is a workaround for.
msg91571 - (view)	Author: Giorgio (crocowhile)	Date: 2009-08-14 20:47
>I don't know what this is a workaround for. As you can see yourself, that code does a complete redirection, taking along the post_data too which is simply not possible by default (and that is obviously a pain in the neck). I never said it was "bug" nor that the code had to be changed. I am just saying this is "a lack of a feature" that obviously many would like to see implemented - and this is probably why it "seems to come back forever".
msg91605 - (view)	Author: John J Lee (jjlee)	Date: 2009-08-15 11:58
If you have a feature request, please open a separate ticket. This one is about an alleged bug.
msg91655 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2009-08-17 02:26
I am assigning this to myself. I shall do some research on this issue + plus current standings by other clients/libraries and come out with a summary.
msg91777 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2009-08-20 14:50
I agree with John on this ticket. At the outset, this is Not a bug. And reading through the referenced ticket indicates the design decision for the behavior. In summary: <quote> This suggests to me that no automatic repeat of POST requests should ever be done, and that in the case of a 302 or 303 response, a POST should be replaced by a GET; this may also be done for a 301 response -- even though the standard calls that an error, it admits that it is done by old clients. </quote> That was Guido's point at that time. The least that could be done is take a call on 301 response, but this would break the other clients which rely on 'earlier standard behavior though not compliant with RFC'. At the moment, this wont be necessary as it just break clients using urllib. Giorgio's point in rekindling this issue, is not related to urllib module and specifically w.r.t to redirect_request implementation. So, an alternate behavior is desired on urllib2's redirects (if they are observed by existing clients), it could be handled by another request. So, effectively closing this request.

History
Date	User	Action	Args
2022-04-11 14:56:15	admin	set	github: 42869
2009-08-20 14:50:49	orsenthil	set	status: open -> closed resolution: not a bug messages: + msg91777
2009-08-17 02:26:06	orsenthil	set	assignee: orsenthil messages: + msg91655
2009-08-15 11:58:14	jjlee	set	messages: + msg91605
2009-08-14 20:47:54	crocowhile	set	messages: + msg91571
2009-08-14 20:21:30	jjlee	set	messages: + msg91570
2009-08-14 16:41:57	crocowhile	set	nosy: + crocowhile messages: + msg91560 versions: + Python 2.5, - Python 3.0
2009-04-22 18:48:22	ajaksu2	set	priority: normal -> low nosy: + ajaksu2 messages: + msg86314 keywords: + easy
2009-02-12 18:14:47	ajaksu2	set	nosy: + orsenthil dependencies: + urllib2 POSTs on redirect type: behavior stage: test needed versions: + Python 2.6, Python 3.0, - Python 2.4
2008-02-20 00:13:44	akuchling	set	nosy: + akuchling messages: + msg62579
2006-02-04 17:35:20	kxroberto	create