classification
Title: urllib2 HTTPRedirectHandler not forwarding POST data in redirect
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.3
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: crustymonkey, eric.araujo, orsenthil
Priority: normal Keywords: patch

Created on 2012-02-27 22:44 by crustymonkey, last changed 2012-03-16 23:22 by eric.araujo. This issue is now closed.

Files
File name Uploaded Description Edit
urllib2.py.patch crustymonkey, 2012-02-27 22:44 Patch to fix POST data not being redirected
urllib2.py.redirect_option.patch crustymonkey, 2012-03-02 19:18 Patch to make the redirecting of POST data an option
Messages (16)
msg154516 - (view) Author: Jay Deiman (crustymonkey) Date: 2012-02-27 22:44
I've noticed that urllib2's HTTPRedirectHandler does not redirect a POST request with the POST data.  

If you send a POST request to a server with data, the data is dropped when the new Request is made to the new url.  As stated in a comment in the library itself, redirecting a POST request is not strictly RFC compliant, but it's generally supported anyway.  The problem here being that our POST data is not also being redirected.  I ran into this issue when writing a web api wrapper in Python.

I'm submitting a small patch that fixes this issue:


--- /usr/lib/python2.7/urllib2.py	2011-10-04 16:07:28.000000000 -0500
+++ urllib2.py	2012-02-27 16:03:36.000000000 -0600
@@ -551,7 +551,11 @@
             newheaders = dict((k,v) for k,v in req.headers.items()
                               if k.lower() not in ("content-length", "content-type")
                              )
+            data = None
+            if req.has_data():
+                data = req.get_data()
             return Request(newurl,
+                           data=data,
                            headers=newheaders,
                            origin_req_host=req.get_origin_req_host(),
                            unverifiable=True)
msg154517 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-02-27 22:47
If I recollect correctly, redirecting a POST request with POST data is not a recommended thing to do and could be a security hazard. That's why RFC prohibits it too. It is intentional that urllib2 is not redirecting carrying along the POST data. Could you show any browsers showing this behavior of redirecting along with POST?
msg154518 - (view) Author: Jay Deiman (crustymonkey) Date: 2012-02-27 23:07
Senthil,

That is a good point about the potential for security issues.  What if it was an explicit option in HTTPRedirectHandler since there is a possibility of value in being able to do it.  I know my case is probably unusual, but I imagine that others might have run into this too.  Something roughly along this line is what I'm thinking:

----------------
class HTTPRedirectHandler(BaseHandler):
    redirect_post_data = False
    ...
    ...
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        ...
        ...
        data = None
        if req.has_data() and self.redirect_post_data:
            data = req.get_data()
        return Request(newurl,
                       data=data,
                       headers=newheaders,
                       origin_req_host=req.get_origin_req_host(),
                       unverifiable=True)
----------------
That would leave the current default behavior as-is, but leave the option to explicitly override it by the user, perhaps with a BIG DISCLAIMER comment about security.
msg154726 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-02 04:36
Could you give the RFC section or URI?  I glanced at the page about status codes but did not find a prohibition on POST.
msg154730 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-02 05:15
Here is a section which talks about 3xx redirection

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html


10.3 Redirection 3xx

This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request. The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.
msg154732 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-02 05:27
That’s clear as mud :)  If I read correctly, the text means that the user agent is free to follow redirects without asking the user if the request is GET or HEAD, and needs to ask the user if the request is e.g. POST.  That’s in line with what Firefox does when you refresh after a POST: It asks you to confirm if you want the data to be re-sent.  POST not being idempotent explains the need for this precaution.

So, I think a library like urllib should not prevent people from doing something that is allowed.  +1 to a new param to transfer the body when following a redirect under a POST.
msg154788 - (view) Author: Jay Deiman (crustymonkey) Date: 2012-03-02 19:18
Senthil,

The HTTPRedirectHandler is already breaking RFC2616 by it's own admission in the code comments (from the source):

# Strictly (according to RFC 2616), 301 or 302 in response
# to a POST MUST NOT cause a redirection without confirmation
# from the user (of urllib2, in this case).  In practice,
# essentially all clients do redirect in this case, so we
# do the same.
# be conciliant with URIs containing a space

I can definitely understand the issue with changing the default behavior to redirect the POST data.  However, an added option which leaves the current behavior as the default shouldn't hurt.  I'm submitting a new patch file (urllib2.py.redirect_option.patch), which will do exactly that.
msg154830 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-03 11:44
> However, an added option which leaves the current behavior as the default shouldn't hurt.
My opinion too.  urllib is sometimes a client, sometimes a library used to build clients, which need a knob to implement their own decisions or possibly ask the user for confirmation.

A new argument being a new feature, this patch must target 3.3.

Some comments on the patch:

+    # NOTE: Setting redirect_post_data to True *can* introduce security
+    # issues and is not recommended unless you are sure of where the 
+    # POST data is being redirected!
I would tone down this note, for example:

     # setting redirect_post_data to True can introduce security
     # issues, use with caution

+    redirect_post_data = False
Is an attribute okay or should methods (__init__, maybe methods that do the requests too) grow a new parameter?
msg154831 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-03 11:45
Also, new tests and a doc update would be needed, but you may want to wait for Senthil’s approval before doing more work on this.
msg154914 - (view) Author: Jay Deiman (crustymonkey) Date: 2012-03-04 23:40
I have no problem making doc and test changes.  I'll probably need a pointer as to where these changes need to be made and submitted to, but like you said, I'll wait until the patch is accepted before doing that.
msg154922 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-05 05:17
> I have no problem making doc and test changes.  I'll probably need a
> pointer as to where these changes need to be made
Great!  I think the right test file is Lib/test/test_urllib2.py (the test files were not renamed in Python 3 when some modules were renamed); there is a test_redirect method that should be the right place.  Senthil will confirm.

Remember to work from a 3.3 checkout (i.e. the default branch of http://hg.python.org/cpython).  More help about contributing (like where to find files) is at http://docs.python.org/devguide.

> and submitted to,
Just as a patch on this report.
msg154940 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-05 10:10
Hi Jay & Éric,

I understand your points and providing an extra argument seems like an idea that could be useful to circumvent , what you see as a problem. 

The RFC section states that - 

"The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD".

By this, I understand, RFC means, for the POST data, the user is made aware and is conscious of the redirect which is happening and is "permitting" to POST the data to new location.

The interaction happens like this:

User: Post to /a
Browser: Posts to /a and Server says oh /a is /b
Browser: Hello user! Server says /a is now /b. Shall I post to /b?
User: Yes, you may.

This is different from what you are saying, which is like with having an option in the browser settings which will enable following redirect on POST.

User: Post to /a (and if there is redirect follow that post to the corresponding site).
Browser: Posts to /a and Server says /a is /b. 
Browser: Posts to /b

But do you know if any such browser setting exist? No. Browsers for good reasons do not provide such a setting and they prompt user if they want to follow the redirect with POST.

In a similar way, developers using urllib as library in their applications can obtain the redirected URL and then POST to the redirected URL. That would be equivalent behavior. 

Providing an automatic follow redirect on POST could serious security issue, both for clients/libraries and browser.  Even with a word of caution, it has a high chance of being misused. So, I am -1 on this proposal.

I hope you understand my argument. I had thought about this earlier a for a similar issue and I remember we made the decision to drop the data following the redirected POST. If my argument is not convincing enough, then I think, it would be good idea to bring this bug to discussion on python-dev or web-sig and provide some concrete real world examples.  That could bring some use cases for/against this issue and might be helpful.

Thanks,
Senthil
msg154942 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-05 11:15
> "The action required MAY be carried out by the user agent without interaction with the user if and
> only if the method used in the second request is GET or HEAD".
> 
> By this, I understand, RFC means, for the POST data, the user is made aware and is conscious of the
> redirect which is happening and is "permitting" to POST the data to new location.

My reading too.

> The interaction happens like this:
> User: Post to /a
> Browser: Posts to /a and Server says oh /a is /b
> Browser: Hello user! Server says /a is now /b. Shall I post to /b?
> User: Yes, you may.
> 
> This is different from what you are saying, which is like with having an option in the browser settings
> which will enable following redirect on POST.

The big difference that I see is that urllib is a library, not a browser.  A browser could be implemented on top of urllib, and it is the browser’s responsibility to ask the user if they want POST data to be sent again.  The browser then needs to tell urllib to forward POST data: this is the piece that’s missing.

> In a similar way, developers using urllib as library in their applications can obtain the redirected URL
> and then POST to the redirected URL. That would be equivalent behavior.

Ah, so there is a way to achieve it!  If Jay is satisfied by this, we could add an example of using that in the urllib howto, with the appropriate warnings.  That would have much less risk than a new parameter.
msg154997 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-06 01:35
On Mon, Mar 05, 2012 at 11:15:38AM +0000, Éric Araujo wrote:
> Ah, so there is a way to achieve it!  If Jay is satisfied by this,
> we could add an example of using that in the urllib howto, with the
> appropriate warnings.  That would have much less risk than a new
> parameter.

Yeah. This could be cookbook recipe style example, if it's utility
value is high.
msg155168 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-03-08 17:44
I am closing this issue as I feel that the requirement may not be addressed by a change. If there is patch for HowTo, we can tackle that in another issue. Thank you for contribution.
msg156103 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-03-16 23:22
Continued in #14338.
History
Date User Action Args
2012-03-16 23:22:45eric.araujosetresolution: not a bug -> works for me
messages: + msg156103
stage: patch review -> resolved
2012-03-08 17:44:29orsenthilsetstatus: open -> closed
resolution: not a bug
messages: + msg155168
2012-03-06 01:35:06orsenthilsetmessages: + msg154997
2012-03-05 11:15:38eric.araujosetmessages: + msg154942
2012-03-05 10:10:54orsenthilsetmessages: + msg154940
2012-03-05 05:17:34eric.araujosetmessages: + msg154922
2012-03-04 23:40:58crustymonkeysetmessages: + msg154914
2012-03-03 11:45:48eric.araujosetmessages: + msg154831
2012-03-03 11:44:53eric.araujosettype: behavior -> enhancement
stage: patch review
messages: + msg154830
versions: + Python 3.3, - Python 2.7
2012-03-02 19:18:54crustymonkeysetfiles: + urllib2.py.redirect_option.patch

messages: + msg154788
2012-03-02 05:27:33eric.araujosetmessages: + msg154732
2012-03-02 05:15:46orsenthilsetmessages: + msg154730
2012-03-02 04:36:35eric.araujosetnosy: + eric.araujo

messages: + msg154726
title: urllib2 HTTPRedirectHandler not handling POST data in redirect -> urllib2 HTTPRedirectHandler not forwarding POST data in redirect
2012-02-27 23:07:49crustymonkeysetmessages: + msg154518
2012-02-27 22:47:32orsenthilsetnosy: + orsenthil
messages: + msg154517
2012-02-27 22:44:09crustymonkeycreate