This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: urllib2.HTTPRedirectHandler incorrect redirect
Type: Stage:
Components: Library (Lib) Versions: Python 2.7, Python 2.6, Python 2.5
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: andresriancho, l0nwlf, orsenthil
Priority: normal Keywords:

Created on 2010-02-12 21:44 by andresriancho, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (4)
msg99292 - (view) Author: Andres Riancho (andresriancho) Date: 2010-02-12 21:44
Buggy code:

"""
        if 'location' in headers:
            newurl = headers.getheaders('location')[0]
        elif 'uri' in headers:
            newurl = headers.getheaders('uri')[0]
        else:
            return
        newurl = urlparse.urljoin(req.get_full_url(), newurl)
"""        

You might end up being redirected to some "strange" location if for some reason the value of "location" is C:\boot.ini, and you urlparse.urljoin the current URL with that one, you end up with C:\boot.ini . When the urllib2 library opens that, it will open a local file. What I did to fix it, is to verify that the protocol of the newurl is http or https.

"""
        correct_protocol = newurl.startswith('http://')  or newurl.startswith('https://') 
        if not correct_protocol:
            return

"""

The fix should be applied just below the dangerous urlparse.urljoin.
msg99294 - (view) Author: Shashwat Anand (l0nwlf) Date: 2010-02-12 23:02
In Python 2.7a3+, trunk:78165 the code is as follows: (note that there is a segment of code to fix malformed URLs)

"""
        if 'location' in headers:
            newurl = headers.getheaders('location')[0]
        elif 'uri' in headers:
            newurl = headers.getheaders('uri')[0]
        else:
            return

        # fix a possible malformed URL
        urlparts = urlparse.urlparse(newurl)
        if not urlparts.path:
            urlparts = list(urlparts)
            urlparts[2] = "/"
        newurl = urlparse.urlunparse(urlparts)

        newurl = urlparse.urljoin(req.get_full_url(), newurl)

"""
msg99318 - (view) Author: Andres Riancho (andresriancho) Date: 2010-02-13 14:43
The problem is still there in 2.7:

>>> urlparts = urlparse.urlparse('C:\\boot.ini')
>>> urlparts
('c', '', '\\boot.ini', '', '', '')
>>> if not urlparts.path:
...     urlparts = list(urlparts)
...     urlparts[2] = "/"
... 
>>> urlparts
('c', '', '\\boot.ini', '', '', '')
>>> newurl = urlparse.urlunparse(urlparts)
>>> newurl
'c:\\boot.ini'
>>> newurl = urlparse.urljoin( 'http://host.tld/spam/eggs.py', newurl)
>>> newurl
'c:\\boot.ini'
>>>
msg99818 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-02-22 19:48
I disagree with this bug report.

First the responsibility of checking if something is not  malicious URL lies at the application/client end, urllib in its redirection never urlopen()'s the redirected url, so you are not harmed in anyway.
If you wanted to be extra careful of the BAD Server, you can always subclass the redirection methods and use those handlers.

Closing this bug as won't fix.
History
Date User Action Args
2022-04-11 14:56:57adminsetgithub: 52168
2010-02-22 19:48:50orsenthilsetstatus: open -> closed
assignee: orsenthil
resolution: wont fix
messages: + msg99818
2010-02-13 14:43:12andresrianchosetmessages: + msg99318
versions: + Python 2.7
2010-02-13 04:33:37l0nwlfsetnosy: + orsenthil
2010-02-12 23:02:42l0nwlfsetnosy: + l0nwlf
messages: + msg99294
2010-02-12 21:44:43andresrianchocreate