classification
Title: Redirect is not working correctly in urllib2
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, facundobatista, gregory.p.smith, janik, karlcow, martin.panter, orsenthil, python-dev
Priority: normal Keywords: patch

Created on 2012-02-26 15:15 by janik, last changed 2016-05-16 09:45 by martin.panter. This issue is now closed.

Files
File name Uploaded Description
urllib2_redirect_fix.patch janik, 2012-02-26 15:15 The possible bug fix
urllib2_redirect_fix.2.patch martin.panter, 2015-05-26 03:55
Messages (7)
msg154356 - (view) Author: Ján Janech (janik) Date: 2012-02-26 15:15
When the server sends only a query string as the redirect URL, urllib2 redirects to an incorrect address.

The error occurs on the page http://kniznica.uniza.sk/opac. The server sends only the query string part of the URI in the Location header (i.e. ?fs=04D07295D4434730A51C95A9F1727373&fn=main). The path is then incorrectly stripped from the original URL, and urllib2 redirects to http://kniznica.uniza.sk/?fs=04D07295D4434730A51C95A9F1727373&fn=main.

The error was introduced by the fix for issue #2464. I think the attached patch fixes the error (it works for me).
msg154357 - (view) Author: Ján Janech (janik) Date: 2012-02-26 15:16
I forgot to mention that the correct url in the example would be http://kniznica.uniza.sk/opac?fs=04D07295D4434730A51C95A9F1727373&fn=main.
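
The difference between the two addresses can be reproduced with urljoin() alone. The snippet below is only an illustrative sketch, not the urllib2 code itself: it uses the Python 3 module name urllib.parse, and the variable names are made up for the example. It contrasts resolving the query-only Location value directly against the request URL with resolving it after an empty path has been replaced by "/".

```python
from urllib.parse import urljoin

base = "http://kniznica.uniza.sk/opac"
location = "?fs=04D07295D4434730A51C95A9F1727373&fn=main"

# Resolving the query-only reference against the request URL keeps "/opac".
expected = urljoin(base, location)

# Replacing the empty path with "/" before joining turns the reference
# into "/?..." and the original path is discarded.
reported = urljoin(base, "/" + location)

print(expected)
print(reported)
```

The first value is the address the reporter expects, the second is the one urllib2 was redirecting to.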
msg183577 - (view) Author: karl (karlcow) * Date: 2013-03-06 03:39
→  curl -sI  http://kniznica.uniza.sk/opac

HTTP/1.1 302 Moved Temporarily
Date: Wed, 06 Mar 2013 03:23:06 GMT
Server: Indy/9.0.50
Content-Type: text/html
Location: ?fs=C79F09C9F1304E7AA4FF7C211BEA2B9B&fn=main


→ python3.3

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.parse
>>> urllib.parse.urlparse("http://kniznica.uniza.sk/opac")
ParseResult(scheme='http', netloc='kniznica.uniza.sk', path='/opac', params='', query='', fragment='')
>>> urllib.parse.urlparse("?fs=C79F09C9F1304E7AA4FF7C211BEA2B9B&fn=main")
ParseResult(scheme='', netloc='', path='', params='', query='fs=C79F09C9F1304E7AA4FF7C211BEA2B9B&fn=main', fragment='')

Redirection is defined at
http://hg.python.org/cpython/file/5e294202f93e/Lib/urllib/request.py#l643
msg243875 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-23 00:58
The proposed patch looks good to me. A test case would be nice though.

Also I wonder why the “malformed URL” logic needs to be in urllib.request. Surely it either belongs in urljoin(), or in the underlying http.client. That needs more thought, but either way the current patch is a definite improvement.
msg244080 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-26 03:55
urllib2_redirect_fix.2.patch adds a test.

I was tempted to remove the whole block of code setting the path to “/”, but there is one minor disadvantage: if a redirect points to a so-called “malformed” URL without any path component, like “http://example.net” or “http://example.net?query”, geturl() would return this URL verbatim.
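
A rough sketch of the trade-off described above, assuming the path is only forced to "/" when the redirect target carries a host. The function name resolve_redirect is hypothetical and the code is a simplified stand-in for the logic in urllib.request, not a copy of the committed patch.

```python
from urllib.parse import urljoin, urlparse, urlunparse


def resolve_redirect(request_url, location):
    """Resolve a Location value against the URL that was requested."""
    parts = urlparse(location)
    # Only a URL that names a host but has no path is treated as
    # "malformed"; a bare "?query" reference is left for urljoin().
    if not parts.path and parts.netloc:
        parts = parts._replace(path="/")
    return urljoin(request_url, urlunparse(parts))


base = "http://kniznica.uniza.sk/opac"
print(resolve_redirect(base, "?fs=1&fn=main"))
print(resolve_redirect(base, "http://example.net"))
print(resolve_redirect(base, "http://example.net?query"))
```

With the netloc check in place the query-only redirect keeps the /opac path, while the two host-only targets still gain a "/" path, so geturl() would not return them verbatim.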
msg265514 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-05-14 10:38
I will try to commit this soon.
msg265683 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-05-16 08:15
New changeset 52a7f580580c by Martin Panter in branch '3.5':
Issue #14132: Fix redirect handling when target is just a query string
https://hg.python.org/cpython/rev/52a7f580580c

New changeset 789a3f87bde1 by Martin Panter in branch '2.7':
Issue #14132: Fix redirect handling when target is just a query string
https://hg.python.org/cpython/rev/789a3f87bde1

New changeset 841a9a3f3cf6 by Martin Panter in branch 'default':
Issue #14132, Issue #17214: Merge two redirect handling fixes from 3.5
https://hg.python.org/cpython/rev/841a9a3f3cf6
History
Date User Action Args
2016-05-16 09:45:40 martin.panter set status: open -> closed; resolution: fixed; stage: commit review -> resolved
2016-05-16 08:15:10 python-dev set nosy: + python-dev; messages: + msg265683
2016-05-14 10:38:39 martin.panter set stage: patch review -> commit review; messages: + msg265514; versions: - Python 3.4
2015-05-26 03:55:01 martin.panter set files: + urllib2_redirect_fix.2.patch; stage: test needed -> patch review; messages: + msg244080; versions: + Python 3.6
2015-05-23 00:58:52 martin.panter set versions: + Python 3.4, Python 3.5; nosy: + martin.panter; messages: + msg243875; stage: test needed
2013-03-06 03:39:14 karlcow set nosy: + karlcow; messages: + msg183577
2012-02-26 15:18:27 ezio.melotti set nosy: + facundobatista, gregory.p.smith, orsenthil, ezio.melotti
2012-02-26 15:16:37 janik set messages: + msg154357
2012-02-26 15:15:31 janik create