
classification
Title: Bug in python >= 2.7 with urllib2 fragment
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: Ivan.Ivanenko, asvetlov, orsenthil, python-dev, santoso.wijaya
Priority: normal
Keywords: patch

Created on 2011-03-28 21:23 by Ivan.Ivanenko, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                            Uploaded                          Description
issue11703_py27.patch                santoso.wijaya, 2011-03-29 19:28  Patch against 2.7
issue11703_py31.patch                santoso.wijaya, 2011-03-29 19:29  Patch against 3.1
issue11703_py27_with_redirect.patch  santoso.wijaya, 2011-04-06 23:54  Patch against 2.7
issue11703_py31_with_redirect.patch  santoso.wijaya, 2011-04-06 23:55  Patch against 3.1
Messages (11)
msg132423 - (view) Author: Ivan Ivanenko (Ivan.Ivanenko) Date: 2011-03-28 21:23
import urllib
import urllib2

result = urllib.urlopen("http://docs.python.org/library/urllib.html#OK")
print result.geturl()

result = urllib2.urlopen("http://docs.python.org/library/urllib.html#OK")
print result.geturl()

Python 2.6 returns:
"http://docs.python.org/library/urllib.html#OK"
"http://docs.python.org/library/urllib.html#OK"

Python 2.7 returns:
"http://docs.python.org/library/urllib.html#OK"
"http://docs.python.org/library/urllib.html"

2to3 -w test.py
Python 3 returns:
"http://docs.python.org/library/urllib.html"
"http://docs.python.org/library/urllib.html"

I expect geturl() to return the URL with "#OK" in all cases.
msg132497 - (view) Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-03-29 18:43
This is because the Request class' constructor splits the URL into __original and fragment:

    def __init__(self, url, data=None, headers={},
                 origin_req_host=None, unverifiable=False):
        # unwrap('<URL:type://host/path>') --> 'type://host/path'
        self.__original = unwrap(url)
        self.__original, fragment = splittag(self.__original)

The object that urlopen() returns is then constructed so that its geturl() returns the request object's __original field, which by this point no longer includes the fragment.
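
The diagnosis above can be modelled in a few lines. This is only a sketch in Python 3, not the stdlib code or the attached patch: LossyRequest and FragmentKeepingRequest are hypothetical names, and urllib.parse.urldefrag stands in for the internal splittag helper.

```python
from urllib.parse import urldefrag


class LossyRequest:
    """Hypothetical model of the reported behaviour: the fragment is dropped."""

    def __init__(self, url):
        # Split off the fragment and throw it away, keeping only the base URL.
        self._original, _discarded = urldefrag(url)

    def get_full_url(self):
        return self._original


class FragmentKeepingRequest:
    """Hypothetical model of one possible fix: remember the fragment."""

    def __init__(self, url):
        self._original, self.fragment = urldefrag(url)

    def get_full_url(self):
        # Re-attach the fragment, if there was one, when the full URL is requested.
        if self.fragment:
            return "%s#%s" % (self._original, self.fragment)
        return self._original


url = "http://docs.python.org/library/urllib.html#OK"
print(LossyRequest(url).get_full_url())
print(FragmentKeepingRequest(url).get_full_url())
```

The first call prints the URL without "#OK" and the second prints it unchanged, which matches the 2.7 and 2.6 results in the original report.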
msg132508 - (view) Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-03-29 19:28
Attaching patches against 2.7 and 3.1 branches.
msg132613 - (view) Author: Ivan Ivanenko (Ivan.Ivanenko) Date: 2011-03-30 20:57
Santa4nt, I think you also need to check the case of a redirect response URL:

print urllib2.urlopen("http://16.foobnix-cms.appspot.com/test_base").geturl()

python 2.6 returns OK
http://16.foobnix-cms.appspot.com/test_redirect#json={value:'OK'}

python 2.7 returns KO
http://16.foobnix-cms.appspot.com/test_redirect
msg133183 - (view) Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-04-06 23:54
It already does. ;-)

Python 2.7.1+ (default, Apr  6 2011, 16:25:38) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2
[74578 refs]
>>> fp = urllib2.urlopen('http://16.foobnix-cms.appspot.com/test_base')
[75643 refs]
>>> fp.geturl()
"http://16.foobnix-cms.appspot.com/test_redirect#json={value:'OK'}"
[75645 refs]

I'm attaching patches with the appropriate unittest for the redirected case, though.
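
The redirected case can also be exercised without the network. The following is a rough Python 3 sketch, not the unittest from the attached patches: FakeHTTPHandler, open_with_fake_network and the example.invalid host are made up for illustration, and the fake handler assumes that a response reports the full URL of the request it answers.

```python
import email.message
import io
import urllib.request
import urllib.response

# Hypothetical redirect target, modelled on the URL in msg132613.
REDIRECT_TARGET = "http://example.invalid/test_redirect#json={value:'OK'}"


class FakeHTTPHandler(urllib.request.BaseHandler):
    """Hypothetical stand-in for the network: /test_base answers with a 302."""

    def http_open(self, req):
        url = req.get_full_url()
        headers = email.message.Message()
        if url.endswith("/test_base"):
            headers["Location"] = REDIRECT_TARGET
            code, reason = 302, "Found"
        else:
            code, reason = 200, "OK"
        # Assumption: the response carries the full URL of its request.
        response = urllib.response.addinfourl(io.BytesIO(b""), headers, url, code)
        response.msg = reason  # HTTPErrorProcessor reads this attribute
        return response


def open_with_fake_network(url):
    # Build an opener by hand so that no real HTTP handler is installed.
    opener = urllib.request.OpenerDirector()
    for handler in (
        FakeHTTPHandler(),
        urllib.request.HTTPRedirectHandler(),
        urllib.request.HTTPErrorProcessor(),
        urllib.request.HTTPDefaultErrorHandler(),
    ):
        opener.add_handler(handler)
    return opener.open(url)


response = open_with_fake_network("http://example.invalid/test_base")
print(response.geturl())
```

Because the redirect handler builds a new Request from the Location header, the printed URL shows whether that Request kept the fragment.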
msg133620 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-12 23:26
New changeset 3f240a1cd245 by Senthil Kumaran in branch '3.1':
Fix Issue11703 - urllib2.geturl() does not return correct url when the original url contains #fragment. Patch Contribution by Santoso Wijaya.
http://hg.python.org/cpython/rev/3f240a1cd245
msg133621 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-04-12 23:27
It should be noted that the bug surfaced in 2.7 and above due to changes made as part of Issue8280.
msg133624 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-12 23:36
New changeset 6e73f75ee034 by Senthil Kumaran in branch '2.7':
Fix Issue11703 - urllib2.get_url does not handle fragment in url properly.
http://hg.python.org/cpython/rev/6e73f75ee034
msg133625 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-04-12 23:38
This is fixed in all the codelines. Thanks for the patch, Santoso.
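
A quick, network-free way to check whether a given Python 3 interpreter keeps the fragment is to inspect a Request object directly. This is a sketch that assumes a modern urllib.request, where Request exposes a fragment attribute; it does not open any connection.

```python
from urllib.request import Request

url = "http://docs.python.org/library/urllib.html#OK"
req = Request(url)

# The fragment is stored separately and re-attached by get_full_url().
print(req.fragment)
print(req.get_full_url())
```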
msg133634 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-13 01:43
New changeset 8ee48ec69844 by Senthil Kumaran in branch '3.1':
Update the News for the fix to Issue11703.
http://hg.python.org/cpython/rev/8ee48ec69844
msg133635 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-13 01:47
New changeset 502bb809b03b by Senthil Kumaran in branch '2.7':
update news in 2.7  for Issue #11703
http://hg.python.org/cpython/rev/502bb809b03b
History
Date                 User            Action  Args
2022-04-11 14:57:15  admin           set     github: 55912
2011-04-13 01:47:53  python-dev      set     messages: + msg133635
2011-04-13 01:43:31  python-dev      set     messages: + msg133634
2011-04-12 23:38:52  orsenthil       set     status: open -> closed; messages: + msg133625
2011-04-12 23:36:24  python-dev      set     messages: + msg133624
2011-04-12 23:27:58  orsenthil       set     assignee: orsenthil; resolution: fixed; messages: + msg133621
2011-04-12 23:26:13  python-dev      set     nosy: + python-dev; messages: + msg133620
2011-04-06 23:55:06  santoso.wijaya  set     files: + issue11703_py31_with_redirect.patch
2011-04-06 23:54:47  santoso.wijaya  set     files: + issue11703_py27_with_redirect.patch; messages: + msg133183
2011-04-05 09:08:29  asvetlov        set     nosy: + asvetlov
2011-03-30 20:57:22  Ivan.Ivanenko   set     messages: + msg132613
2011-03-29 20:16:21  ned.deily       set     nosy: + orsenthil
2011-03-29 19:29:00  santoso.wijaya  set     files: + issue11703_py31.patch
2011-03-29 19:28:44  santoso.wijaya  set     files: + issue11703_py27.patch; keywords: + patch; messages: + msg132508
2011-03-29 18:43:49  santoso.wijaya  set     messages: + msg132497
2011-03-29 18:28:04  santoso.wijaya  set     nosy: + santoso.wijaya; versions: + Python 3.3
2011-03-28 21:23:30  Ivan.Ivanenko   create