classification
Title: urlparse.urlsplit() regression for paths consisting of digits
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.1, Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: orsenthil
Nosy List: calvin, orsenthil, python-dev, r.david.murray, santoso.wijaya
Priority: normal
Keywords: patch

Created on 2011-03-11 13:47 by calvin, last changed 2011-04-15 10:23 by orsenthil. This issue is now closed.

Files
File name Uploaded Description
urlparse.patch santoso.wijaya, 2011-03-14 08:25
Messages (12)
msg130570 - Author: Bastian Kleineidam (calvin) Date: 2011-03-11 13:47
When using a javascript URL with only digits as the path, the urlsplit() function behaves differently in Python 2.7 than in 2.6:

$ python2.6 -c "import urlparse; print urlparse.urlsplit('javascript:123')"
SplitResult(scheme='javascript', netloc='', path='123', query='', fragment='')

$ python2.7 -c "import urlparse; print urlparse.urlsplit('javascript:123')"
SplitResult(scheme='', netloc='', path='javascript:123', query='', fragment='')

Python 3.2 has the same regression:
$ python3.2 -c "import urllib.parse; print(urllib.parse.urlsplit('javascript:123'))"
SplitResult(scheme='', netloc='', path='javascript:123', query='', fragment='')

I consider the Python 2.6 behaviour to be correct, i.e. the current behaviour is buggy.
msg130575 - Author: Bastian Kleineidam (calvin) Date: 2011-03-11 14:09
The behaviour change is caused by the fix for issue #754016.
msg130594 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-03-11 17:29
What kind of URL is 'javascript:123', and how do you (or we) say that the Python 2.6 behavior was correct?
msg130599 - Author: Bastian Kleineidam (calvin) Date: 2011-03-11 18:41
Regarding the correctness of the Python 2.6 implementation: http://www.faqs.org/rfcs/rfc1738.html specifies URLs of the form <scheme>:<scheme-specific-part>, where the scheme-specific part is allowed to consist only of digits.

I agree that the example URL is not a good one and it is artificially constructed.

Some better examples demonstrating the same issue might be
clsid:85bbd920-42a0-1069-a2e4-08002b30309d
or
mailto:1337@example.org
msg130663 - Author: Bastian Kleineidam (calvin) Date: 2011-03-12 06:33
To make the previous comment more precise: URLs where
the scheme specific part begins with a digit are affected.
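
A small sketch of how the affected class of URLs can be checked on Python 3. It assumes an interpreter that already contains the fix; the helper name split_examples is made up for illustration, and the second URL is a CLSID-style value chosen because its scheme-specific part starts with a digit but is not all digits.

```python
from urllib.parse import urlsplit

# URLs whose scheme-specific part begins with a digit but is not all digits.
EXAMPLES = [
    "mailto:1337@example.org",
    "clsid:85bbd920-42a0-1069-a2e4-08002b30309d",
]


def split_examples(urls=EXAMPLES):
    """Return (scheme, path) pairs as reported by urlsplit()."""
    pairs = []
    for url in urls:
        parts = urlsplit(url)
        pairs.append((parts.scheme, parts.path))
    return pairs


if __name__ == "__main__":
    for scheme, path in split_examples():
        print(scheme, path)
```

On an affected build the scheme comes back empty and the whole string lands in the path, as in the 2.7 output of the original report.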
msg130795 - Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-03-14 08:23
I'm attaching a patch with a fix and a unittest using the email example. I put this in a new test_RFC2368 (the mailto URL scheme) method. Seems like there is no unittest for parsing mailto scheme to begin with.
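
For readers without access to the attachment, a regression test along these lines might look like the sketch below. This is not the contents of urlparse.patch: only the method name test_RFC2368 and the mailto example come from this thread, the class name MailtoSplitTest is hypothetical, and the Python 3 module path urllib.parse is assumed.

```python
import unittest
from urllib.parse import urlsplit


class MailtoSplitTest(unittest.TestCase):
    """Hypothetical test case; only test_RFC2368 is named in the thread."""

    def test_RFC2368(self):
        # The local part starts with digits; the scheme must still be
        # recognised instead of being folded into the path.
        result = urlsplit("mailto:1337@example.org")
        self.assertEqual(
            tuple(result),
            ("mailto", "", "1337@example.org", "", ""),
        )
```

Saved to a file, it can be run with `python -m unittest <module>`.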
msg130796 - Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-03-14 08:25
Oops, wrong revision base.
msg130797 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-03-14 08:30
Santoso, Quick review comments:

1. The patch looks good.
2. I would use a temporary 'throw-away' variable instead of _, but don't bother changing it; I will take care of that before committing.
3. Important: did you find any regressions against the earlier builds? Also, could you run the full test suite to ensure that all tests pass?
msg130799 - Author: Santoso Wijaya (santoso.wijaya) * Date: 2011-03-14 09:00
Senthil,

Thanks for the review! I was initially thinking of `port = ...` but opted for _, arbitrarily, instead.

regrtest on Darwin-10.6.0-i386-64bit ran fine.
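
The exchange about `_` versus `port` concerns a value that is computed only to see whether a conversion succeeds. A minimal sketch of that kind of check follows, assuming the port test is done by attempting int() on the text after the first colon. The patch itself is not reproduced in this thread, so split_scheme and its structure are hypothetical and only illustrate the ambiguity between "scheme:rest" and "host:port".

```python
SCHEME_CHARS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789+-."
)


def split_scheme(url):
    """Hypothetical helper: return (scheme, rest) for a URL string.

    If the text after the first colon parses as an integer, the input is
    treated as "host:port" and no scheme is reported.
    """
    i = url.find(":")
    if i > 0 and url[0].isalpha() and all(c in SCHEME_CHARS for c in url[:i]):
        rest = url[i + 1:]
        try:
            # The converted value is never used; only success or failure
            # matters, which is why a throw-away name was discussed.
            int(rest)
        except ValueError:
            return url[:i].lower(), rest
    return "", url
```

Under this sketch "mailto:1337@example.org" yields the scheme "mailto", while an all-digit remainder such as "www.python.org:80" is still read as a port, so no scheme is split off.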
msg133801 - Author: Roundup Robot (python-dev) Date: 2011-04-15 10:08
New changeset 7a693e283c68 by Senthil Kumaran in branch '2.7':
Issue #11467: Fix urlparse behavior when handling urls which contains scheme
http://hg.python.org/cpython/rev/7a693e283c68
msg133802 - Author: Roundup Robot (python-dev) Date: 2011-04-15 10:22
New changeset 495d12196487 by Senthil Kumaran in branch '3.1':
Issue #11467: Fix urlparse behavior when handling urls which contains scheme specific part only digits.
http://hg.python.org/cpython/rev/495d12196487
msg133803 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-04-15 10:23
Fixed this in all codelines. Thanks Santoso.
History
Date User Action Args
2011-04-15 10:23:51 orsenthil set status: open -> closed; resolution: fixed; messages: + msg133803; stage: resolved
2011-04-15 10:22:35 python-dev set messages: + msg133802
2011-04-15 10:08:33 python-dev set nosy: + python-dev; messages: + msg133801
2011-04-14 16:49:44 santoso.wijaya set versions: + Python 3.1
2011-03-14 09:00:51 santoso.wijaya set nosy: calvin, orsenthil, r.david.murray, santoso.wijaya; messages: + msg130799
2011-03-14 08:30:07 orsenthil set nosy: calvin, orsenthil, r.david.murray, santoso.wijaya; messages: + msg130797
2011-03-14 08:25:32 santoso.wijaya set files: + urlparse.patch; nosy: calvin, orsenthil, r.david.murray, santoso.wijaya; messages: + msg130796
2011-03-14 08:25:04 santoso.wijaya set files: - urlparse.patch; nosy: calvin, orsenthil, r.david.murray, santoso.wijaya
2011-03-14 08:23:12 santoso.wijaya set files: + urlparse.patch; nosy: calvin, orsenthil, r.david.murray, santoso.wijaya; messages: + msg130795
2011-03-13 21:29:10 santoso.wijaya set nosy: + santoso.wijaya; versions: + Python 3.3
2011-03-12 06:33:34 calvin set nosy: calvin, orsenthil, r.david.murray; messages: + msg130663
2011-03-11 18:41:29 calvin set nosy: calvin, orsenthil, r.david.murray; messages: + msg130599
2011-03-11 17:29:46 orsenthil set assignee: orsenthil; messages: + msg130594; keywords: + patch; nosy: calvin, orsenthil, r.david.murray
2011-03-11 17:22:08 r.david.murray set nosy: + r.david.murray, orsenthil
2011-03-11 14:09:02 calvin set messages: + msg130575
2011-03-11 13:47:52 calvin set versions: + Python 3.2
2011-03-11 13:47:41 calvin create