Issue 8721: urlparse.urlsplit regression in 2.7

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/52967

classification

Title:	urlparse.urlsplit regression in 2.7
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 2.7

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	orsenthil, r.david.murray, srid, tarek
Priority:	normal	Keywords:

Created on 2010-05-15 02:28 by srid, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (11)
msg105785 - (view)	Author: Sridhar Ratnakumar (srid)	Date: 2010-05-15 02:28
[storage@nas0 ~]$ python2.6 -c "import urlparse; print urlparse.urlsplit('http://www.famfamfam.com](http://www.famfamfam.com/', 'http', True)"
SplitResult(scheme='http', netloc='www.famfamfam.com](http:', path='//www.famfamfam.com/', query='', fragment='')
[storage@nas0 ~]$ python2.7 -c "import urlparse; print urlparse.urlsplit('http://www.famfamfam.com](http://www.famfamfam.com/', 'http', True)"
('urlsplit() - %s, scheme=%s, allow_fragments=%s', 'http://www.famfamfam.com](http://www.famfamfam.com/', 'http', True)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/apy/APy27/lib/python2.7/urlparse.py", line 184, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL
[storage@nas0 ~]$

via http://bitbucket.org/tarek/distribute/issue/160
msg105786 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-05-15 02:40
Why do you think this is a regression? It looks to me like the error message is accurate. (A ']' is valid in the netloc part only when specifying an IPv6 address).
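As a minimal sketch of the point above, the following uses Python 3's `urllib.parse`, the successor of the `urlparse` module discussed in this issue. The helper name `netloc_or_error` is hypothetical and exists only for illustration.

```python
from urllib.parse import urlsplit

def netloc_or_error(url):
    """Return the parsed netloc, or the ValueError text if parsing fails.

    Hypothetical helper for illustration; not part of the standard library.
    """
    try:
        return urlsplit(url).netloc
    except ValueError as exc:
        return "ValueError: %s" % exc

# Brackets that enclose an IPv6 literal are accepted in the netloc.
print(netloc_or_error("http://[::1]:8080/index.html"))

# A lone ']' in the netloc is rejected.
print(netloc_or_error("http://www.famfamfam.com](http://www.famfamfam.com/"))
```

The first call returns the bracketed host with its port; the second reports the same kind of ValueError shown in the 2.7 traceback.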
msg105787 - (view)	Author: Sridhar Ratnakumar (srid)	Date: 2010-05-15 02:56
Shouldn't `urlparse` accept non-IPv6 URLs as well - as it always used to - when these URLs can have a single ']'?
msg105788 - (view)	Author: Sridhar Ratnakumar (srid)	Date: 2010-05-15 02:58
For example, the following URL seems to load just fine in my browser: http://www.google.com/search?q=foo&b=df]d&qscrl=1 And, as is the case with the django-cms PyPI page (see the issue link referred to in my first message), such URLs seem to be used in practice in a few places.
msg105789 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-05-15 03:14
Python 2.7b2+ (trunk:81129, May 12 2010, 19:05:17)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from urlparse import urlsplit
>>> urlsplit('http://www.google.com/search?q=foo&b=df]d&qscrl=1', 'http', True)
SplitResult(scheme='http', netloc='www.google.com', path='/search', query='q=foo&b=df]d&qscrl=1', fragment='')
msg105850 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2010-05-16 07:03
FWIW, it should also be noted that the RFC allows square brackets only in the hostname portion, and only when it is an IPv6 URL. In the example given, the bracket in the query portion should be quoted (percent-encoded).
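A short sketch of the percent-encoding mentioned above, written against Python 3's `urllib.parse` rather than the 2.x `urlparse`/`urllib` modules of this thread. The query values are taken from the example URL in msg105788.

```python
from urllib.parse import quote, urlencode, urlsplit

# quote() percent-encodes characters outside its safe set, including ']'.
encoded_value = quote("df]d")
print(encoded_value)  # df%5Dd

# urlencode() applies the same kind of quoting to every key and value.
query = urlencode([("q", "foo"), ("b", "df]d"), ("qscrl", "1")])
url = "http://www.google.com/search?" + query
print(url)

# The encoded URL contains no raw bracket and still parses normally.
print(urlsplit(url).query)
```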
msg106040 - (view)	Author: Tarek Ziadé (tarek) *	Date: 2010-05-19 10:01
I couldn't find the relevant commits, but if we didn't do it, ISTM that we should backport the fix in the next 2.6 so it behaves like in 2.7.
msg106041 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2010-05-19 10:13
tarek: Issue2987 has the details on the changes made for IPv6 in urlparse. Those can't be backported as it's a feature. I would rather like to see what's breaking in distutils2. The URL which resulted in this bug in distribute, "http://www.famfamfam.com](http://www.famfamfam.com/", is clearly an invalid one. What can be done in py26 is looking for invalid characters like '[' or ']' outside of netloc.
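One way such a version-independent check might look is sketched below. This is an assumption about what the suggestion could mean, not code from urlparse or distribute: the helper `has_stray_bracket` and its regular expression are hypothetical, and the check is deliberately stricter than the 2.7 behavior because it also flags raw brackets in the path or query. It does not validate the contents of the bracketed host.

```python
import re

# Scheme, "://", optional userinfo, then a bracketed host such as [::1].
# The prefix is captured so that only the bracketed host is stripped.
_BRACKETED_HOST = re.compile(
    r"^([A-Za-z][A-Za-z0-9+.-]*://(?:[^/?#@\[\]]*@)?)\[[^\[\]/?#]*\]"
)

def has_stray_bracket(url):
    """Return True if '[' or ']' appears anywhere except around the host.

    Hypothetical helper for illustration only.
    """
    remainder = _BRACKETED_HOST.sub(r"\1", url, count=1)
    return "[" in remainder or "]" in remainder

print(has_stray_bracket("http://www.famfamfam.com](http://www.famfamfam.com/"))
print(has_stray_bracket("http://[::1]:8080/index.html"))
```

Because the check uses only string operations, it gives the same answer regardless of how a particular Python version's URL parser treats brackets.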
msg106049 - (view)	Author: Tarek Ziadé (tarek) *	Date: 2010-05-19 12:00
Senthil: thx for the pointer. I've fixed the problem on distribute side by catching any ValueError returned by urlparse (from 2.6 or 2.7 point of view). That said, I don't think that catching more invalid URLs in Python 2.7 should be considered as a feature. If it's a new feature then we should have an option to explicitly parse IPv6-like URLs and leave the default behavior like it was in 2.6. If not, then it should be considered as a bug fix (meaning that Python now discards more malformed URLs) and should be backported imo. IOW, I want to discard invalid URLs the same way no matter what the Python version is, because this is not a rule defined by Python, rather by some RFCs at the URL level.
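The actual change made in distribute is not shown in this thread. As a rough sketch of the general approach described (catching ValueError around the parse call), assuming Python 3's `urllib.parse` and a hypothetical generator named `split_valid_urls`:

```python
from urllib.parse import urlsplit

def split_valid_urls(urls):
    """Yield (url, SplitResult) pairs, skipping URLs that urlsplit rejects.

    Hypothetical helper; it illustrates the pattern, not distribute's code.
    """
    for url in urls:
        try:
            parts = urlsplit(url)
        except ValueError:
            # Malformed URL (for example a lone ']' in the netloc): skip it.
            continue
        yield url, parts

links = [
    "http://www.python.org/",
    "http://www.famfamfam.com](http://www.famfamfam.com/",
]
for url, parts in split_valid_urls(links):
    print(url, parts.netloc)
```

On an interpreter whose parser does not raise for the malformed link, the same code simply yields both entries, which is why the pattern works from either the 2.6 or the 2.7 point of view.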
msg106075 - (view)	Author: Sridhar Ratnakumar (srid)	Date: 2010-05-19 16:08
On 2010-05-19, at 5:00 AM, Tarek Ziadé wrote:
> I've fixed the problem on distribute side by catching any ValueError returned by urlparse (from 2.6 or 2.7 point of view).

Catching ValueError will catch every ValueError raised, rather than only the intended one: ValueError("Invalid IPv6 URL"). Can we have a custom exception for this? Generally, I am curious as to what the convention is in regards to raising standard vs custom exceptions from the standard library.
msg106083 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2010-05-19 17:03
Why would you not want to catch all value errors? I assume (perhaps a bad thing) that distribute will repeat the returned error message in a more user friendly format. If a bug in urlparse returns a spurious ValueError, that will presumably be found (and then corrected) either by the test suite or by other code in addition to distribute. The standard library should use standard exceptions unless there is a compelling reason to create a new exception. This rule of thumb has not always been followed, of course.
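For callers who still want a narrower exception type, one option that leaves the standard library unchanged is to wrap the call in application code. The sketch below assumes Python 3's `urllib.parse`; the names `InvalidURLError` and `strict_urlsplit` are hypothetical. Subclassing ValueError keeps existing `except ValueError` handlers working.

```python
from urllib.parse import urlsplit

class InvalidURLError(ValueError):
    """Hypothetical application-level exception; not in the standard library."""

def strict_urlsplit(url):
    """Split a URL, re-raising parse failures as InvalidURLError."""
    try:
        return urlsplit(url)
    except ValueError as exc:
        # Keep the original message and chain the original exception.
        raise InvalidURLError("cannot parse %r: %s" % (url, exc)) from exc

try:
    strict_urlsplit("http://www.famfamfam.com](http://www.famfamfam.com/")
except InvalidURLError as exc:
    print("Skipping link:", exc)
```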

History
Date	User	Action	Args
2022-04-11 14:57:01	admin	set	github: 52967
2010-05-19 17:03:46	r.david.murray	set	messages: + msg106083
2010-05-19 16:08:29	srid	set	messages: + msg106075
2010-05-19 12:00:08	tarek	set	messages: + msg106049
2010-05-19 10:13:08	orsenthil	set	messages: + msg106041
2010-05-19 10:01:28	tarek	set	nosy: + tarek messages: + msg106040
2010-05-16 07:03:04	orsenthil	set	messages: + msg105850
2010-05-15 03:14:59	r.david.murray	set	status: open -> closed messages: + msg105789 stage: resolved
2010-05-15 02:58:08	srid	set	messages: + msg105788
2010-05-15 02:56:12	srid	set	status: pending -> open messages: + msg105787
2010-05-15 02:40:31	r.david.murray	set	status: open -> pending nosy: + r.david.murray, orsenthil messages: + msg105786 resolution: not a bug
2010-05-15 02:28:51	srid	create