
urlparse.urlsplit regression in 2.7 #52967

Closed

srid (mannequin) opened this issue May 15, 2010 · 11 comments
Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@srid (mannequin) commented May 15, 2010

BPO 8721
Nosy @orsenthil, @tarekziade, @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2010-05-15.03:14:59.866>
created_at = <Date 2010-05-15.02:28:51.564>
labels = ['invalid', 'type-bug', 'library']
title = 'urlparse.urlsplit regression in 2.7'
updated_at = <Date 2010-05-19.17:03:46.379>
user = 'https://bugs.python.org/srid'

bugs.python.org fields:

activity = <Date 2010-05-19.17:03:46.379>
actor = 'r.david.murray'
assignee = 'none'
closed = True
closed_date = <Date 2010-05-15.03:14:59.866>
closer = 'r.david.murray'
components = ['Library (Lib)']
creation = <Date 2010-05-15.02:28:51.564>
creator = 'srid'
dependencies = []
files = []
hgrepos = []
issue_num = 8721
keywords = []
message_count = 11.0
messages = ['105785', '105786', '105787', '105788', '105789', '105850', '106040', '106041', '106049', '106075', '106083']
nosy_count = 4.0
nosy_names = ['orsenthil', 'tarek', 'r.david.murray', 'srid']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue8721'
versions = ['Python 2.7']

@srid (mannequin, author) commented May 15, 2010

[storage@nas0 ~]$ python2.6 -c "import urlparse; print urlparse.urlsplit('http://www.famfamfam.com](http://www.famfamfam.com/', 'http', True)"
SplitResult(scheme='http', netloc='www.famfamfam.com](http:', path='//www.famfamfam.com/', query='', fragment='')
[storage@nas0 ~]$ python2.7 -c "import urlparse; print urlparse.urlsplit('http://www.famfamfam.com](http://www.famfamfam.com/', 'http', True)"
('urlsplit() - %s, scheme=%s, allow_fragments=%s', 'http://www.famfamfam.com](http://www.famfamfam.com/', 'http', True)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/apy/APy27/lib/python2.7/urlparse.py", line 184, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL
[storage@nas0 ~]$

via http://bitbucket.org/tarek/distribute/issue/160

@srid added the stdlib and type-bug labels on May 15, 2010
@bitdancer (member) commented:

Why do you think this is a regression? It looks to me like the error message is accurate. (A ']' is valid in the netloc part only when specifying an IPv6 address).
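
A minimal sketch (not from the thread), assuming Python 2.7's urlparse, of the rule described above: a bracketed IPv6 literal in the netloc is accepted, while a stray ']' there triggers the same ValueError shown in the report.

# Illustrative sketch only; behavior as observed with Python 2.7's urlparse.
from urlparse import urlsplit

# A bracketed IPv6 literal in the netloc parses fine in 2.7.
print urlsplit('http://[2001:db8::1]:8080/index.html')

# A stray ']' in the netloc is what triggers the error reported above.
try:
    urlsplit('http://www.famfamfam.com](http://www.famfamfam.com/')
except ValueError as e:
    print e  # Invalid IPv6 URL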

@srid (mannequin, author) commented May 15, 2010

Shouldn't urlparse keep accepting non-IPv6 URLs that contain a single ']', as it always used to?

@srid (mannequin, author) commented May 15, 2010

For example, the following URL seems to load just fine in my browser:

http://www.google.com/search?q=foo&b=df]d&qscrl=1

And, as with the django-cms PyPI page (see the issue link referenced in the message above), such URLs do seem to be used in practice in a few places.

@bitdancer (member) commented:

Python 2.7b2+ (trunk:81129, May 12 2010, 19:05:17) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from urlparse import urlsplit
>>> urlsplit('http://www.google.com/search?q=foo&b=df]d&qscrl=1', 'http', True)
SplitResult(scheme='http', netloc='www.google.com', path='/search', query='q=foo&b=df]d&qscrl=1', fragment='')

@orsenthil (member) commented:

FWIW, it should also be noted that the RFC allows square brackets only in the host portion of a URL, and only when that portion is an IPv6 literal.

In the example given, the ']' in the query portion should be quoted (percent-encoded).
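
A hedged sketch of that suggestion (the choice of safe characters here is an assumption on my part, not from the thread): percent-encode the ']' in the query component before passing the URL around.

import urllib
from urlparse import urlsplit, urlunsplit

url = 'http://www.google.com/search?q=foo&b=df]d&qscrl=1'
parts = urlsplit(url)
# Quote only the query, keeping '&' and '=' as structural characters.
encoded_query = urllib.quote(parts.query, safe='&=')
print urlunsplit((parts.scheme, parts.netloc, parts.path, encoded_query, parts.fragment))
# -> http://www.google.com/search?q=foo&b=df%5Dd&qscrl=1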

@tarekziade (mannequin) commented May 19, 2010

I couldn't find the relevant commits, but if we didn't do it, ISTM that we should backport the fix to the next 2.6 release so that it behaves like 2.7.

@orsenthil (member) commented:

tarek: bpo-2987 has the details on the changes made for IPv6 urlparse. Those can't be backported, as they constitute a feature. I would rather look at what's breaking in distutils2.

The URL which resulted in this bug in distribute, "http://www.famfamfam.com](http://www.famfamfam.com/", is clearly an invalid one. What can be done in py26 is to look for invalid characters like '[' or ']' outside of the netloc.
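
A hypothetical sketch of that kind of 2.6-side check (the function name and heuristics are mine, not from the thread): flag brackets outside the netloc, and unbalanced brackets inside it, roughly mirroring what 2.7 now rejects.

from urlparse import urlsplit

def looks_malformed(url):
    parts = urlsplit(url)
    # Brackets outside the netloc are never valid unless percent-encoded.
    for component in (parts.scheme, parts.path, parts.query, parts.fragment):
        if '[' in component or ']' in component:
            return True
    # Inside the netloc they are only valid as matched IPv6 delimiters.
    return ('[' in parts.netloc) != (']' in parts.netloc)

# 2.6 splits the offending URL without complaint, but the check flags it:
print looks_malformed('http://www.famfamfam.com](http://www.famfamfam.com/')  # True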

@tarekziade (mannequin) commented May 19, 2010

Senthil: thx for the pointer.

I've fixed the problem on the distribute side by catching any ValueError raised by urlparse (whether from 2.6 or 2.7).

That said, I don't think that catching more invalid URLs in Python 2.7 should be considered a feature.

If it's a new feature, then we should have an option to explicitly parse IPv6-like URLs and leave the default behavior as it was in 2.6. If not, then it should be considered a bug fix (meaning that Python now discards more malformed URLs) and should be backported, imo.

IOW, I want to discard invalid URLs the same way no matter what the Python version is, because this is not a rule defined by Python but rather by the relevant RFCs at the URL level.
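
For illustration only (this is not the actual distribute code), the workaround described above amounts to something like:

from urlparse import urlsplit

def safe_urlsplit(url):
    # Treat any ValueError from urlsplit as "this URL is malformed",
    # so callers see the same behavior on 2.6 and 2.7.
    try:
        return urlsplit(url)
    except ValueError:
        return None

result = safe_urlsplit('http://www.famfamfam.com](http://www.famfamfam.com/')
# None on 2.7; on 2.6 urlsplit does not raise, so a SplitResult comes back.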

@srid (mannequin, author) commented May 19, 2010

On 2010-05-19, at 5:00 AM, Tarek Ziadé wrote:

I've fixed the problem on distribute side by catching any ValueError returned by urlparse (from 2.6 or 2.7 point of view).

Catching ValueError will catch *every* ValueError raised, rather than only the intended one: ValueError("Invalid IPv6 URL"). Can we have a custom exception for this? Generally, I am curious what the convention is regarding raising standard vs. custom exceptions from the standard library.
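
A hypothetical illustration of that concern: without a dedicated exception class, a caller who wants to handle only this failure has to match on the message text, which is not a stable API.

from urlparse import urlsplit

def split_or_none(url):
    try:
        return urlsplit(url)
    except ValueError as e:
        if 'Invalid IPv6 URL' in str(e):
            return None  # the specific parse failure discussed here
        raise  # any other ValueError is unexpected; let it propagate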

@bitdancer (member) commented:

Why would you not want to catch all value errors? I assume (perhaps a bad thing) that distribute will repeat the returned error message in a more user friendly format. If a bug in urlparse returns a spurious ValueError, that will presumably be found (and then corrected) either by the test suite or by other code in addition to distribute.

The standard library should use standard exceptions unless there is a compelling reason to create a new exception. This rule of thumb has not always been followed, of course.

@ezio-melotti transferred this issue from another repository on Apr 10, 2022