Message 16380 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	sjones
Recipients
Date	2003-06-14.04:18:11
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to

Content
Logged In: YES user_id=589306 Ok, I researched this a bit, and the situation isn't as simple as it first appears. The RFC that urlparse tries to follow is at http://www.faqs.org/rfcs/rfc1808.html and notice specifically sections 2.1 and 2.2. It seems to me that the source code follows rfc1808 religiously, and in that sense it does the correct thing. According to the RFC, the netloc should begin with a '//', and since your example didn't include one then it technical was an invalid URL. Here is another example where it seems Python fails to do the right thing: >>> urlparse.urlparse('python.org') ('', '', 'python.org', '', '', '') >>> urlparse.urlparse('python.org', 'http') ('http', '', 'python.org', '', '', '') Note that it is putting 'python.org' as the path and not the netloc. So the problem isn't limited to just when you use a scheme parameter and/or a port number. Now if we put '//' at the beginning, we get: >>> urlparse.urlparse('//python.org') ('', 'python.org', '', '', '', '') >>> urlparse.urlparse('//python.org', 'http') ('http', 'python.org', '', '', '', '') So here it does the correct thing. There are two problems though. First, it is common for browsers and other software to just take a URL without a scheme and '://' and assume it is http for the user. While the URL is technically not correct, it is still common usage. Also, urlparse does take a scheme parameter. It seems as though this parameter should be used with a scheme-less URL to give it a default one like web browsers do. So somebody needs to make a decision. Should urlparse follow the RFC religiously and require '//' in front of netlocs? If so, I think the documentation should give an example showing this and showing how to use the 'scheme' parameter. Or should urlparse use the more commonly used form of a URL where '//' is omitted when the scheme is omitted? If so, urlparse.py will need to be changed. Or maybe another fuction should be added to cover whichever behaviour urlparse doesn't cover. In any case, you can temporarily solve your problem by making sure that URL's without a scheme have '//' at the front. So your example becomes: >>> urlparse.urlparse('//1.2.3.4:80', 'http') ('http', '1.2.3.4:80', '', '', '', '')

Logged In: YES 
user_id=589306

Ok, I researched this a bit, and the situation isn't as
simple as it first appears. The RFC that urlparse tries to
follow is at http://www.faqs.org/rfcs/rfc1808.html and
notice specifically sections 2.1 and 2.2.

It seems to me that the source code follows rfc1808
religiously, and in that sense it does the correct thing.
According to the RFC, the netloc should begin with a '//',
and since your example didn't include one then it technical
was an invalid URL. Here is another example where it seems
Python fails to do the right thing:

>>> urlparse.urlparse('python.org')
('', '', 'python.org', '', '', '')
>>> urlparse.urlparse('python.org', 'http')
('http', '', 'python.org', '', '', '')

Note that it is putting 'python.org' as the path and not the
netloc. So the problem isn't limited to just when you use a
scheme parameter and/or a port number. Now if we put '//' at
the beginning, we get:

>>> urlparse.urlparse('//python.org')
('', 'python.org', '', '', '', '')
>>> urlparse.urlparse('//python.org', 'http')
('http', 'python.org', '', '', '', '')

So here it does the correct thing.

There are two problems though. First, it is common for
browsers and other software to just take a URL without a
scheme and '://' and assume it is http for the user. While
the URL is technically not correct, it is still common
usage. Also, urlparse does take a scheme parameter. It seems
as though this parameter should be used with a scheme-less
URL to give it a default one like web browsers do.

So somebody needs to make a decision. Should urlparse follow
the RFC religiously and require '//' in front of netlocs? If
so, I think the documentation should give an example showing
this and showing how to use the 'scheme' parameter. Or
should urlparse use the more commonly used form of a URL
where '//' is omitted when the scheme is omitted? If so,
urlparse.py will need to be changed. Or maybe another
fuction should be added to cover whichever behaviour
urlparse doesn't cover.

In any case, you can temporarily solve your problem by
making sure that URL's without a scheme have '//' at the
front. So your example becomes:

>>> urlparse.urlparse('//1.2.3.4:80', 'http')
('http', '1.2.3.4:80', '', '', '', '')

History
Date	User	Action	Args
2007-08-23 14:13:54	admin	link	issue754016 messages
2007-08-23 14:13:54	admin	create