Author sjones
Date 2003-06-14.04:18:11
Ok, I researched this a bit, and the situation isn't as
simple as it first appears. The RFC that urlparse tries to
follow is at http://www.faqs.org/rfcs/rfc1808.html; see
sections 2.1 and 2.2 in particular.

It seems to me that the source code follows rfc1808
religiously, and in that sense it does the correct thing.
According to the RFC, the netloc should begin with a '//',
and since your example didn't include one, it was technically
an invalid URL. Here is another example where it seems
Python fails to do the right thing:

>>> urlparse.urlparse('python.org')
('', '', 'python.org', '', '', '')
>>> urlparse.urlparse('python.org', 'http')
('http', '', 'python.org', '', '', '')

Note that it is putting 'python.org' as the path and not the
netloc. So the problem isn't limited to just when you use a
scheme parameter and/or a port number. Now if we put '//' at
the beginning, we get:

>>> urlparse.urlparse('//python.org')
('', 'python.org', '', '', '', '')
>>> urlparse.urlparse('//python.org', 'http')
('http', 'python.org', '', '', '', '')

So here it does the correct thing.

There are two problems though. First, it is common for
browsers and other software to just take a URL without a
scheme and '://' and assume it is http for the user. While
the URL is technically not correct, it is still common
usage. Also, urlparse does take a scheme parameter. It seems
as though this parameter should be used with a scheme-less
URL to give it a default one like web browsers do.
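
For what it's worth, the scheme argument already acts as a default
only: it fills in the scheme slot when the URL has none, and it does
not override a scheme that is present. A quick check, using nothing
beyond the standard library (the try/except import is just there so
the snippet runs under both the old urlparse module name and
urllib.parse):

```python
try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse      # original module name

# No scheme in the URL: the default passed as the second argument is used.
default_applied = urlparse('//python.org', 'http')

# Scheme present in the URL: the default is ignored.
explicit_kept = urlparse('https://python.org', 'http')

print(default_applied[0])  # scheme slot of the 6-tuple
print(explicit_kept[0])
```

So the open question is only about where 'python.org' ends up when
the '//' is missing, not about how the default scheme is applied.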

So somebody needs to make a decision. Should urlparse follow
the RFC religiously and require '//' in front of netlocs? If
so, I think the documentation should give an example showing
this and showing how to use the 'scheme' parameter. Or
should urlparse use the more commonly used form of a URL
where '//' is omitted when the scheme is omitted? If so,
urlparse.py will need to be changed. Or maybe another
function should be added to cover whichever behaviour
urlparse doesn't cover.

In any case, you can work around the problem for now by
making sure that URLs without a scheme have '//' at the
front. So your example becomes:

>>> urlparse.urlparse('//1.2.3.4:80', 'http')
('http', '1.2.3.4:80', '', '', '', '')
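
If you need this in more than one place, the workaround can be
wrapped in a small helper. This is only a sketch:
parse_with_default_scheme is a made-up name, not something in
urlparse, and the regular expression for "already has a scheme" (a
leading name followed by '://') is my own assumption. URLs such as
mailto: addresses are deliberately not handled.

```python
import re

try:
    from urllib.parse import urlparse  # Python 3 location
except ImportError:
    from urlparse import urlparse      # original module name

# Assumption: a URL "has a scheme" only if it starts with name://
_HAS_SCHEME = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://')


def parse_with_default_scheme(url, default_scheme='http'):
    """Hypothetical helper: parse url, treating a bare host as the netloc."""
    if not _HAS_SCHEME.match(url) and not url.startswith('//'):
        # Add the '//' marker that RFC 1808 expects in front of a netloc.
        url = '//' + url
    return urlparse(url, default_scheme)


print(tuple(parse_with_default_scheme('1.2.3.4:80')))
print(tuple(parse_with_default_scheme('python.org')))
```

Note that a relative path like 'foo/bar' would be read as host 'foo'
by this helper, so it only makes sense where the input is expected to
be a host name or a full URL.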

History
Date                 User   Action  Args
2007-08-23 14:13:54  admin  link    issue754016 messages
2007-08-23 14:13:54  admin  create