Author georg.brandl
Recipients georg.brandl, orsenthil, sandro.tosi
Date 2013-01-11.17:54:18
Message-id <1357926859.42.0.981504662198.issue16932@psf.upfronthosting.co.za>
In-reply-to
Content
Hmm, you're right.  The behavior has been like this at least since Python 2.5:

Python 2.5.4 (r254:67916, Dec 16 2012, 20:33:12) 
[GCC 4.6.3] on linux3
Type "help", "copyright", "credits" or "license" for more information.
>>> from urlparse import urlparse
>>> urlparse('www.cwi.nl:80/%7Eguido/Python.html')
('www.cwi.nl', '', '80/%7Eguido/Python.html', '', '', '')

The docs refer to RFC 1808.  From a quick glance at the BNF in section 2.2, RFC 1808 allows dots in the scheme, but also allows ":" in the path.  So the grammar alone is ambiguous for this input, but section 2.4.2 resolves it:

   If the parse string contains a colon ":" after the first character
   and before any characters not allowed as part of a scheme name (i.e.,
   any not an alphanumeric, plus "+", period ".", or hyphen "-"), the
   <scheme> of the URL is the substring of characters up to but not
   including the first colon.  These characters and the colon are then
   removed from the parse string before continuing.

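By that rule, 'www.cwi.nl' is the scheme here, since the first colon comes before any character that is not allowed in a scheme name.  That would indicate that the implementation is correct and the documentation should be fixed. Senthil?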
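
For illustration, the section 2.4.2 rule quoted above can be sketched in a few lines of Python. This is only a sketch of the RFC text, not the urlparse module's actual code; `split_scheme` and `SCHEME_CHARS` are hypothetical names chosen for this example.

```python
import string

# Characters RFC 1808 section 2.4.2 allows in a scheme name:
# alphanumerics plus "+", "." and "-".
SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+.-")


def split_scheme(url):
    """Return (scheme, rest) following the quoted rule (hypothetical helper)."""
    colon = url.find(":")
    # The first colon must come after the first character, and everything
    # before it must be a legal scheme character.
    if colon > 0 and all(c in SCHEME_CHARS for c in url[:colon]):
        return url[:colon], url[colon + 1:]
    # Otherwise there is no scheme and the parse string is left untouched.
    return "", url


print(split_scheme("www.cwi.nl:80/%7Eguido/Python.html"))
# ('www.cwi.nl', '80/%7Eguido/Python.html')
print(split_scheme("//www.cwi.nl:80/%7Eguido/Python.html"))
# ('', '//www.cwi.nl:80/%7Eguido/Python.html')
```

With a leading "//" the first colon is preceded by "/", which is not a scheme character, so no scheme is split off and the host can be parsed as a net location instead.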
History
Date                 User          Action  Args
2013-01-11 17:54:19  georg.brandl  set     recipients: + georg.brandl, orsenthil, sandro.tosi
2013-01-11 17:54:19  georg.brandl  set     messageid: <1357926859.42.0.981504662198.issue16932@psf.upfronthosting.co.za>
2013-01-11 17:54:19  georg.brandl  link    issue16932 messages
2013-01-11 17:54:18  georg.brandl  create