Message179712
Hmm, you're right. The behavior has been like this at least since Python 2.5:
Python 2.5.4 (r254:67916, Dec 16 2012, 20:33:12)
[GCC 4.6.3] on linux3
Type "help", "copyright", "credits" or "license" for more information.
>>> from urlparse import urlparse
>>> urlparse('www.cwi.nl:80/%7Eguido/Python.html')
('www.cwi.nl', '', '80/%7Eguido/Python.html', '', '', '')
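For comparison, here is a minimal sketch of the same call in Python 3, where the module is `urllib.parse`. It assumes a current Python 3 interpreter. For this particular input the host still ends up as the scheme, exactly as in the 2.5 session.

```python
from urllib.parse import urlparse

# Same input as the Python 2.5 session: there is no leading "//", and every
# character before the first ":" is a legal scheme character.
result = urlparse('www.cwi.nl:80/%7Eguido/Python.html')

print(result.scheme)  # www.cwi.nl
print(result.netloc)  # (empty string)
print(result.path)    # 80/%7Eguido/Python.html
```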
The docs refer to RFC 1808. From a quick glance at the BNF in section 2.2, RFC 1808 allows dots in the scheme name but also allows ":" in the path, so the string looks ambiguous. Section 2.4.2, however, resolves this:
If the parse string contains a colon ":" after the first character
and before any characters not allowed as part of a scheme name (i.e.,
any not an alphanumeric, plus "+", period ".", or hyphen "-"), the
<scheme> of the URL is the substring of characters up to but not
including the first colon. These characters and the colon are then
removed from the parse string before continuing.
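The rule quoted above can be sketched as a small function. `split_scheme` is a hypothetical helper written only to illustrate section 2.4.2. It is not how `urlparse` is implemented, which also lowercases the scheme and applies further checks. The sketch does not lowercase the scheme.

```python
import string

# Characters RFC 1808 section 2.4.2 allows in a scheme name:
# alphanumerics plus "+", "." and "-".
SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+.-")


def split_scheme(parse_string: str) -> tuple[str, str]:
    """Hypothetical helper: return (scheme, rest) per RFC 1808 section 2.4.2.

    If no scheme is found, scheme is "" and rest is the whole input.
    """
    for i, ch in enumerate(parse_string):
        if ch == ":":
            # The colon only counts if it comes after the first character.
            if i > 0:
                # Scheme is everything before the first colon; the scheme
                # and the colon are removed from the parse string.
                return parse_string[:i], parse_string[i + 1:]
            break
        if ch not in SCHEME_CHARS:
            # A non-scheme character appeared before any colon: no scheme.
            break
    return "", parse_string


print(split_scheme("www.cwi.nl:80/%7Eguido/Python.html"))
# ('www.cwi.nl', '80/%7Eguido/Python.html')
```

Because "www.cwi.nl" contains only letters and dots, the rule takes it as the scheme, which matches what `urlparse` returns.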
That would indicate that the implementation is correct and the documentation should be fixed. Senthil?
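If the documentation is fixed, one input form it could show is a network location introduced by "//". The sketch below, assuming Python 3's `urllib.parse`, illustrates that with the leading "//" the host and port are parsed as the netloc instead of being taken as a scheme.

```python
from urllib.parse import urlparse

# A leading "//" marks what follows as the network location (netloc),
# so the host and port are not mistaken for a scheme.
with_netloc = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
print(with_netloc.netloc)    # www.cwi.nl:80
print(with_netloc.hostname)  # www.cwi.nl
print(with_netloc.port)      # 80
print(with_netloc.path)      # /%7Eguido/Python.html

# Without "//" and without any colon, the whole string is treated as a path.
no_netloc = urlparse('www.cwi.nl/%7Eguido/Python.html')
print(no_netloc.path)        # www.cwi.nl/%7Eguido/Python.html
```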
Date | User | Action | Args
2013-01-11 17:54:19 | georg.brandl | set | recipients: + georg.brandl, orsenthil, sandro.tosi
2013-01-11 17:54:19 | georg.brandl | set | messageid: <1357926859.42.0.981504662198.issue16932@psf.upfronthosting.co.za>
2013-01-11 17:54:19 | georg.brandl | link | issue16932 messages
2013-01-11 17:54:18 | georg.brandl | create |