Message 119991 - Python tracker



Author	r.david.murray
Recipients	belopolsky, docs@python, eric.araujo, georg.brandl, orsenthil, r.david.murray
Date	2010-10-30.14:51:17
Message-id	<1288450282.22.0.736195691365.issue10226@psf.upfronthosting.co.za>

Content

How about this:

-  If the scheme value is not specified, urlparse following the syntax
-  specifications from RFC 1808, expects the netloc value to start with '//',
-  Otherwise, it is not possible to distinguish between net_loc and path
-  component and would classify the indistinguishable component as path as in
-  a relative url.

+  Following the syntax specifications in RFC 1808, urlparse recognizes
+  a netloc only if it is properly introduced by '//'.  Otherwise the
+  input must be presumed to be a relative URL and thus to start with
+  a path component.
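
To illustrate the proposed wording, here is a minimal sketch. It assumes Python 3, where the `urlparse` module lives at `urllib.parse`; the host name and the variable names are only illustrative. It compares the same input with and without the leading '//':

```python
from urllib.parse import urlparse

# With the leading '//' the host is recognized as the netloc.
with_slashes = urlparse("//www.k.com/path")
print(with_slashes.netloc, with_slashes.path)

# Without it, there is no netloc and the whole string is treated as a
# relative path.
without_slashes = urlparse("www.k.com/path")
print(repr(without_slashes.netloc), without_slashes.path)
```

Neither input contains a colon, so the scheme question raised later in this message does not come into play here.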


However, it seems to me there is a bug here:

>>> urlparse.urlparse('www.k.com:80/path')
ParseResult(scheme='', netloc='', path='www.k.com:80/path', params='',
query='', fragment='')
>>> urlparse.urlparse('www.k.com:path')
ParseResult(scheme='www.k.com', netloc='', path='path', params='',
query='', fragment='')

I think the second one is correct and that the first one should produce

ParseResult(scheme='www.k.com', netloc='', path='80/path', params='',
query='', fragment='')

I haven't read all the way through the RFC again, though.  But *one*
of the above is wrong.
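
For reference, a rough sketch of the scheme rule as I read RFC 1808 (section 2.4.2): a colon that comes after the first character and before any character not allowed in a scheme name ends the scheme. `split_scheme` and `SCHEME_CHARS` are hypothetical names made up for this illustration; this is not urlparse's actual implementation.

```python
import string

# Characters RFC 1808 allows in a scheme name: alphanumerics, "+", "." and "-".
SCHEME_CHARS = set(string.ascii_letters + string.digits + "+.-")

def split_scheme(url):
    """Hypothetical helper: return (scheme, rest) using the RFC 1808 scheme rule."""
    colon = url.find(":")
    # The colon must come after the first character, and every character
    # before it must be a legal scheme character.
    if colon > 0 and all(c in SCHEME_CHARS for c in url[:colon]):
        return url[:colon].lower(), url[colon + 1:]
    return "", url

first = split_scheme("www.k.com:80/path")
second = split_scheme("www.k.com:path")
third = split_scheme("/relative/path:with-colon")
print(first, second, third)
```

Under that reading, both inputs from the session above split at the first colon with 'www.k.com' as the scheme, while a string whose first colon follows a '/' has no scheme at all.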

History
Date	User	Action	Args
2010-10-30 14:51:22	r.david.murray	set	recipients: + r.david.murray, georg.brandl, belopolsky, orsenthil, eric.araujo, docs@python
2010-10-30 14:51:22	r.david.murray	set	messageid: <1288450282.22.0.736195691365.issue10226@psf.upfronthosting.co.za>
2010-10-30 14:51:17	r.david.murray	link	issue10226 messages
2010-10-30 14:51:17	r.david.murray	create