Message 111511 - Python tracker


Author	Nick.Welch
Recipients	Nick.Welch
Date	2010-07-24.22:58:38
Message-id	<1280012321.31.0.791086727515.issue9374@psf.upfronthosting.co.za>

Content

While the netloc/path parts of URLs are scheme-specific, and urlparse can be forgiven for refusing to parse them for unknown schemes, the query and fragment parts are standardized by the generic URI syntax and should be parsed even for unrecognized schemes.

According to Wikipedia:
------------------
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
------------------
http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax


Here is a demonstration of what urlparse currently does:

>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')

>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag')
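
For illustration, here is a minimal sketch of the scheme-independent split being asked for, following the generic syntax quoted above. The function name split_generic is hypothetical and not part of urlparse. It leaves the hierarchical part unparsed, since that part is scheme-specific, and it assumes the input is an absolute URI (a relative reference containing a colon in its path would be mis-split).

```python
def split_generic(url):
    """Split a URI into (scheme, hier_part, query, fragment).

    Hypothetical helper, not part of urlparse. Only the pieces that the
    generic syntax defines for every scheme are separated; the
    hierarchical part (netloc/path) is returned unparsed.
    """
    # The fragment is everything after the first '#'.
    rest, _, fragment = url.partition('#')
    # The query is everything after the first '?' that precedes the fragment.
    rest, _, query = rest.partition('?')
    # The scheme is everything before the first ':'.
    scheme, sep, hier_part = rest.partition(':')
    if not sep:
        # No ':' found: treat the input as having no scheme.
        scheme, hier_part = '', rest
    return scheme, hier_part, query, fragment


print(split_generic('myscheme://netloc/path?a=b#frag'))
```

With the URL from the demonstration above, this prints ('myscheme', '//netloc/path', 'a=b', 'frag'): the query and fragment are recovered even though the scheme is unknown.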
