Author Nick.Welch
Recipients Nick.Welch
Date 2010-07-24.22:58:38
Message-id <1280012321.31.0.791086727515.issue9374@psf.upfronthosting.co.za>
In-reply-to
Content
While the netloc/path parts of URLs are scheme-specific, and urlparse can be forgiven for refusing to parse them for unknown schemes, the query and fragment parts are standardized, and should be parsed for unrecognized schemes.

According to Wikipedia:
------------------
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
------------------
http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax


Here is a demonstration of what urlparse currently does:

>>> import urlparse
>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')

>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag')
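
As a rough sketch of the behavior requested above, the query and fragment can be split off without knowing anything about the scheme, following the generic syntax quoted from STD 66. The helper name split_generic is hypothetical and is not part of urlparse; it leaves the hierarchical part unparsed and assumes the input is an absolute URL, so this illustrates the idea rather than proposing a patch.

```python
def split_generic(url):
    """Split a URL into (scheme, hier_part, query, fragment).

    Hypothetical helper for illustration only; not part of urlparse.
    """
    # The fragment begins at the first '#', so peel it off first.
    url, _, fragment = url.partition('#')
    # The query begins at the first '?' in what remains.
    url, _, query = url.partition('?')
    # Everything before the first ':' is taken as the scheme name.
    scheme, sep, hier_part = url.partition(':')
    if not sep:
        # No ':' found: treat the input as having no scheme.
        scheme, hier_part = '', url
    return scheme, hier_part, query, fragment


print(split_generic('myscheme://netloc/path?a=b#frag'))
```

For the unknown-scheme URL from the demonstration this prints ('myscheme', '//netloc/path', 'a=b', 'frag'): the scheme-specific part stays untouched while the query and fragment are separated out.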
History
Date User Action Args
2010-07-24 22:58:41 Nick.Welch set recipients: + Nick.Welch
2010-07-24 22:58:41 Nick.Welch set messageid: <1280012321.31.0.791086727515.issue9374@psf.upfronthosting.co.za>
2010-07-24 22:58:39 Nick.Welch link issue9374 messages
2010-07-24 22:58:38 Nick.Welch create