Message111511
The netloc and path parts of a URL are scheme-specific, so urlparse can be forgiven for refusing to parse them for unknown schemes. The query and fragment parts, however, are standardized and should be parsed even for unrecognized schemes.
According to Wikipedia:
------------------
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
------------------
http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax
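As a rough illustration of the generic syntax quoted above, the four parts can be separated without knowing anything about the scheme. This is only a sketch: split_generic is a hypothetical helper, not part of urlparse, and it assumes the input actually begins with a scheme followed by a colon.

```python
def split_generic(uri):
    """Split a URI into (scheme, hierarchical part, query, fragment).

    Hypothetical helper for illustration only; assumes `uri` starts
    with "<scheme>:" and does no validation or percent-decoding.
    """
    # The fragment is everything after the first '#'.
    rest, _, fragment = uri.partition('#')
    # The query is everything after the first '?' in what remains.
    rest, _, query = rest.partition('?')
    # The scheme ends at the first ':'; the remainder is the
    # hierarchical part, left unparsed because it is scheme-specific.
    scheme, _, hier_part = rest.partition(':')
    return scheme, hier_part, query, fragment


print(split_generic('myscheme://netloc/path?a=b#frag'))
# ('myscheme', '//netloc/path', 'a=b', 'frag')
```

The hierarchical part is deliberately returned as one opaque string, since only the query and fragment are claimed here to be scheme-independent.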
Here is a demonstration of what urlparse currently does:
>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')
>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag')

Date | User | Action | Args
2010-07-24 22:58:41 | Nick.Welch | set | recipients: + Nick.Welch
2010-07-24 22:58:41 | Nick.Welch | set | messageid: <1280012321.31.0.791086727515.issue9374@psf.upfronthosting.co.za>
2010-07-24 22:58:39 | Nick.Welch | link | issue9374 messages
2010-07-24 22:58:38 | Nick.Welch | create |