Issue9374
Created on 2010-07-24 22:58 by Nick.Welch, last changed 2010-07-24 22:58 by Nick.Welch.
| Messages (1) | |||
|---|---|---|---|
| msg111511 - (view) | Author: Nick Welch (Nick.Welch) | Date: 2010-07-24 22:58 | |
While the netloc/path parts of URLs are scheme-specific, and urlparse can be forgiven for refusing to parse them for unknown schemes, the query and fragment parts are standardized, and should be parsed for unrecognized schemes.

According to Wikipedia:

------------------

Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:

<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]

------------------

http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax

Here is a demonstration of what urlparse currently does:

>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')

>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag')
|||
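For illustration, the scheme-independent behaviour requested above could be sketched as follows. This is only a minimal sketch based on the generic syntax quoted in the message, not urlparse's actual implementation or a proposed patch. The helper name `split_query_fragment` is hypothetical, and it does no validation of the scheme or hierarchical part.

```python
def split_query_fragment(url):
    """Split a URL into (rest, query, fragment) without looking at the scheme.

    Hypothetical helper: it relies only on the generic syntax
    <scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ].
    """
    # The fragment is everything after the first '#'.
    rest, _, fragment = url.partition('#')
    # The query is everything after the first '?' in what remains.
    rest, _, query = rest.partition('?')
    return rest, query, fragment


# The unknown-scheme URL from the report now yields its query and fragment.
print(split_query_fragment('myscheme://netloc/path?a=b#frag'))
# prints ('myscheme://netloc/path', 'a=b', 'frag')
```

The fragment is split off before the query so that a '?' appearing inside the fragment is not mistaken for the start of a query.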
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2010-07-24 22:58:39 | Nick.Welch | create | |