classification
Title: urlparse should parse query and fragment for arbitrary schemes
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 2.6
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Nick.Welch
Priority: normal
Keywords:

Created on 2010-07-24 22:58 by Nick.Welch, last changed 2010-07-24 22:58 by Nick.Welch.

Messages (1)
msg111511 - Author: Nick Welch (Nick.Welch) Date: 2010-07-24 22:58
While the netloc/path parts of URLs are scheme-specific, and urlparse can be forgiven for refusing to parse them for unknown schemes, the query and fragment parts are standardized, and should be parsed for unrecognized schemes.

According to Wikipedia:
------------------
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
------------------
http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax
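
As a rough illustration of the generic syntax quoted above, here is a minimal sketch of a scheme-independent split. The function name split_generic is hypothetical and is not part of urlparse. It assumes the fragment starts at the first '#', the query at the first '?' before that, and the scheme ends at the first ':'. It does not validate any component and does not try to separate netloc from path.

```python
def split_generic(uri):
    """Split a URI into (scheme, hier_part, query, fragment).

    Hypothetical helper for illustration only; not part of urlparse.
    No validation is performed on any component.
    """
    # The fragment is everything after the first '#'.
    rest, _, fragment = uri.partition('#')
    # The query is everything after the first '?' in what remains.
    rest, _, query = rest.partition('?')
    # The scheme is everything before the first ':'.
    scheme, sep, hier_part = rest.partition(':')
    if not sep:
        # No ':' found, so treat the input as having no scheme.
        scheme, hier_part = '', rest
    return scheme, hier_part, query, fragment


print(split_generic('myscheme://netloc/path?a=b#frag'))
# prints ('myscheme', '//netloc/path', 'a=b', 'frag')
```

The hierarchical part is left unparsed on purpose, since that is the scheme-specific portion.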


Here is a demonstration of what urlparse currently does:

>>> import urlparse
>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')

>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag')
History
Date User Action Args
2010-07-24 22:58:39 Nick.Welch create