classification
Title: urlparse should parse query and fragment for arbitrary schemes
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: orsenthil Nosy List: Arfrever, Nick.Welch, benjamin.peterson, doko, eric.araujo, ezio.melotti, georg.brandl, orsenthil, pitrou, python-dev, vadmium
Priority: critical Keywords:

Created on 2010-07-24 22:58 by Nick.Welch, last changed 2014-04-14 22:48 by orsenthil. This issue is now closed.

Messages (16)
msg111511 - (view) Author: Nick Welch (Nick.Welch) Date: 2010-07-24 22:58
While the netloc/path parts of URLs are scheme-specific, and urlparse can be forgiven for refusing to parse them for unknown schemes, the query and fragment parts are standardized, and should be parsed for unrecognized schemes.

According to Wikipedia:
------------------
Internet standard STD 66 (also RFC 3986) defines the generic syntax to be used in all URI schemes. Every URI is defined as consisting of four parts, as follows:
<scheme name> : <hierarchical part> [ ? <query> ] [ # <fragment> ]
------------------
http://en.wikipedia.org/wiki/URI_scheme#Generic_syntax
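
The generic syntax quoted above can be sketched with the regular expression given in RFC 3986, Appendix B. The helper name `split_generic` is hypothetical and only for illustration; this is not how urlparse itself is implemented.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_generic(uri):
    """Hypothetical helper: split a URI into its five generic components."""
    m = URI_RE.match(uri)
    # Groups 2, 4, 5, 7 and 9 hold scheme, authority, path, query, fragment.
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


print(split_generic("myscheme://netloc/path?a=b#frag"))
# ('myscheme', 'netloc', '/path', 'a=b', 'frag')
```

Nothing in the pattern depends on the scheme name, which is the point of the report: query and fragment can be split off without knowing the scheme.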


Here is a demonstration of what urlparse currently does:

>>> urlparse.urlsplit('myscheme://netloc/path?a=b#frag')
SplitResult(scheme='myscheme', netloc='', path='//netloc/path?a=b#frag', query='', fragment='')

>>> urlparse.urlsplit('http://netloc/path?a=b#frag')
SplitResult(scheme='http', netloc='netloc', path='/path', query='a=b', fragment='frag')
msg161087 - (view) Author: Roundup Robot (python-dev) Date: 2012-05-19 00:13
New changeset 79e6ff3d9afd by Senthil Kumaran in branch '2.7':
Issue9374 - Generic parsing of query and fragment portion of urls for any scheme
http://hg.python.org/cpython/rev/79e6ff3d9afd

New changeset a9d43e21f7d8 by Senthil Kumaran in branch '3.2':
Issue9374 - Generic parsing of query and fragment portion of urls for any scheme
http://hg.python.org/cpython/rev/a9d43e21f7d8

New changeset 152c78b94e41 by Senthil Kumaran in branch 'default':
Issue9374 - Generic parsing of query and fragment portion of urls for any scheme
http://hg.python.org/cpython/rev/152c78b94e41
msg161088 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-05-19 00:16
Thanks for raising this issue, Nick. Yes, I verified in both RFC 3986 and 2396 and realized we can safely adopt a generic parsing system for the query and fragment portions of URLs for any scheme. Since it was supported in earlier versions too, I felt it was a good move to backport it as well.
Fixed in all versions. 

Thanks!
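
For reference, a minimal sketch of the behaviour with the fix in place, assuming Python 3, where the module is `urllib.parse` rather than `urlparse`. The scheme `myscheme` is the made-up one from the original report.

```python
from urllib.parse import urlsplit

# An unrecognized scheme: query and fragment are split off generically.
result = urlsplit("myscheme://netloc/path?a=b#frag")

print(result.path)      # /path
print(result.query)     # a=b
print(result.fragment)  # frag
```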
msg165546 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-07-15 20:06
Removing the module attributes causes third-party code to break.  See one example here: http://lists.idyll.org/pipermail/testing-in-python/2012-July/005082.html
msg165547 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-07-15 20:07
Better link: https://github.com/pypa/pip/issues/552
msg168899 - (view) Author: Matthias Klose (doko) * (Python committer) Date: 2012-08-22 17:29
this breaks the following upstream builds:

createrepo, linkchecker, gwibber, pegasus-wm

there is no need to remove is_hierarchical on the branches. it's not used by urlparse at all.

is it safe to just keep the uses_query and uses_fragment lists on the branches as well?

raising to a release blocker, I consider this as a regression for the 2.7 and 3.2 release series.
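
One plausible pattern behind this kind of breakage (an assumption here, not taken from the linked reports) is third-party code that appends a custom scheme to the module-level lists and so fails with AttributeError when they disappear. A defensive sketch, assuming Python 3's `urllib.parse`; the helper `register_scheme` and the scheme `myscheme` are hypothetical.

```python
import urllib.parse


def register_scheme(scheme):
    """Hypothetical helper: add `scheme` to the legacy lists if they exist."""
    updated = []
    for name in ("uses_query", "uses_fragment"):
        # These lists are undocumented; tolerate their absence.
        registry = getattr(urllib.parse, name, None)
        if registry is None:
            continue
        if scheme not in registry:
            registry.append(scheme)
        updated.append(name)
    return updated


register_scheme("myscheme")
```

With generic parsing in place the lists no longer influence the result, so the registration is harmless either way.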
msg169039 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-08-24 16:12
Senthil, either the module globals should be re-added for compatibility, or the commits should be reverted, IMO.
msg169040 - (view) Author: Roundup Robot (python-dev) Date: 2012-08-24 16:17
New changeset a0b3cb52816e by Georg Brandl in branch '3.2':
Closes #9374: add back now-unused module attributes; removing them is a backward compatibility issue, since they have a public-seeming name.
http://hg.python.org/cpython/rev/a0b3cb52816e

New changeset c93fbc2caba5 by Georg Brandl in branch 'default':
Closes #9374: merge with 3.2
http://hg.python.org/cpython/rev/c93fbc2caba5

New changeset a43481210964 by Georg Brandl in branch '2.7':
Closes #9374: add back now-unused module attributes; removing them is a backward compatibility issue, since they have a public-seeming name.
http://hg.python.org/cpython/rev/a43481210964
msg169052 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-08-24 17:23
Oops. I had not seen Éric's and Matthias's comments on this issue, which
pointed out the problem. Sorry for not acting on this.

Thanks Georg for adding those module attributes back.

msg171448 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-09-28 12:28
After encountering an instance of people relying on fragment not being parsed for "irc://" URLs, with resulting breakage, I don't think we should change this in point releases.  IOW, it's fine for 3.3.0, but not for 2.7.x or 3.2.x.

It may be fixing a bug, but the bug is not obvious and the fix is not backward compatible.  I therefore suggest to roll back the commits to 3.2 and 2.7.
msg171452 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-09-28 12:47
If there is a list of known protocols that don't use the fragment, can't we include it in urlparse as we already do in Lib/urlparse.py:34?
If #channel in irc://example.com/#channel should not be parsed as a fragment, then this can be considered a regression.  This doesn't necessarily mean that the whole change is a regression though.
msg171465 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2012-09-28 13:40
People make up URL schemes all the time, irc:// is not a special case. This change will mean breakage for them, unwarranted.
msg171469 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-09-28 14:05
One would hope that people making up URI schemes would follow the generic syntax (and thus irc would be an exception), but as the risk exists I agree we should not break code in bugfix releases.
msg171557 - (view) Author: Roundup Robot (python-dev) Date: 2012-09-29 07:27
New changeset 950320c70fb4 by Georg Brandl in branch 'default':
Add a versionchanged note for #9374 changes.
http://hg.python.org/cpython/rev/950320c70fb4
msg179270 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-07 16:46
> It may be fixing a bug, but the bug is not obvious and the fix is not
> backward compatible.  I therefore suggest to roll back the commits to
> 3.2 and 2.7.

Well, the bug is quite obvious to me :-) (just hit it here)
The fix for those who want the old behaviour is obvious: just pass `allow_fragments=False` to urlparse(). OTOH, if you revert the fix, patching things manually is quite cumbersome.
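
A minimal sketch of that workaround, assuming Python 3's `urllib.parse` and using the irc:// URL mentioned earlier in this thread:

```python
from urllib.parse import urlsplit

url = "irc://example.com/#channel"

# Default behaviour after the fix: '#channel' is split off as the fragment.
new_style = urlsplit(url)
print(new_style.path, new_style.fragment)   # / channel

# Old behaviour on request: the '#...' part stays in the path.
old_style = urlsplit(url, allow_fragments=False)
print(old_style.path)                       # /#channel
```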
msg216244 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2014-04-14 22:48
Reviewed the issue and correct rollbacks and commits were applied.
This ticket should be closed. Thanks!
History
Date User Action Args
2014-04-14 22:48:03  orsenthil  set  status: open -> closed; messages: + msg216244
2013-11-24 01:49:08  vadmium  set  nosy: + vadmium
2013-01-07 16:46:02  pitrou  set  messages: + msg179270
2012-09-29 07:27:40  python-dev  set  messages: + msg171557
2012-09-29 06:59:41  georg.brandl  set  versions: - Python 3.3
2012-09-28 14:05:18  eric.araujo  set  messages: + msg171469
2012-09-28 13:40:04  georg.brandl  set  messages: + msg171465
2012-09-28 12:47:22  ezio.melotti  set  messages: + msg171452
2012-09-28 12:28:58  georg.brandl  set  priority: release blocker -> critical; status: closed -> open; messages: + msg171448
2012-08-24 17:23:21  orsenthil  set  messages: + msg169052
2012-08-24 16:17:37  python-dev  set  status: open -> closed; resolution: remind -> fixed; messages: + msg169040
2012-08-24 16:12:59  pitrou  set  nosy: + pitrou; messages: + msg169039
2012-08-22 17:29:38  doko  set  priority: normal -> release blocker; nosy: + benjamin.peterson, georg.brandl, doko; messages: + msg168899; resolution: fixed -> remind
2012-07-15 20:07:21  eric.araujo  set  messages: + msg165547
2012-07-15 20:06:39  eric.araujo  set  status: closed -> open; messages: + msg165546
2012-06-14 16:10:11  Arfrever  set  nosy: + Arfrever
2012-05-19 00:16:23  orsenthil  set  status: open -> closed; messages: + msg161088; assignee: orsenthil; resolution: fixed; stage: needs patch -> resolved
2012-05-19 00:13:01  python-dev  set  nosy: + python-dev; messages: + msg161087
2012-05-08 03:29:53  eric.araujo  set  nosy: + orsenthil, eric.araujo
2012-05-06 22:50:02  ezio.melotti  set  nosy: + ezio.melotti; stage: needs patch; versions: + Python 2.7, Python 3.2, Python 3.3, - Python 2.6
2010-07-24 22:58:39  Nick.Welch  create