Issue 5843: Normalization error in urlunparse

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/50093

classification

Title:	Normalization error in urlunparse
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.4, Python 3.5, Python 2.7

process

Status:	closed	Resolution:	duplicate
Dependencies:		Superseder:	urllib.parse wrongly strips empty #fragment, ?query, //netloc View: 22852
Assigned To:	orsenthil	Nosy List:	Aaron1011, BreamoreBoy, dstanek, eric.araujo, martin.panter, orsenthil
Priority:	normal	Keywords:

Created on 2009-04-25 19:12 by eric.araujo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (6)
msg86538 - (view)	Author: Éric Araujo (eric.araujo) *	Date: 2009-04-25 19:12
Docstring for urlunparse says:

"""Put a parsed URI back together again. This may result in a slightly different, but equivalent URI, if the URI that was parsed originally had redundant delimiters, e.g. a ? with an empty query (the draft states that these are equivalent)."""

“Draft” here refers to RFC 1808, superseded by 3986. However, RFC 3986 (section 6.2.3) states:

“Normalization should not remove delimiters when their associated component is empty unless licensed to do so by the scheme specification. For example, the URI "http://example.com/?" cannot be assumed to be equivalent to any of the examples above. Likewise, the presence or absence of delimiters within a userinfo subcomponent is usually significant to its interpretation. The fragment component is not subject to any scheme-based normalization; thus, two URIs that differ only by the suffix "#" are considered different regardless of the scheme.”

I guess we need some tests here to check compliance.
msg86541 - (view)	Author: Éric Araujo (eric.araujo) *	Date: 2009-04-25 19:45
This is indeed a bug. urlunparse should special-case "#" so as not to discard it.
msg110314 - (view)	Author: Senthil Kumaran (orsenthil) *	Date: 2010-07-14 19:09
Currently this claim will fail:

>>> obj = urlparse.urlparse('http://a/b/c?')
>>> urlparse.urlunparse(obj)
'http://a/b/c'
>>> obj = urlparse.urlparse('http://a/b/c#')
>>> urlparse.urlunparse(obj)
'http://a/b/c'

If we move away from the current behavior, there will surely be some test failures that can be observed for urljoins. We will have to consider those cases too while fixing this.
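
For readers on Python 3, the same round trip can be reproduced with urllib.parse, which replaced the Python 2 urlparse module. The sketch below is only an illustration of the behavior reported in this message; the helper names round_trip and changed_by_round_trip are hypothetical and not part of the standard library, and the sample URLs are the ones used in this thread.

```python
from urllib.parse import urlparse, urlunparse


def round_trip(url):
    """Parse a URL and put it back together with the default settings."""
    return urlunparse(urlparse(url))


def changed_by_round_trip(urls):
    """Return (original, rebuilt) pairs for URLs that do not survive a round trip."""
    return [(url, round_trip(url)) for url in urls if round_trip(url) != url]


if __name__ == "__main__":
    samples = [
        "http://a/b/c",           # no delimiters at all
        "http://a/b/c?",          # empty query
        "http://a/b/c#",          # empty fragment
        "http://a/b/c?x=1#frag",  # non-empty query and fragment
    ]
    for original, rebuilt in changed_by_round_trip(samples):
        print(f"{original!r} came back as {rebuilt!r}")
```

With the behavior described here, only the two URLs ending in a bare "?" or "#" should be reported. A helper like this could serve as a starting point for the compliance tests requested in the first message.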
msg228009 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2014-09-30 21:45
Slipped under the radar, guys?
msg228853 - (view)	Author: Aaron Hill (Aaron1011) *	Date: 2014-10-09 10:21
In order to fix this, I think ParseResult needs to have two additional fields, indicating whether an empty fragment or query string is present. Both ParseResult.fragment and ParseResult.query omit the leading '#' or '?' from their value. This makes it impossible to determine whether the fragment/query string is entirely absent or present but empty.
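
The idea of carrying two extra flags can be sketched outside the standard library. The code below is not the patch from this issue or from Issue 22852. It is a minimal illustration that assumes a hypothetical wrapper type, SplitWithFlags, and two hypothetical helpers built on the existing urlsplit and urlunsplit functions.

```python
from typing import NamedTuple
from urllib.parse import SplitResult, urlsplit, urlunsplit


class SplitWithFlags(NamedTuple):
    """Hypothetical wrapper: a SplitResult plus delimiter-presence flags."""
    parts: SplitResult
    has_query: bool      # True if a '?' delimiter was present
    has_fragment: bool   # True if a '#' delimiter was present


def split_keeping_delimiters(url):
    """Split a URL and remember whether '?' and '#' appeared at all."""
    # The fragment starts at the first '#'; a '?' only introduces a
    # query if it occurs before that '#'.
    before_fragment, hash_sep, _ = url.partition("#")
    return SplitWithFlags(
        parts=urlsplit(url),
        has_query="?" in before_fragment,
        has_fragment=bool(hash_sep),
    )


def unsplit_keeping_delimiters(split):
    """Rebuild the URL, re-adding delimiters even when the component is empty."""
    # Let urlunsplit handle scheme, netloc and path, then append the
    # query and fragment ourselves so that empty ones are not dropped.
    url = urlunsplit(split.parts._replace(query="", fragment=""))
    if split.has_query:
        url += "?" + split.parts.query
    if split.has_fragment:
        url += "#" + split.parts.fragment
    return url


if __name__ == "__main__":
    for sample in ("http://a/b/c", "http://a/b/c?", "http://a/b/c#", "http://a/b/c?x=1#"):
        print(sample, "->", unsplit_keeping_delimiters(split_keeping_delimiters(sample)))
```

Keeping the flags next to the parsed result rather than changing ParseResult itself avoids altering the tuple length that existing callers may unpack, which is one of the compatibility concerns any real fix would have to weigh.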
msg235579 - (view)	Author: Martin Panter (martin.panter) *	Date: 2015-02-09 00:17
Looks like this duplicates Issue 22852, which has a patch, although its author had second thoughts on the implementation.

History
Date	User	Action	Args
2022-04-11 14:56:48	admin	set	github: 50093
2015-05-31 04:25:46	martin.panter	set	status: open -> closed superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc resolution: duplicate stage: resolved
2015-02-09 00:17:34	martin.panter	set	nosy: + martin.panter messages: + msg235579
2014-10-09 10:21:51	Aaron1011	set	nosy: + Aaron1011 messages: + msg228853
2014-09-30 21:45:28	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg228009 versions: + Python 3.4, Python 3.5, - Python 3.1, Python 3.2
2010-11-02 19:36:38	eric.araujo	set	nosy: orsenthil, dstanek, eric.araujo title: Possible normalization error in urlparse.urlunparse -> Normalization error in urlunparse components: + Library (Lib) versions: + Python 3.1, Python 2.7, Python 3.2
2010-08-18 00:15:17	dstanek	set	nosy: + dstanek
2010-07-14 19:09:30	orsenthil	set	messages: + msg110314
2010-07-11 14:28:57	eric.araujo	set	assignee: orsenthil type: behavior nosy: + orsenthil
2009-04-25 19:45:10	eric.araujo	set	messages: + msg86541
2009-04-25 19:12:38	eric.araujo	create