classification
Title: Normalization error in urlunparse
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.4, Python 3.5, Python 2.7
process
Status: closed Resolution: duplicate
Dependencies:
Superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc (Issue 22852)
Assigned To: orsenthil Nosy List: Aaron1011, BreamoreBoy, dstanek, eric.araujo, martin.panter, orsenthil
Priority: normal Keywords:

Created on 2009-04-25 19:12 by eric.araujo, last changed 2015-05-31 04:25 by martin.panter. This issue is now closed.

Messages (6)
msg86538 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2009-04-25 19:12
Docstring for urlunparse says:
    """Put a parsed URI back together again.  This may result in a
    slightly different, but equivalent URI, if the URI that was parsed
    originally had redundant delimiters, e.g. a ? with an empty query
    (the draft states that these are equivalent)."""

“Draft” here refers to RFC 1808, which was superseded by RFC 3986. However, RFC 3986
(section 6.2.3) states:
“Normalization should not remove delimiters when their associated
component is empty unless licensed to do so by the scheme
specification.  For example, the URI "http://example.com/?" cannot be
assumed to be equivalent to any of the examples above.  Likewise, the
presence or absence of delimiters within a userinfo subcomponent is
usually significant to its interpretation.  The fragment component is
not subject to any scheme-based normalization; thus, two URIs that
differ only by the suffix "#" are considered different regardless of
the scheme.”

I guess we need some tests here to check compliance.
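As a minimal sketch of the kind of compliance test suggested above (not an existing test from CPython's suite), the following round-trips a few URLs through Python 3's `urllib.parse`, the Python 3 name of the `urlparse` module, and reports any URL that does not come back unchanged. `CASES` and `roundtrip_mismatches` are hypothetical names chosen for illustration.

```python
from urllib.parse import urlparse, urlunparse

# Hypothetical test inputs: URLs whose empty '?' or '#' delimiter RFC 3986
# (section 6.2.3) says normalization should keep, plus one control case
# whose query and fragment are non-empty.
CASES = [
    'http://example.com/?',
    'http://a/b/c?',
    'http://a/b/c#',
    'http://a/b/c?x=1#f',
]


def roundtrip_mismatches(urls):
    """Return (original, rebuilt) pairs where urlunparse(urlparse(url)) != url."""
    mismatches = []
    for url in urls:
        rebuilt = urlunparse(urlparse(url))
        if rebuilt != url:
            mismatches.append((url, rebuilt))
    return mismatches


if __name__ == '__main__':
    for original, rebuilt in roundtrip_mismatches(CASES):
        print(f'not preserved: {original!r} -> {rebuilt!r}')
```

Under the behaviour described in this issue, the three URLs with an empty `?` or `#` would be expected to show up as mismatches, while the control case round-trips unchanged.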
msg86541 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2009-04-25 19:45
This is indeed a bug. urlunparse should special-case "#" so as not to
discard it.
msg110314 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2010-07-14 19:09
Currently this claim will fail:

>>> obj = urlparse.urlparse('http://a/b/c?')
>>> urlparse.urlunparse(obj)
'http://a/b/c'
>>> obj = urlparse.urlparse('http://a/b/c#')
>>> urlparse.urlunparse(obj)
'http://a/b/c'

If we move away from the current behavior, there will surely be some test failures for urljoin. We will have to consider those cases too while fixing this.
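For reference, here is a sketch of the same reproduction on Python 3, where the `urlparse` module became `urllib.parse`. It also compares the parse results directly, which suggests the delimiter is already gone before `urlunparse` runs: an empty query or fragment is stored as `''`, just as when the delimiter is absent. The variable names are illustrative only.

```python
from urllib.parse import urlparse, urlunparse

# Python 3 spelling of the interactive session above.
for url in ('http://a/b/c?', 'http://a/b/c#'):
    print(repr(urlunparse(urlparse(url))))

# Compare the parsed results themselves: the empty query and empty fragment
# are both represented as '', the same as when no delimiter is present.
with_query_delim = urlparse('http://a/b/c?')
with_fragment_delim = urlparse('http://a/b/c#')
without_delims = urlparse('http://a/b/c')
print(with_query_delim == without_delims, with_fragment_delim == without_delims)
```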
msg228009 - Author: Mark Lawrence (BreamoreBoy) * Date: 2014-09-30 21:45
Slipped under the radar, guys?
msg228853 - Author: Aaron Hill (Aaron1011) * Date: 2014-10-09 10:21
In order to fix this, I think ParseResult needs two additional fields, indicating whether an empty fragment or query string is present.

Both ParseResult.fragment and ParseResult.query omit the leading '#' or '?' from their value. This makes it impossible to determine if the fragment/query string is entirely absent, or has no value.
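A minimal sketch of the approach proposed above, assuming a hypothetical result type `SplitWithDelims` with two extra boolean fields and hypothetical helpers `split_preserving` and `unsplit_preserving`; none of these exist in `urllib.parse`. The sketch builds on `urlsplit`/`urlunsplit` (so `;params` are not split out) and leaves aside the empty `//netloc` case mentioned in Issue 22852.

```python
from typing import NamedTuple
from urllib.parse import urlsplit, urlunsplit


class SplitWithDelims(NamedTuple):
    """Hypothetical result type: the urlsplit() fields plus delimiter flags."""
    scheme: str
    netloc: str
    path: str
    query: str
    fragment: str
    has_query: bool = False     # a '?' was present, even if the query is empty
    has_fragment: bool = False  # a '#' was present, even if the fragment is empty


def split_preserving(url: str) -> SplitWithDelims:
    """Hypothetical variant of urlsplit() that records empty '?' and '#'."""
    # urlsplit() takes the fragment after the first '#', and the query after
    # the first '?' that precedes it; mirror that to detect the delimiters.
    before_fragment, hash_sign, _ = url.partition('#')
    _, question_mark, _ = before_fragment.partition('?')
    parts = urlsplit(url)
    return SplitWithDelims(*parts,
                           has_query=bool(question_mark),
                           has_fragment=bool(hash_sign))


def unsplit_preserving(parts: SplitWithDelims) -> str:
    """Hypothetical inverse of split_preserving() that keeps empty delimiters."""
    # Rebuild scheme, netloc and path with the standard function first.
    url = urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))
    # Re-emit each delimiter if it was present, even when its component is empty.
    if parts.has_query or parts.query:
        url += '?' + parts.query
    if parts.has_fragment or parts.fragment:
        url += '#' + parts.fragment
    return url


if __name__ == '__main__':
    for url in ('http://a/b/c?', 'http://a/b/c#', 'http://a/b/c?#', 'http://a/b/c'):
        print(url, '->', unsplit_preserving(split_preserving(url)))
```

Keeping the flags as separate fields leaves the existing `query` and `fragment` strings unchanged for current callers.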
msg235579 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-09 00:17
Looks like this duplicates Issue 22852, which has a patch, although its author had second thoughts on the implementation.
History
Date                 User           Action  Args
2015-05-31 04:25:46  martin.panter  set     status: open -> closed
                                            superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc
                                            resolution: duplicate
                                            stage: resolved
2015-02-09 00:17:34  martin.panter  set     nosy: + martin.panter
                                            messages: + msg235579
2014-10-09 10:21:51  Aaron1011      set     nosy: + Aaron1011
                                            messages: + msg228853
2014-09-30 21:45:28  BreamoreBoy    set     nosy: + BreamoreBoy
                                            messages: + msg228009
                                            versions: + Python 3.4, Python 3.5, - Python 3.1, Python 3.2
2010-11-02 19:36:38  eric.araujo    set     nosy: orsenthil, dstanek, eric.araujo
                                            title: Possible normalization error in urlparse.urlunparse -> Normalization error in urlunparse
                                            components: + Library (Lib)
                                            versions: + Python 3.1, Python 2.7, Python 3.2
2010-08-18 00:15:17  dstanek        set     nosy: + dstanek
2010-07-14 19:09:30  orsenthil      set     messages: + msg110314
2010-07-11 14:28:57  eric.araujo    set     assignee: orsenthil
                                            type: behavior
                                            nosy: + orsenthil
2009-04-25 19:45:10  eric.araujo    set     messages: + msg86541
2009-04-25 19:12:38  eric.araujo    create