Title: urllib.parse should not discard delimiters when associated component is empty
Dependencies: Superseder: urllib.parse wrongly strips empty #fragment, ?query, //netloc
Created on 2015-05-30 18:05 by gdata gmail, last changed 2022-04-11 14:58 by admin. This issue is now closed.

msg244477 - (view) Author: gdata gmail (gdata gmail) Date: 2015-05-30 18:05
The documentation for urllib.parse states several times:

"This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had unnecessary delimiters (for example, a ? with an empty query; the RFC states that these are equivalent)."

This is false -- RFC 3986 explicitly states that a URL with ? and an empty query is _not_ equivalent to the same URL without the ?. For example, the following two URLs should be considered different:
msg244515 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-05-31 04:34
This is essentially the same as Issue 22852. The title there only mentions stripping an empty #fragment, but the netloc and query components are also affected. I have a patch there which needs reviewing, if you are interested. If you have any alternative ideas on how to solve this, they would be welcome too.
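
For anyone wanting to reproduce the behaviour discussed above, here is a minimal sketch. It is not taken from either message: the example.com URLs and the round_trips helper are made up for illustration, and the results of the last two calls are what one would expect on Python versions affected by this issue with default arguments.

```python
from urllib.parse import urlsplit


def round_trips(url: str) -> bool:
    """Hypothetical helper: True if split + geturl() reproduces the input exactly."""
    return urlsplit(url).geturl() == url


# Illustrative URLs (assumptions, not from the report).
with_delim = "http://example.com/path?"
without_delim = "http://example.com/path"

# Both inputs yield an empty query string, so the parsed results compare
# equal and the presence of the "?" delimiter is not recorded.
print(urlsplit(with_delim).query == urlsplit(without_delim).query)
print(urlsplit(with_delim) == urlsplit(without_delim))

# Reassembling shows whether the trailing "?" survived the round trip.
print(round_trips(without_delim))
print(round_trips(with_delim))
```

Because the two parsed results are indistinguishable, geturl() has no way to know whether the original URL carried the empty "?" delimiter.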
