classification
Title: urllib parse incorrect handling of params
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: julian.reschke@gmx.de, martin.panter, orsenthil
Priority: normal Keywords:

Created on 2015-01-02 14:13 by julian.reschke@gmx.de, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (6)
msg233312 - Author: Julian Reschke (julian.reschke@gmx.de) Date: 2015-01-02 14:13
urllib.parse tries to special-case params, which were dropped from the general URI syntax back in RFC 2396 (16 years ago).

In most cases this can be worked around by reconstructing the path from both path and params; however, this fails for paths that *end* in a semicolon (because it's not possible to distinguish an empty param from an absent param).
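
The workaround and its failure case can be illustrated with a minimal sketch. The helper name rebuild_path is hypothetical (it is not part of urllib), and the sketch assumes the default behavior of urllib.parse.urlparse() for the http scheme:

```python
from urllib.parse import urlparse


def rebuild_path(url):
    """Hypothetical helper: rejoin the path and params that urlparse() split apart."""
    parts = urlparse(url)
    if parts.params:
        return parts.path + ";" + parts.params
    # An empty params string looks the same as "no params at all",
    # so a trailing semicolon cannot be restored here.
    return parts.path


print(rebuild_path("http://example.com/a;b"))  # /a;b
print(rebuild_path("http://example.com/;"))    # /  (the trailing ";" is lost)
```

Both "http://example.com/;" and "http://example.com/" yield path "/" and params "" from urlparse(), which is why the reconstruction cannot tell them apart.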
msg233342 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2015-01-03 03:22
Hello Julian,

Can you please provide a test case for this parsing misbehavior? It would make the problem easier to identify. Better yet, a patch changing the parsing logic would help identify whether we are dealing with a regression.

Thanks!
msg233349 - Author: Julian Reschke (julian.reschke@gmx.de) Date: 2015-01-03 08:46
An example URI for this issue is:

  http://example.com/;

The RFC 3986 path component for this URI is "/;".

After using urllib's parse function, how would you know?

(I realize that changing the behavior of the existing API may cause problems, but this is an information loss issue.) One ugly but workable way to fix this would be to also provide access to an "RFC3986path" component.
msg233366 - Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2015-01-03 20:48
On Saturday, January 3, 2015 at 12:46 AM, Julian Reschke wrote:
> An example URI for this issue is:
> 
> http://example.com/;
> 
> The RFC 3986 path component for this URI is "/;". 
I think a stronger argument (something like a real-world scenario in which a web app constructs a path that ends in a semicolon) would be desirable before breaking backwards compatibility.

OTOH, RFC 3986 compliance is itself a good enough argument, but it should be applied to the whole module rather than to pieces of it. There is a bug to track it. You can mention this instance as the desired behavior in that ticket too (and close this ticket if the desired behavior is a subset).
msg255556 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-28 23:06
Marking as Python 3 since you mentioned urllib.parse, rather than just urllib. However, you need to be more specific. We already have a urllib.parse.urlsplit() function, which seems to do what you want:

>>> import urllib.parse
>>> urllib.parse.urlsplit("http://example.com/;").path
'/;'
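
As a further sketch of the same point, and assuming only the standard urlsplit() and urlunsplit() functions, the split result can be reassembled without losing the trailing semicolon, since urlsplit() never separates params from the path:

```python
from urllib.parse import urlsplit, urlunsplit

url = "http://example.com/;"
parts = urlsplit(url)

# urlsplit() has no separate params field, so ";" stays in the path.
print(parts.path)         # /;

# Reassembling the five components gives back the original URL.
print(urlunsplit(parts))  # http://example.com/;
```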

I see that the “params” bit can be dropped by urljoin(). My proposal in Issue 22852 could probably be adapted to help with that.
msg271714 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-07-31 00:06
If the problem was just Julian not being aware of urlsplit(), there is not much to be done for this bug.
History
Date User Action Args
2022-04-11 14:58:11  admin  set  github: 67339
2017-03-07 18:55:49  serhiy.storchaka  set  status: pending -> closed
                                            stage: test needed -> resolved
2016-07-31 00:06:32  martin.panter  set  status: open -> pending
                                         resolution: not a bug
                                         messages: + msg271714
2015-11-28 23:06:46  martin.panter  set  versions: + Python 3.4, Python 3.5, Python 3.6
                                         nosy: + martin.panter
                                         messages: + msg255556
                                         stage: test needed
2015-01-03 20:48:25  orsenthil  set  messages: + msg233366
2015-01-03 08:46:54  julian.reschke@gmx.de  set  messages: + msg233349
2015-01-03 03:22:01  orsenthil  set  nosy: + orsenthil
                                     messages: + msg233342
2015-01-02 14:13:47  julian.reschke@gmx.de  create