Issue 23150: urllib parse incorrect handling of params

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/67339

classification

Title:	urllib parse incorrect handling of params
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.6, Python 3.4, Python 3.5

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	julian.reschke@gmx.de, martin.panter, orsenthil
Priority:	normal	Keywords:

Created on 2015-01-02 14:13 by julian.reschke@gmx.de, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (6)
msg233312	Author: Julian Reschke (julian.reschke@gmx.de)	Date: 2015-01-02 14:13
urllib.parse tries to special-case params, which have been dropped from the general URI syntax back in RFC 2396 (16 years ago). In most cases this can be worked around by reconstructing the path from both path and params; however this fails for paths that end in a semicolon (because it's not possible to distinguish an empty param from an absent param).
msg233342	Author: Senthil Kumaran (orsenthil) *	Date: 2015-01-03 03:22
Hello Julian, can you please provide a test case for this parsing misbehavior? It might be easier to identify with a test case. Better yet, a patch changing the parsing logic would help identify whether we are dealing with a regression. Thanks!
msg233349	Author: Julian Reschke (julian.reschke@gmx.de)	Date: 2015-01-03 08:46
An example URI for this issue is:

http://example.com/;

The RFC 3986 path component for this URI is "/;". After using urllib's parse function, how would you know?

(I realize that changing the behavior of the existing API may cause problems, but this is an information loss issue.)

One ugly but workable way to fix this would be to also provide access to an "RFC3986path" component.
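The information loss described in this message can be reproduced with a minimal sketch like the one below. The helper name `rebuild_path` is hypothetical and not part of urllib; it only illustrates the workaround of rejoining `path` and `params` from `urllib.parse.urlparse()`, and where that workaround breaks down.

```python
from urllib.parse import urlparse


def rebuild_path(url):
    """Hypothetical helper: rejoin the path and params split by urlparse()."""
    parts = urlparse(url)
    # urlparse() reports params as '' both when the last path segment has no
    # ";" and when it ends in a bare ";", so the separator can only be
    # restored when params is non-empty.
    if parts.params:
        return parts.path + ";" + parts.params
    return parts.path


with_param = rebuild_path("http://example.com/a;b")
bare = rebuild_path("http://example.com/")
trailing = rebuild_path("http://example.com/;")

print(with_param)  # /a;b
print(bare)        # /
print(trailing)    # /   (the trailing ";" cannot be recovered)
```

The last two calls return the same string, which is the ambiguity between an empty and an absent param reported here.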
msg233366	Author: Senthil Kumaran (orsenthil) *	Date: 2015-01-03 20:48
On Saturday, January 3, 2015 at 12:46 AM, Julian Reschke wrote:
> An example URI for this issue is:
>
> http://example.com/;
>
> The RFC 3986 path component for this URI is "/;".

I think a stronger argument (something like a real-world scenario in which a web app constructs such a path) would be desirable before breaking backwards compatibility for a path that ends in a semicolon.

OTOH, making it RFC 3986 compliant is itself a good enough argument, but it should be applied in total, making the whole module compliant instead of pieces of it. There is a bug to track it. You can mention this instance of the desired behavior in that ticket too (and close this ticket if the desired behavior is a subset).
msg255556	Author: Martin Panter (martin.panter) *	Date: 2015-11-28 23:06
Marking as Python 3 since you mentioned urllib.parse, rather than just urllib. However, you need to be more specific. We already have a urllib.parse.urlsplit() function which seems to do what you want:

>>> urllib.parse.urlsplit("http://example.com/;").path
'/;'

I see that the “params” bit can be dropped by urljoin(). My proposal in Issue 22852 could probably be adapted to help with that.
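As a slightly fuller sketch of the urlsplit() suggestion, the snippet below checks that the path keeps its trailing semicolon and that the URL survives a round trip through `urlunsplit()`. The variable names are illustrative only, and this does not cover the urljoin() case mentioned above.

```python
from urllib.parse import urlsplit, urlunsplit

url = "http://example.com/;"
parts = urlsplit(url)

# urlsplit() does not separate params, so the ";" stays in the path.
print(parts.path)  # /;

# The 5-tuple from urlsplit() has no params field to lose information in.
has_params_field = hasattr(parts, "params")
print(has_params_field)  # False

# Reassembling the split result gives back the original URL.
round_tripped = urlunsplit(parts)
print(round_tripped == url)  # True
```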
msg271714	Author: Martin Panter (martin.panter) *	Date: 2016-07-31 00:06
If the problem was just Julian not being aware of urlsplit(), there is not much to be done for this bug.

History
Date	User	Action	Args
2022-04-11 14:58:11	admin	set	github: 67339
2017-03-07 18:55:49	serhiy.storchaka	set	status: pending -> closed stage: test needed -> resolved
2016-07-31 00:06:32	martin.panter	set	status: open -> pending resolution: not a bug messages: + msg271714
2015-11-28 23:06:46	martin.panter	set	versions: + Python 3.4, Python 3.5, Python 3.6 nosy: + martin.panter messages: + msg255556 stage: test needed
2015-01-03 20:48:25	orsenthil	set	messages: + msg233366
2015-01-03 08:46:54	julian.reschke@gmx.de	set	messages: + msg233349
2015-01-03 03:22:01	orsenthil	set	nosy: + orsenthil messages: + msg233342
2015-01-02 14:13:47	julian.reschke@gmx.de	create