Issue 33480: Improvement suggestions for urllib.parse.urlparser


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/77661

classification

Title:	Improvement suggestions for urllib.parse.urlparser
Type:	enhancement	Stage:	resolved
Components:	Library (Lib)	Versions:

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	Ivan.Pozdeev, bsaner, r.david.murray
Priority:	normal	Keywords:

Created on 2018-05-13 08:11 by bsaner, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg316454 - (view)	Author: brent s. (bsaner)	Date: 2018-05-13 08:11
Currently, a parsed urlparse() object looks (roughly) like this:

urlparse('http://example.com/foo;key1=value1?key2=value2#key3=value3#key4=value4')

returns:

ParseResult(scheme='http', netloc='example.com', path='/foo', params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')

However, I recommend a couple of things:

0.) That ParseResult objects support dict emulation. e.g. one can run:

dict(parseresult_obj)

and get (using the example string above, with corrected classification for RFC3986 compliance and common usage):

{'fragment': [('key4', 'value4')], 'netloc': 'foo.tld', 'params': [('key2', 'value2')], 'path': '/foo', 'query': [('key3', 'value3')], 'scheme': 'http'}

Obviously, fragment, params, and query could instead be serialized into a nested dict. I'm not sure which is preferred in the pythonic sense.

1.) Better RFC3986 compliance. Per RFC3986 § 3 (https://tools.ietf.org/html/rfc3986#section-3), the URL can be further split into separate components. For instance, while considered deprecated, should "userinfo" (e.g. "http://user:password@...") be parsed? At the very least, the port should be parsed out to a separate component from the netloc (or userinfo parsed out separate from netloc). This would assist in parsing host:port combinations in netlocs that contain both userinfo and a specified port, and would allow the port to be given as an int type, which is more easily used in e.g. the socket lib.

2.) If a component is not present, I suggest it be a None object instead of an empty string. e.g.:

urlparse('http://example.com/foo')

would return:

ParseResult(scheme='http', netloc='example.com', path='/foo', params=None, query=None, fragment=None)

instead of:

ParseResult(scheme='http', netloc='example.com', path='/foo', params='', query='', fragment='')
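
For illustration only, suggestions 0 and 2 in the message above can be roughly approximated today with the existing standard library. The helper name parse_result_to_dict is hypothetical and is not part of urllib.parse; the sketch assumes that params, query, and fragment all use key=value syntax, which real URLs do not guarantee.

```python
from urllib.parse import urlparse, parse_qsl

# Components that this sketch assumes carry key=value pairs.
KEY_VALUE_FIELDS = ('params', 'query', 'fragment')


def parse_result_to_dict(url):
    """Hypothetical helper: build a dict resembling the proposed form."""
    parts = urlparse(url)
    result = {}
    for name, value in zip(parts._fields, parts):
        if value == '':
            # Suggestion 2: report an absent component as None.
            result[name] = None
        elif name in KEY_VALUE_FIELDS:
            # Suggestion 0: expose key=value components as (key, value) pairs.
            result[name] = parse_qsl(value)
        else:
            result[name] = value
    return result


print(parse_result_to_dict('http://example.com/foo;key1=value1?key2=value2'))
```

This sketch does not reproduce the reclassification of components shown in the message: urlparse() still treats everything after the first '#' as the fragment, so a URL with two '#' characters keeps both pieces in the fragment.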
msg316478 - (view)	Author: Ivan Pozdeev (Ivan.Pozdeev) *	Date: 2018-05-13 20:29
Such drastic changes of uncertain usefulness are best discussed at python-ideas first. What you're really asking for seems to be to parse all "levels" at the same time. Try to think of a use case where that would help with anything practical and bring that to the list. I fail to see any use case, because you never need query parameters and things like username/port at the same time. Everything else that you suggest is either already being done (username/port parsing, read the docs) or likewise has no use cases I can think of where it would make things more convenient than they already are (dict emulation, None).
msg316483 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-05-13 21:58
These are actually reasonable requests, and in fact have been brought up before and implemented:

>>> x = urlparse('http://me:mypass@example.com:800/foo;key1=value1?key2=value2#key3=value3#key4=value4')
>>> x
ParseResult(scheme='http', netloc='me:mypass@example.com:800', path='/foo', params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')
>>> x.hostname
'example.com'
>>> x.port
800
>>> x.username
'me'
>>> x.password
'mypass'
>>> x._asdict()
OrderedDict([('scheme', 'http'), ('netloc', 'me:mypass@example.com:800'), ('path', '/foo'), ('params', 'key1=value1'), ('query', 'key2=value2'), ('fragment', 'key3=value3#key4=value4')])

Now, what this doesn't get you is the "extra" fields that are not part of the base tuple. The base tuple has the members it does for backward compatibility. So, the thing to discuss on python-ideas would be an API for namedtuple that gets you the extra fields.

None versus the empty string is not something that can happen, for backward compatibility reasons, even if there was agreement that it was better.

I'm not entirely sure why dict(x) is not supported (but I suspect it is because x is "a tuple", again for backward compatibility reasons), so you might search the archives to find out why for sure, if you are curious.
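
As a rough sketch of the "extra fields" point above, the base tuple from _asdict() can be merged with the attributes that ParseResult already exposes. The helper name asdict_with_extras and the EXTRA_FIELDS tuple are hypothetical and only illustrate one possible shape; this is not an existing namedtuple or urllib.parse API.

```python
from urllib.parse import urlparse

# Attributes ParseResult provides beyond the six base tuple members.
EXTRA_FIELDS = ('username', 'password', 'hostname', 'port')


def asdict_with_extras(parts):
    """Hypothetical helper: base tuple fields plus the derived attributes."""
    data = dict(parts._asdict())
    for name in EXTRA_FIELDS:
        # Each attribute is None when the URL lacks that piece.
        data[name] = getattr(parts, name)
    return data


x = urlparse('http://me:mypass@example.com:800/foo')
print(asdict_with_extras(x))
```

Calling dict(x) directly raises ValueError here, because dict() tries to read each string in the tuple as a key/value pair of length 2.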

History
Date	User	Action	Args
2022-04-11 14:59:00	admin	set	github: 77661
2018-05-13 21:58:35	r.david.murray	set	status: open -> closed nosy: + r.david.murray messages: + msg316483 resolution: out of date stage: resolved
2018-05-13 20:29:44	Ivan.Pozdeev	set	nosy: + Ivan.Pozdeev messages: + msg316478
2018-05-13 08:11:58	bsaner	create