classification
Title: Improvement suggestions for urllib.parse.urlparser
Type: enhancement Stage: resolved
Components: Library (Lib) Versions:
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: Ivan.Pozdeev, bsaner, r.david.murray
Priority: normal Keywords:

Created on 2018-05-13 08:11 by bsaner, last changed 2018-05-13 21:58 by r.david.murray. This issue is now closed.

Messages (3)
msg316454 - Author: brent s. (bsaner) Date: 2018-05-13 08:11
Currently, a parsed urlparse() object looks (roughly) like this:

urlparse('http://example.com/foo;key1=value1?key2=value2#key3=value3#key4=value4')

returns:

ParseResult(scheme='http', netloc='example.com', path='/foo', params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')

However, I recommend a couple things:

0.) that ParseResult objects support dict emulation. e.g. one can run:

        dict(parseresult_obj)

    and get (using the example string above, with classification corrected for RFC3986 compliance and common usage):

        {'fragment': [('key4', 'value4')],
         'netloc': 'example.com',
         'params': [('key2', 'value2')],
         'path': '/foo',
         'query': [('key3', 'value3')],
         'scheme': 'http'}

    Obviously, fragment, params, and query could instead be serialized into a nested dict. I'm not sure which is more preferred in the pythonic sense.

1.) Better RFC3986 compliance.
    Per RFC3986 § 3 (https://tools.ietf.org/html/rfc3986#section-3), the URL can be further split into separate components. For instance, while considered deprecated, should "userinfo" (e.g. "http://user:password@...") be parsed? At the very least, the port should be parsed out to a separate component from the netloc (or userinfo parsed out separate from netloc) - this will assist in parsing host:port combinations in netlocs that contain both userinfo and a specified port (and allow the port to be given as an int type, thus more easily used in e.g. the socket lib).

2.) If a component is not present, I suggest it be a None object instead of an empty string.
    e.g.:

        urlparse('http://example.com/foo')

    Would return:

        ParseResult(scheme='http', netloc='example.com', path='/foo', params=None, query=None, fragment=None)

    instead of

        ParseResult(scheme='http', netloc='example.com', path='/foo', params='', query='', fragment='')
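
For reference, something close to the dict proposed in item 0 can already be assembled from the existing API. The following is a minimal sketch, not a proposed implementation: `parse_result_to_dict` is a hypothetical helper name (it does not exist in urllib.parse), and treating the fragment as key=value pairs is an assumption carried over from the example above. The URL is simplified to a single fragment pair.

```python
from urllib.parse import urlparse, parse_qsl


def parse_result_to_dict(url):
    """Hypothetical helper, not part of urllib.parse.

    Returns the six ParseResult fields as a plain dict, with params,
    query, and fragment split into lists of (key, value) tuples.
    """
    data = dict(urlparse(url)._asdict())
    for field in ('params', 'query', 'fragment'):
        # parse_qsl returns an empty list for an empty string.
        data[field] = parse_qsl(data[field])
    return data


d = parse_result_to_dict(
    'http://example.com/foo;key1=value1?key2=value2#key3=value3')
print(d)
```

Whether a list of tuples or a nested dict is the better shape probably depends on whether repeated keys need to be preserved; a list of tuples keeps them.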
msg316478 - Author: Ivan Pozdeev (Ivan.Pozdeev) * Date: 2018-05-13 20:29
Such drastic changes of uncertain usefulness are best discussed at python-ideas first.

What you're really asking for seems to be to parse all "levels" at the same time.
Try to think of a use case where that would help with anything practical, and bring that to the list.
I fail to see any use case because you never need query parameters and things like username/port at the same time.


All else that you suggest is either already being done (username/port parsing, read the docs) or likewise has no use cases I can think of where it would make things more convenient than they already are (dict emulation, None).
msg316483 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-05-13 21:58
These are actually reasonable requests, and in fact have been brought up before and implemented:

>>> x = urlparse('http://me:mypass@example.com:800/foo;key1=value1?key2=value2#key3=value3#key4=value4')
>>> x
ParseResult(scheme='http', netloc='me:mypass@example.com:800', path='/foo', params='key1=value1', query='key2=value2', fragment='key3=value3#key4=value4')
>>> x.hostname
'example.com'
>>> x.port
800
>>> x.username
'me'
>>> x.password
'mypass'
>>> x._asdict()
OrderedDict([('scheme', 'http'), ('netloc', 'me:mypass@example.com:800'), ('path', '/foo'), ('params', 'key1=value1'), ('query', 'key2=value2'), ('fragment', 'key3=value3#key4=value4')])


Now, what this doesn't get you is the "extra" fields that are not part of the base tuple.  The base tuple has the members it does for backward compatibility.  So, the thing to discuss on python-ideas would be an API for namedtuple that gets you the extra fields.
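
Until such an API exists, the extra fields can be merged in by hand. A minimal sketch, assuming a hypothetical helper named `asdict_with_extras` (not part of urllib.parse or namedtuple) that copies the four attributes shown in the session above alongside the base tuple fields:

```python
from urllib.parse import urlparse

# Attributes that ParseResult exposes but that are not tuple members.
EXTRA_FIELDS = ('username', 'password', 'hostname', 'port')


def asdict_with_extras(result):
    """Hypothetical helper: base tuple fields plus the extra attributes."""
    data = dict(result._asdict())
    for name in EXTRA_FIELDS:
        data[name] = getattr(result, name)
    return data


x = urlparse('http://me:mypass@example.com:800/foo;key1=value1?key2=value2')
full = asdict_with_extras(x)
print(full['hostname'], full['port'], full['username'], full['password'])
```

The base tuple is left untouched, so existing code that unpacks the six fields keeps working.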

None versus the empty string is not something that can happen, for backward compatibility reasons, even if there was agreement that it was better.
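
Callers who prefer None can get it with a thin wrapper on their own side. A minimal sketch, assuming a hypothetical function named `urlparse_none` (not part of urllib.parse) that returns a plain dict rather than a ParseResult:

```python
from urllib.parse import urlparse


def urlparse_none(url):
    """Hypothetical wrapper: urlparse() fields with '' replaced by None."""
    fields = urlparse(url)._asdict()
    return {name: (value if value != '' else None)
            for name, value in fields.items()}


parts = urlparse_none('http://example.com/foo')
print(parts)
```

Note that this cannot tell a missing query apart from an empty one such as 'http://example.com/foo?', because urlparse() reports both as the empty string.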

I'm not entirely sure why dict(x) is not supported (but I suspect it is because x is "a tuple", again for backward compatibility reasons), so you might search the archives to find out why for sure, if you are curious.
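
The "it is a tuple" explanation can be checked directly. A ParseResult has no keys() method, so dict() iterates over it and expects each element to be a key/value pair, but each element is one of the six field strings. A short sketch of the failure and of the conversion that does work:

```python
from urllib.parse import urlparse

x = urlparse('http://example.com/foo')

error = None
try:
    # dict() sees an iterable of strings, not of (key, value) pairs;
    # the first element, 'http', has length 4 instead of 2.
    dict(x)
except ValueError as exc:
    error = str(exc)

# _asdict() supplies the field names, so this conversion succeeds.
as_dict = dict(x._asdict())
print(error)
print(as_dict)
```

Making dict(x) return the field mapping would change how the object iterates, which is presumably where the backward compatibility concern comes from.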
History
Date                 User            Action  Args
2018-05-13 21:58:35  r.david.murray  set     status: open -> closed; nosy: + r.david.murray; messages: + msg316483; resolution: out of date; stage: resolved
2018-05-13 20:29:44  Ivan.Pozdeev    set     nosy: + Ivan.Pozdeev; messages: + msg316478
2018-05-13 08:11:58  bsaner          create