Message 334793 - Python tracker


Author	jaraco
Recipients	jaraco
Date	2019-02-03.15:10:59
Message-id	<1549206660.03.0.379337447594.issue35891@roundup.psfhosted.org>
In-reply-to

Content

The deprecation of splituser (issue27485) has the undesirable effect of leaving the programmer without a suitable alternative. The deprecation warning says to use `urlparse` instead, but `urlparse` doesn't provide access to the `credential` or `address` components of a URL.

Consider for example:

>>> import urllib.parse
>>> url = 'https://user:password@host:port/path'
>>> parsed = urllib.parse.urlparse(url)
>>> urllib.parse.splituser(parsed.netloc)
('user:password', 'host:port')

It's not readily obvious how one might get those two values, the credential and the address, from `parsed`. Sure, you can get `username` and `password`. You can get `hostname` and `port`. But if what you want is to remove the credential and keep the address, or extract the credential and pass it unchanged as a single string to something like an `_encode_auth` handler, that's no longer possible without some careful handling. Because of possible None values, re-assembling a username/password into a colon-separated string is more complicated than simply doing a ':'.join.
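To make the "careful handling" concrete, here is a minimal sketch of what a programmer ends up writing by hand. The helper names `split_userinfo` and `join_credential` are hypothetical and not part of urllib.parse; the example URL uses a numeric port so that it parses cleanly.

```python
from urllib.parse import urlparse


def split_userinfo(netloc):
    """Hypothetical helper: split a netloc into (userinfo, address).

    userinfo is None when the netloc contains no '@'.
    """
    userinfo, sep, address = netloc.rpartition('@')
    return (userinfo if sep else None), address


def join_credential(username, password):
    """Hypothetical helper: rebuild 'user[:password]', tolerating None."""
    if username is None:
        return None
    if password is None:
        return username
    return f'{username}:{password}'


parsed = urlparse('https://user:password@host:8080/path')
print(split_userinfo(parsed.netloc))
print(join_credential(parsed.username, parsed.password))

# With no password, parsed.password is None, so a naive
# ':'.join((username, password)) would raise TypeError.
no_password = urlparse('https://user@host/path')
print(join_credential(no_password.username, no_password.password))
```

Even this small amount of logic has to decide how to treat a missing '@', a missing password, and an empty password, which is exactly the kind of detail that is easy to get wrong when reimplemented in each project.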

This recommendation and limitation led to issues in production code and ultimately the inline adoption of the deprecated function, [summarized here](https://github.com/pypa/setuptools/pull/1670).

I believe if splituser is to be deprecated, the netloc should provide a suitable alternative; namely, a `urlparse` result should supply `address` and `userinfo`. Such functionality would make it easier to transition code that currently relies on splituser for more than parsing out the username and password.

Even better would be for the urlparse result to support `_replace` operations on these attributes, so that one wouldn't have to construct a whole netloc just to build a URL that replaces only some portion of it. One could then do something like:

>>> parsed = urllib.parse.urlparse(url)
>>> without_userinfo = parsed._replace(userinfo=None).geturl()
>>> alt_port = parsed._replace(port=443).geturl()

I realize that because of the nesting of abstractions (namedtuple for the main parts), this technique may not extend nicely, so maybe the netloc itself should provide this extensibility, for usage something like this:

>>> parsed = urllib.parse.urlparse(url)
>>> without_userinfo = parsed._replace(netloc=parsed.netloc._replace(userinfo=None)).geturl()
>>> alt_port = parsed._replace(netloc=parsed.netloc._replace(port=443)).geturl()


It's not as elegant, but likely simpler to implement, with netloc being extended with a `_replace` method to support replacing segments of itself (while remaining immutable), and it is dramatically less error-prone than the status quo without splituser.
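For comparison, the status quo requires rebuilding the netloc string by hand. The following is only a rough sketch using calls that exist today; `replace_netloc_parts` is a hypothetical name, and it assumes a URL with a numeric port or none.

```python
from urllib.parse import urlparse


def replace_netloc_parts(url, *, drop_userinfo=False, port=None):
    """Hypothetical helper: rebuild a URL with a modified netloc."""
    parsed = urlparse(url)
    userinfo, sep, address = parsed.netloc.rpartition('@')
    if drop_userinfo:
        userinfo, sep = '', ''
    if port is not None:
        # Note: parsed.hostname is lowercased and has IPv6 brackets
        # stripped, so the brackets must be restored manually.
        host = parsed.hostname or ''
        if ':' in host:
            host = f'[{host}]'
        address = f'{host}:{port}'
    return parsed._replace(netloc=f'{userinfo}{sep}{address}').geturl()


url = 'https://user:password@host:8080/path'
print(replace_netloc_parts(url, drop_userinfo=True))
print(replace_netloc_parts(url, port=443))
```

The bracket and case handling for the host is the sort of incidental detail that a `_replace`-style API on the netloc could take care of once, in the standard library.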

In any case, I don't think it's suitable to leave programmers to muddle around with their own URL parsing logic. urllib.parse should provide some help here.

History
Date	User	Action	Args
2019-02-03 15:11:03	jaraco	set	recipients: + jaraco
2019-02-03 15:11:00	jaraco	set	messageid: <1549206660.03.0.379337447594.issue35891@roundup.psfhosted.org>
2019-02-03 15:10:59	jaraco	link	issue35891 messages
2019-02-03 15:10:59	jaraco	create