Issue 35891: urllib.parse.splituser has no suitable replacement

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/80072

classification

Title:	urllib.parse.splituser has no suitable replacement
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.8

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	jaraco
Priority:	normal	Keywords:

Created on 2019-02-03 15:11 by jaraco, last changed 2022-04-11 14:59 by admin.

Messages (1)

msg334793 - (view)

Author: Jason R. Coombs (jaraco) * (Python committer)

Date: 2019-02-03 15:10

The deprecation of splituser (issue27485) has the undesirable effect of leaving the programmer without a suitable alternative. The deprecation warning says to use `urlparse` instead, but `urlparse` doesn't provide access to the `credential` or `address` components of a URL.

Consider for example:

>>> import urllib.parse
>>> url = 'https://user:password@host:port/path'
>>> parsed = urllib.parse.urlparse(url)
>>> urllib.parse.splituser(parsed.netloc)
('user:password', 'host:port')

It's not readily obvious how one might get those two values, the credential and the address, from `parsed`. Sure, you can get `username` and `password`. You can get `hostname` and `port`. But if what you want is to remove the credential and keep the address, or to extract the credential and pass it unchanged as a single string to something like an `_encode_auth` handler, that's no longer possible without some careful handling. Because either value may be None, re-assembling a username/password into a colon-separated string is more complicated than simply doing a ':'.join.
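To illustrate the None-handling point, here is a minimal sketch. `join_credential` is a hypothetical helper name used only for this example; it is not part of urllib.parse.

```python
from urllib.parse import urlparse


def join_credential(parsed):
    """Rebuild 'user[:password]' from a parse result.

    Hypothetical helper: returns None when the URL carries no credential.
    """
    if parsed.username is None:
        return None
    if parsed.password is None:
        return parsed.username
    return parsed.username + ':' + parsed.password


no_password = urlparse('https://user@host/path')

# The naive approach fails because password is None here.
try:
    ':'.join([no_password.username, no_password.password])
except TypeError as exc:
    print('naive join failed:', exc)

print(join_credential(no_password))  # user
```

Each caller that previously relied on splituser would need to carry this kind of branching around.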

This recommendation and limitation led to issues in production code and ultimately the inline adoption of the deprecated function, [summarized here](https://github.com/pypa/setuptools/pull/1670).

I believe that if splituser is to be deprecated, the netloc should provide a suitable alternative, namely that a `urlparse` result should supply `address` and `userinfo`. Such functionality would make it easier to transition code that currently relies on splituser for more than parsing out the username and password.
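As a rough sketch of what such `userinfo` and `address` values might be, assuming the netloc is split at its last '@' (which matches the splituser output shown earlier); `split_netloc` is a hypothetical name, not an existing urllib.parse function:

```python
from urllib.parse import urlparse


def split_netloc(netloc):
    """Split 'user:password@host:port' into (userinfo, address).

    Hypothetical helper: userinfo is None when the netloc has no '@'.
    """
    userinfo, delim, address = netloc.rpartition('@')
    return (userinfo if delim else None), address


parsed = urlparse('https://user:password@host:port/path')
userinfo, address = split_netloc(parsed.netloc)
print(userinfo)  # user:password
print(address)   # host:port
```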

Even better would be for the urlparse result to support `_replace` operations on these attributes, so that one wouldn't have to construct a whole netloc just to build a URL that replaces only some portion of it. One could then do something like:

>>> parsed = urllib.parse.urlparse(url)
>>> without_userinfo = parsed._replace(userinfo=None).geturl()
>>> alt_port = parsed._replace(port=443).geturl()

I realize that, because of the nesting of abstractions (a namedtuple for the main parts), this technique may not extend nicely, so maybe the netloc itself should provide this extensibility, for usage something like this:

>>> parsed = urllib.parse.urlparse(url)
>>> without_userinfo = parsed._replace(netloc=parsed.netloc._replace(userinfo=None)).geturl()
>>> alt_port = parsed._replace(netloc=parsed.netloc._replace(port=443)).geturl()


It's not as elegant, but it is likely simpler to implement: netloc would be extended with a `_replace` method to support replacing segments of itself (while remaining immutable). It would also be dramatically less error-prone than the status quo without splituser.
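For comparison, here is a sketch of what the same two operations require with the API as it exists today. The helper names `without_userinfo` and `with_port` are hypothetical, and the port handling assumes only `host`, `host:port`, and bracketed IPv6 forms:

```python
from urllib.parse import urlparse


def without_userinfo(url):
    """Return url with any 'user:password@' prefix dropped from the netloc."""
    parsed = urlparse(url)
    address = parsed.netloc.rpartition('@')[2]
    return parsed._replace(netloc=address).geturl()


def with_port(url, port):
    """Return url with its port replaced, keeping any userinfo intact."""
    parsed = urlparse(url)
    userinfo, at, address = parsed.netloc.rpartition('@')
    host, colon, old_port = address.rpartition(':')
    # No colon, or a colon that sits inside IPv6 brackets: there is no port.
    if not colon or ']' in old_port:
        host = address
    netloc = userinfo + at + host + ':' + str(port)
    return parsed._replace(netloc=netloc).geturl()


url = 'https://user:password@host:8080/path'
print(without_userinfo(url))  # https://host:8080/path
print(with_port(url, 443))    # https://user:password@host:443/path
```

The string handling in `with_port` is the kind of hand-rolled parsing that the proposed `_replace` support would make unnecessary.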

In any case, I don't think it's suitable to leave it to the programmer to have to muddle around with their own URL parsing logic. urllib.parse should provide some help here.

History
Date	User	Action	Args
2022-04-11 14:59:10	admin	set	github: 80072
2019-02-03 15:11:09	jaraco	set	versions: + Python 3.8
2019-02-03 15:11:00	jaraco	create