
Author lincolnauster
Recipients lincolnauster
Date 2022-01-10.21:50:56
Message-id <1641851456.34.0.342855463339.issue46337@roundup.psfhosted.org>
In-reply-to
Content
It looks like this was discussed in 2013-2015 here: https://bugs.python.org/issue18828

Basically, with all the URL schemes that exist in the world (and the possibility of custom schemes), the current strategy of enumerating which schemes do what in hard-coded module-level lists is a bit ... weird. Among the solutions proposed in 18828, some were:

+ Have a global registry of what schemes do what (criticized for being overkill, and I can't say I disagree)
+ Get rid of the scheme lists altogether, and assume every scheme supports everything (isn't backwards compatible; might break with intended behavior, too).
+ Switch the uses_relative whitelist to a blacklist (maybe fine in practice, maybe not; either way it doesn't really fix the underlying issue).
+ Work around it with global state (modify the uses_* lists; this is what I'm doing in my code, and I can't say I like it much).
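
For reference, the global-state workaround from the last bullet looks roughly like the sketch below. The scheme name "myscheme" and the helper relative_scheme are made up for illustration; the only stdlib names used are urllib.parse.uses_relative, urllib.parse.uses_netloc and urllib.parse.urljoin. The helper undoes the mutation afterwards, which the usual version of this workaround does not bother to do.

```python
from contextlib import contextmanager
from urllib import parse


@contextmanager
def relative_scheme(scheme):
    """Temporarily add `scheme` to the module-level uses_* lists (hypothetical helper)."""
    added = []
    for lst in (parse.uses_relative, parse.uses_netloc):
        if scheme not in lst:
            lst.append(scheme)
            added.append(lst)
    try:
        yield
    finally:
        # Undo only what this helper added, so other callers are unaffected.
        for lst in added:
            lst.remove(scheme)


base = "myscheme://host/a/b"

# Unknown scheme: urljoin gives up and returns the relative reference as-is.
before = parse.urljoin(base, "c")

# With the scheme registered, the reference is resolved against the base.
with relative_scheme("myscheme"):
    during = parse.urljoin(base, "c")

# The lists are restored once the block exits.
after = parse.urljoin(base, "c")
```

Note that the change is process-wide while it is in effect, so any other code calling urljoin in that window sees the modified lists too.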

An alternative I've implemented in my fork (https://github.com/lincolnauster/cpython/tree/urllib-custom-schemes) is to have an Enum of all the weird scheme-based behaviors that may occur (urllib.parse.SchemeClass in my fork) and to allow passing a set of those Enum members to functions that rely on scheme-specific behavior. The members of that set are added to whatever behaviors the scheme itself already implies. (See the test case for a concrete example; this explanation is not great.)
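
A minimal sketch of the idea, written against the stock stdlib rather than the fork: the member names RELATIVE, NETLOC and PARAMS and the function effective_classes are assumptions made up for illustration, and the fork's actual SchemeClass members and function signatures may differ. It only shows the "union with what the scheme already implies" step, not a modified urljoin.

```python
from enum import Enum, auto
from urllib import parse


class SchemeClass(Enum):
    # Hypothetical member names; not taken from the fork.
    RELATIVE = auto()
    NETLOC = auto()
    PARAMS = auto()


# Map each behavior to the existing hard-coded list that currently decides it.
_BUILTIN_LISTS = {
    SchemeClass.RELATIVE: parse.uses_relative,
    SchemeClass.NETLOC: parse.uses_netloc,
    SchemeClass.PARAMS: parse.uses_params,
}


def effective_classes(scheme, classes=frozenset()):
    """Behaviors implied by the scheme, plus any the caller passed in."""
    builtin = {cls for cls, schemes in _BUILTIN_LISTS.items() if scheme in schemes}
    return builtin | set(classes)


# A known scheme keeps its existing behaviors with no extra argument.
http_classes = effective_classes("http")

# A custom scheme gets nothing by default, and only what the caller asks for.
plain = effective_classes("myscheme")
opted_in = effective_classes("myscheme", {SchemeClass.RELATIVE, SchemeClass.NETLOC})
```

Because the caller's set is only ever unioned in, this shape keeps existing calls working unchanged, and it is also why there is no way to switch a behavior off.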

Some things I like about this:
+ Backwards compatibility.
+ It makes the functions using it as a general strategy a bit more pure.
+ It makes client code deal with edge cases.

Some things that could be changed:
+ There's no way to remove behaviors you *don't* want.
+ It makes client code deal with edge cases.

As a side thought: if the above could be adopted, the uses_* lists could be enforced as immutable, which, while breaking compatibility, could make client code a bit cleaner.
History
Date User Action Args
2022-01-10 21:50:56 lincolnauster set recipients: + lincolnauster
2022-01-10 21:50:56 lincolnauster set messageid: <1641851456.34.0.342855463339.issue46337@roundup.psfhosted.org>
2022-01-10 21:50:56 lincolnauster link issue46337 messages
2022-01-10 21:50:56 lincolnauster create