urllib.parse: Allow more flexibility in schemes and URL resolution behavior #90495
Comments
It looks like this was discussed in 2013-2015 here: https://bugs.python.org/issue18828
Basically, with all the URL schemes that exist in the world (and the possibility of a custom scheme), the current strategy of enumerating which schemes do what in a hard-coded variable is a bit ... weird. Among the proposed solutions in 18828, some were:
+ Have a global registry of what schemes do what (criticized for being overkill, and I can't say I disagree)
An alternative I've implemented in my fork (https://github.com/lincolnauster/cpython/tree/urllib-custom-schemes) is to have an Enum with all the weird scheme-based behaviors that may occur (urllib.parse.SchemeClass in my fork) and to allow passing a set of those Enums to functions that rely on scheme-specific behavior. The elements of that set are added to whatever behaviors the scheme itself already determines. (See the test case for a concrete example; this explanation is not great.)
Some things I like about this: Some things that could be changed:
As a side thought: if the above were adopted, the uses_* lists could be enforced as immutable, which, while breaking compatibility, could make client code a bit cleaner. |
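For readers unfamiliar with the hard-coded lists this comment refers to, here is a minimal sketch of how they affect behavior today. The scheme name `myscheme` and the example URLs are made up for illustration; `uses_relative`, `uses_netloc` and `uses_params` are the existing module-level lists in `urllib.parse`.

```python
from urllib.parse import urljoin, urlparse, uses_netloc, uses_relative

# "myscheme" is a made-up scheme, used only for illustration.

# Relative resolution only happens for schemes listed in uses_relative.
known = urljoin("http://example.com/a/b", "c")        # resolved against the base
custom = urljoin("myscheme://example.com/a/b", "c")   # returned unchanged: "c"

# Parameters (";...") are only split off for schemes listed in uses_params.
known_params = urlparse("http://example.com/path;type=a").params       # "type=a"
custom_params = urlparse("myscheme://example.com/path;type=a").params  # ""

# The usual workaround mutates the module-level lists, which changes
# behavior for every caller in the process, so it is undone afterwards here.
uses_relative.append("myscheme")
uses_netloc.append("myscheme")
try:
    patched = urljoin("myscheme://example.com/a/b", "c")
finally:
    uses_relative.remove("myscheme")
    uses_netloc.remove("myscheme")

print(known, custom, patched)
```

Note that the scheme has to be added to both `uses_relative` and `uses_netloc` for the base's network location to carry over into the joined URL.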
I remember a discussion about this years ago. |
If I'm understanding you right, that's what this (and the PR) is - an |
In my idea it would not be a list of things that you have to pass piecemeal to request specific behaviour, but another function or a new param (like We could even handle things like bpo-22852 in that mode (although ironically, correct behaviour for that requires having a registry of schemes). |
If I'm correct in my understanding of a universal parse function (a Do we think a parse_universal function would be helpful to add on top of |
Just to note that IANA maintains a list of officially accepted schemes. In addition, there is a list of unofficial schemes on Wikipedia. |
Éric Araujo wrote on PR30520:
I suspect the usefulness comes from error checking: if a scheme doesn't support parameters, then having what looks like parameters converted would not be helpful. Further, while a new function is definitely safer, how many parse options do we need? Anyone else remember

Assuming we just enhance the existing function, would it be more palatable if there was a

    from enum import Flag, auto

    class SchemeFlag(Flag):
        RELATIVE = auto()
        NETLOC = auto()
        PARAMS = auto()
        UNIVERSAL = RELATIVE | NETLOC | PARAMS
        #
        def __repr__(self):
            return f"{self.__module__}.{self._name_}"
        __str__ = __repr__

    # __members__ includes the UNIVERSAL alias; iterating the class
    # itself skips aliases on Python 3.11+.
    RELATIVE, NETLOC, PARAMS, UNIVERSAL = SchemeFlag.__members__.values()

Then the above call becomes:

    urlparse(uri_string, flags=UNIVERSAL)
|
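To make the proposal concrete, here is a rough sketch of what a flag-driven parse could look like. This is not the PR's implementation, and `urlparse` has no `flags` parameter in the standard library: `parse_with_flags` is a hypothetical helper built on the public `urlsplit`, and it only honors the `PARAMS` flag (the `RELATIVE` and `NETLOC` flags would matter for joining and unparsing, which are left out).

```python
from enum import Flag, auto
from urllib.parse import ParseResult, urlsplit


class SchemeFlag(Flag):
    RELATIVE = auto()
    NETLOC = auto()
    PARAMS = auto()
    UNIVERSAL = RELATIVE | NETLOC | PARAMS


def parse_with_flags(url, flags=SchemeFlag(0)):
    """Hypothetical helper: split params when PARAMS is requested,
    regardless of whether the scheme appears in uses_params."""
    parts = urlsplit(url)  # urlsplit never splits ";params" off the path
    path, params = parts.path, ""
    if SchemeFlag.PARAMS in flags:
        # Look for ";" only in the last path segment.
        i = path.find(";", path.rfind("/") + 1)
        if i >= 0:
            path, params = path[:i], path[i + 1:]
    return ParseResult(parts.scheme, parts.netloc, path, params,
                       parts.query, parts.fragment)


# "myscheme" is a made-up scheme, used only for illustration.
plain = parse_with_flags("myscheme://example.com/path;type=a?x=1")
universal = parse_with_flags("myscheme://example.com/path;type=a?x=1",
                             SchemeFlag.UNIVERSAL)
print(plain.path, repr(plain.params))
print(universal.path, repr(universal.params))
```

Because `UNIVERSAL` is the union of the three flags, the membership test `SchemeFlag.PARAMS in flags` holds for it without any special casing.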
I would like to know what Senthil is thinking before the PR with options à la carte is merged! |
Sounds good. |
I will review this in a day. |
Hi all, I was looking at it. Introducing an enum as the last parameter is going to add to the cost of understanding this function's behavior. I am doing further reading on the previous discussions and PR(s) now. |
@orsenthil any updates? I'd like to continue with this PR as soon as is convenient :) |
Hi @lincolnauster, I was -1 and have been thinking a lot about introducing an enum flag in the parse module.
This API signature is going to confuse people and will be a huge blocker for further adoption and change (even if the default arguments are specified). I was thinking about how best to mitigate that.
I am going to paste this comment on the PR too, and we can continue the discussion there. If a design discussion is required, we can bring it to a wider forum too. |
This is closed as "won't fix" (at least for the implementation suggested in PR #30520). A simpler approach can be considered if it is proposed in a new issue. |