urlparse on tel: URI-s misses the scheme in some cases #58280
Comments
I think that the screen dump below is fairly clear:
10:41 Ivan> python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> x = "tel:+31-641044153"
>>> urlparse.urlparse(x)
ParseResult(scheme='tel', netloc='', path='+31-641044153', params='', query='', fragment='')
>>> y = "tel:+31641044153"
>>> urlparse.urlparse(y)
ParseResult(scheme='', netloc='', path='tel:+31641044153', params='', query='', fragment='')
>>>
It seems that, when the phone number does not have any separator character, the parsing goes wrong (separators are not required per RFC 3966). |
urlparse doesn’t actually implement generic parsing rules according to the most recent RFCs; it has hard-coded registries of supported schemes. tel is not currently supported. That said, it’s strange that the parsing differs in your two examples. |
Here's a possible patch. The problem is that urlsplit (in Lib/urllib/parse.py:348) tries to convert the part after the : (in this case +31-641044153 and +31641044153) to int to see if it's a port number. The conversion fails for +31-641044153, so the scheme is recognized, but it succeeds for +31641044153, so that URI is treated as if it had no scheme. |
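To make the difference concrete, here is a minimal sketch of the heuristic described above, next to a digits-only variant. It is a simplified model, not the actual code in Lib/urllib/parse.py; the function names and the SCHEME_CHARS constant are hypothetical, and the digits-only check is an assumption about what a narrower port test could look like.

```python
import string

# Characters allowed in a scheme name (simplified; an assumption for this sketch).
SCHEME_CHARS = string.ascii_letters + string.digits + "+-."


def split_scheme_int_check(url):
    """Model of the int()-based heuristic: if the text after ':' converts
    to an int, assume it is a port number and do not recognise a scheme."""
    i = url.find(":")
    if i <= 0 or any(c not in SCHEME_CHARS for c in url[:i]):
        return "", url
    rest = url[i + 1:]
    try:
        int(rest)  # int() accepts a leading sign, so "+31641044153" converts
    except ValueError:
        return url[:i].lower(), rest
    return "", url  # looked like a port number, so no scheme is reported


def split_scheme_digit_check(url):
    """Variant that treats the remainder as a port only if it consists
    entirely of ASCII digits."""
    i = url.find(":")
    if i <= 0 or any(c not in SCHEME_CHARS for c in url[:i]):
        return "", url
    rest = url[i + 1:]
    if not rest or any(c not in string.digits for c in rest):
        return url[:i].lower(), rest
    return "", url


for uri in ("tel:+31-641044153", "tel:+31641044153", "tel:1234"):
    print(uri, split_scheme_int_check(uri), split_scheme_digit_check(uri))
```

With the digits-only variant both "+" forms yield the tel scheme, while an all-digit remainder such as "tel:1234" is still taken for a port number.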
RFC 3986 0 defines the port as |
See also bpo-14036. |
Hi Ezio, The patch is fine and the check is correct. I was wondering whether removing the int()-based verification would make us miss anything in the port number check. It looks like we won't, since the int() call was only there to find the proper scheme and url part for the applicable cases. In addition to the changes in the patch, I think it would be helpful to add 'tel' to uses_netloc in the classification at the top of the module. Thanks! |
How so? The tel scheme does not use a netloc. |
According to RFC 1808, the netloc must follow "//", so this doesn't seem to apply to 'tel' URIs. |
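A quick way to see the "//" rule in practice is to compare a URL that has an authority part with URIs that do not. This is only an illustration using urllib.parse.urlsplit from Python 3; the example URLs are made up.

```python
from urllib.parse import urlsplit

# Only the URL containing "//" after the scheme gets a non-empty netloc.
with_authority = urlsplit("http://example.com/path")
mail = urlsplit("mailto:user@example.com")
phone = urlsplit("tel:+31-641044153")

for parts in (with_authority, mail, phone):
    print(repr(parts.scheme), repr(parts.netloc), repr(parts.path))
```

The tel and mailto URIs carry everything after the colon in path, which is why listing 'tel' under uses_netloc would not match how such URIs are written.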
New changeset ff0fd7b26219 by Ezio Melotti in branch '2.7':
New changeset 9f6b7576c08c by Ezio Melotti in branch '3.2':
New changeset b78c67665a7f by Ezio Melotti in branch 'default': |
For the record, urlparse still doesn't handle bare "tel" URIs such as "tel:1234":
>>> parse.urlparse("tel:1234")
ParseResult(scheme='', netloc='', path='tel:1234', params='', query='', fragment='')
This is not terribly important, since these URIs are not RFC 3966-compliant (a tel URI must have either a global number starting with "+", e.g. "tel:+1234", or a local number with a phone-context parameter, e.g. "tel:1234;phone-context=python.org"). Yet there are actual telecom systems producing such non-compliant URIs, so they might be nice to support too. |
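For code that has to accept such bare URIs regardless of how urlparse treats them, one option is to split tel URIs by hand. The helper below is a hypothetical sketch (parse_tel_uri is not part of the standard library) and it does not validate the number or parameters against RFC 3966; it only separates the number from any ";name=value" parameters.

```python
def parse_tel_uri(uri):
    """Split a tel URI into (number, params) without relying on urlparse.

    Hypothetical helper: performs no RFC 3966 validation.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "tel":
        raise ValueError("not a tel URI: %r" % (uri,))
    # The number comes first; parameters follow, separated by ";".
    number, *raw_params = rest.split(";")
    params = {}
    for item in raw_params:
        name, _, value = item.partition("=")
        params[name.lower()] = value
    return number, params


print(parse_tel_uri("tel:1234"))
print(parse_tel_uri("tel:1234;phone-context=python.org"))
```

A caller could then decide how to treat a local number that lacks a phone-context parameter, instead of losing the scheme entirely.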