classification
Title: urlparse on tel: URI-s misses the scheme in some cases
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.3, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: eric.araujo, ezio.melotti, ivan_herman, orsenthil, pitrou, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2012-02-21 09:43 by ivan_herman, last changed 2013-01-07 16:58 by pitrou. This issue is now closed.

Files
File name Uploaded Description Edit
issue14072.diff ezio.melotti, 2012-05-06 22:27 Patch against 3.2. review
Messages (10)
msg153859 - (view) Author: Ivan Herman (ivan_herman) Date: 2012-02-21 09:45
I think that the screen dump below is fairly clear:

10:41 Ivan> python
Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urlparse
>>> x = "tel:+31-641044153"
>>> urlparse.urlparse(x)
ParseResult(scheme='tel', netloc='', path='+31-641044153', params='', query='', fragment='')
>>> y = "tel:+31641044153"
>>> urlparse.urlparse(y)
ParseResult(scheme='', netloc='', path='tel:+31641044153', params='', query='', fragment='')
>>> 

It seems that, when the phone number does not have any separator character, the parsing goes wrong (separators are not required per RFC 3966)
msg154181 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-02-25 05:32
urlparse doesn’t actually implement generic parsing rules according to the most recent RFCs; it has hard-coded registries of supported schemes.  tel is not currently supported.  That said, it’s strange that the parsing differs in your two examples.
msg160113 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-05-06 22:27
Here's a possible patch.

The problem is that urlsplit (in Lib/urllib/parse.py:348) tries to convert the part after the : (in this case +31-641044153 and +31641044153) to int to see if it's a port number.  This doesn't work with +31-641044153, but it does with +31641044153, so the second string is treated as a port number and the scheme is not split off.
In the patch I'm assuming that the port number can only contain ascii digits (no leading '+/-', no spaces, no non-ascii digits) and checking for it explicitly, rather than using int() in a try/except.
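
To make the difference concrete, here is a minimal sketch of the two checks described above. It is not the code from the patch; the function names looks_like_port_int and looks_like_port_strict are hypothetical and exist only for illustration.

```python
def looks_like_port_int(rest):
    """Approximate the old behavior: anything int() accepts counts as a port."""
    try:
        int(rest)
    except ValueError:
        return False
    return True


def looks_like_port_strict(rest):
    """Approximate the stricter check: only non-empty runs of ASCII digits count."""
    return bool(rest) and all(c in "0123456789" for c in rest)


for candidate in ("+31-641044153", "+31641044153", "8080"):
    print(candidate, looks_like_port_int(candidate), looks_like_port_strict(candidate))
```

With the int()-based check, "+31641044153" is accepted because int() tolerates a leading sign, which is why the scheme was lost for that input. The strict check rejects both phone numbers and still accepts "8080".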
msg160159 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-05-07 16:49
> In the patch I'm assuming that the port number can only contain ascii digits

RFC 3986 [0] defines the port as
   port          = *DIGIT
and part of the "authority" [1] as
   authority     = [ userinfo "@" ] host [ ":" port ]
   userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )
   host          = IP-literal / IPv4address / reg-name
   port          = *DIGIT
so my assumption should be correct.

[0]: http://tools.ietf.org/html/rfc3986#section-3.2.3
[1]: http://tools.ietf.org/html/rfc3986#appendix-A
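
As an illustration of the grammar quoted above, the port rule can be written as a regular expression. This is only a sketch: the helper name is_rfc3986_port is hypothetical, and the pattern uses [0-9] rather than \d because \d also matches non-ASCII digits in Python 3 str patterns.

```python
import re

# port = *DIGIT, where DIGIT is an ASCII digit; zero digits are allowed.
_PORT_RE = re.compile(r"[0-9]*")


def is_rfc3986_port(text):
    """Return True if text matches the RFC 3986 'port' production."""
    return _PORT_RE.fullmatch(text) is not None


print(is_rfc3986_port("8080"))
print(is_rfc3986_port("+31641044153"))
```

Note that the grammar itself permits an empty port, so an empty string matches here; whether an empty string should be treated as a port when splitting off the scheme is a separate decision.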
msg160160 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2012-05-07 17:00
See also issue 14036.
msg160735 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2012-05-15 15:10
Hi Ezio,

The patch is fine and the check is correct. I was wondering whether removing the int()-based verification would lose anything in the port number check, but it looks like we won't: the int() call was only there to find the proper scheme and URL parts for the applicable cases.

In addition to the changes in the patch, I think it would be helpful to add 'tel' to uses_netloc in the classification at the top of the module.

Thanks!
msg160737 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2012-05-15 15:39
> it would be helpful to add 'tel' to uses_netloc

How so?  The tel scheme does not use a netloc.
msg161117 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-05-19 14:00
According to RFC 1808 [0], the netloc must follow "//", so this doesn't seem to apply to 'tel' URIs.

[0]: http://tools.ietf.org/html/rfc1808.html#section-2.1
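
A short sketch of the point about "//": it uses the Python 3 module name urllib.parse (in 2.7 the same function lives in urlparse), and example.com is a placeholder host.

```python
from urllib.parse import urlsplit

# An http URL has "//" after the scheme, so the host lands in netloc.
web = urlsplit("http://example.com/path")

# A tel URI has no "//", so netloc stays empty and the number is the path.
tel = urlsplit("tel:+31-641044153")

print(web.scheme, repr(web.netloc), web.path)
print(tel.scheme, repr(tel.netloc), tel.path)
```

The tel example deliberately uses the form with a separator, which was parsed correctly even before the fix.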
msg161119 - (view) Author: Roundup Robot (python-dev) Date: 2012-05-19 14:16
New changeset ff0fd7b26219 by Ezio Melotti in branch '2.7':
#14072: Fix parsing of tel URIs in urlparse by making the check for ports stricter.
http://hg.python.org/cpython/rev/ff0fd7b26219

New changeset 9f6b7576c08c by Ezio Melotti in branch '3.2':
#14072: Fix parsing of tel URIs in urlparse by making the check for ports stricter.
http://hg.python.org/cpython/rev/9f6b7576c08c

New changeset b78c67665a7f by Ezio Melotti in branch 'default':
#14072: merge with 3.2.
http://hg.python.org/cpython/rev/b78c67665a7f
msg179271 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-01-07 16:58
For the record, urlparse still doesn't handle bare "tel" URIs such as "tel:1234":

>>> parse.urlparse("tel:1234")
ParseResult(scheme='', netloc='', path='tel:1234', params='', query='', fragment='')

This is not terribly important since these URLs are not RFC 3966-compliant (a tel URI must have either a global number starting with "+", e.g. "tel:+1234", or a local number with a phone-context parameter, e.g. "tel:1234;phone-context=python.org"). Yet there are actual telecom systems producing such non-compliant URIs, so they might be nice to support too.
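
The two compliant forms described above can be told apart with plain string handling. The following is a rough sketch only, not a full RFC 3966 validator; classify_tel_uri is a hypothetical helper, and it avoids urlparse so that its result does not depend on the Python version.

```python
def classify_tel_uri(uri):
    """Roughly classify a tel URI as global, local, or non-compliant."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "tel":
        return "not-tel"
    number, _, params = rest.partition(";")
    if number.startswith("+"):
        return "global"
    # A local number is only compliant with a phone-context parameter.
    names = [p.split("=", 1)[0].lower() for p in params.split(";") if p]
    return "local" if "phone-context" in names else "non-compliant"


for uri in ("tel:+1234", "tel:1234;phone-context=python.org", "tel:1234"):
    print(uri, classify_tel_uri(uri))
```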
History
Date User Action Args
2013-01-07 16:58:07 pitrou set nosy: + pitrou; messages: + msg179271
2012-05-19 14:18:08 ezio.melotti set status: open -> closed; resolution: fixed; stage: commit review -> resolved
2012-05-19 14:16:32 python-dev set nosy: + python-dev; messages: + msg161119
2012-05-19 14:00:46 ezio.melotti set messages: + msg161117
2012-05-15 15:39:56 eric.araujo set messages: + msg160737
2012-05-15 15:10:22 orsenthil set assignee: orsenthil -> ezio.melotti; messages: + msg160735
2012-05-07 17:00:56 r.david.murray set nosy: + r.david.murray; messages: + msg160160
2012-05-07 16:49:45 ezio.melotti set stage: patch review -> commit review; messages: + msg160159; versions: + Python 3.2, Python 3.3
2012-05-06 22:27:41 ezio.melotti set files: + issue14072.diff; nosy: + ezio.melotti; messages: + msg160113; keywords: + patch; stage: patch review
2012-02-25 05:32:41 eric.araujo set nosy: + eric.araujo; messages: + msg154181; components: + Library (Lib), - None
2012-02-21 09:49:48 orsenthil set assignee: orsenthil; nosy: + orsenthil
2012-02-21 09:45:24 ivan_herman set messages: + msg153859
2012-02-21 09:43:52 ivan_herman create