Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urlparse on tel: URI-s misses the scheme in some cases #58280

Closed
ivanherman mannequin opened this issue Feb 21, 2012 · 10 comments
Closed

urlparse on tel: URI-s misses the scheme in some cases #58280

ivanherman mannequin opened this issue Feb 21, 2012 · 10 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@ivanherman
Copy link
Mannequin

ivanherman mannequin commented Feb 21, 2012

BPO 14072
Nosy @orsenthil, @pitrou, @ezio-melotti, @merwok, @bitdancer
Files
  • issue14072.diff: Patch against 3.2.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = <Date 2012-05-19.14:18:08.389>
    created_at = <Date 2012-02-21.09:43:52.471>
    labels = ['type-bug', 'library']
    title = 'urlparse on tel: URI-s misses the scheme in some cases'
    updated_at = <Date 2013-01-07.16:58:07.596>
    user = 'https://bugs.python.org/ivanherman'

    bugs.python.org fields:

    activity = <Date 2013-01-07.16:58:07.596>
    actor = 'pitrou'
    assignee = 'ezio.melotti'
    closed = True
    closed_date = <Date 2012-05-19.14:18:08.389>
    closer = 'ezio.melotti'
    components = ['Library (Lib)']
    creation = <Date 2012-02-21.09:43:52.471>
    creator = 'ivan_herman'
    dependencies = []
    files = ['25486']
    hgrepos = []
    issue_num = 14072
    keywords = ['patch']
    message_count = 10.0
    messages = ['153859', '154181', '160113', '160159', '160160', '160735', '160737', '161117', '161119', '179271']
    nosy_count = 7.0
    nosy_names = ['orsenthil', 'pitrou', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'python-dev', 'ivan_herman']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue14072'
    versions = ['Python 2.7', 'Python 3.2', 'Python 3.3']

    @ivanherman ivanherman mannequin added the type-bug An unexpected behavior, bug, or error label Feb 21, 2012
    @ivanherman
    Copy link
    Mannequin Author

    ivanherman mannequin commented Feb 21, 2012

    I think that the screen dump below is fairly clear:

    10:41 Ivan> python
    Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) 
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import urlparse
    >>> x = "tel:+31-641044153"
    >>> urlparse.urlparse(x)
    ParseResult(scheme='tel', netloc='', path='+31-641044153', params='', query='', fragment='')
    >>> y = "tel:+31641044153"
    >>> urlparse.urlparse(y)
    ParseResult(scheme='', netloc='', path='tel:+31641044153', params='', query='', fragment='')
    >>> 

    It seems that, when the phone number does not have any separator character, the parsing goes wrong (separators are not required per RFC 3966)

    @orsenthil orsenthil self-assigned this Feb 21, 2012
    @merwok
    Copy link
    Member

    merwok commented Feb 25, 2012

    urlparse doesn’t actually implement generic parsing rules according to the most recent RFCs; it has hard-coded registries of supported schemes. tel is not currently supported. That said, it’s strange that the parsing differs in your two examples.

    @merwok merwok added the stdlib Python modules in the Lib dir label Feb 25, 2012
    @ezio-melotti
    Copy link
    Member

    Here's a possible patch.

    The problem is that urlsplit (in Lib/urllib/parse.py:348) tries to convert the part after the : (in this case +31-641044153 and +31641044153) to int to see if it's a port number. This doesn't work with +31-641044153, but it does with +31-641044153.
    In the patch I'm assuming that the port number can only contain ascii digits (no leading '+/-', no spaces, no non-ascii digits) and checking for it explicitly, rather than using int() in a try/except.

    @ezio-melotti
    Copy link
    Member

    In the patch I'm assuming that the port number can only contain ascii digits

    RFC 3986 0 defines the port as
    port = *DIGIT
    and part of the "authority" 1 as
    authority = [ userinfo "@" ] host [ ":" port ]
    userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
    host = IP-literal / IPv4address / reg-name
    port = *DIGIT
    so my assumption should be correct.

    @bitdancer
    Copy link
    Member

    See also bpo-14036.

    @orsenthil
    Copy link
    Member

    Hi Ezio,

    The patch is fine and the check is correct. I was thinking if by removing int() based verification are we missing out anything on port number check. But looks like we wont as the int() previously is done to find the proper scheme and url part for the applicable cases.

    In addition to changes in the patch, I think, it would helpful to add 'tel' to uses_netloc in the classification at the top of the module.

    Thanks!

    @orsenthil orsenthil assigned ezio-melotti and unassigned orsenthil May 15, 2012
    @merwok
    Copy link
    Member

    merwok commented May 15, 2012

    it would helpful to add 'tel' to uses_netloc

    How so? The tel scheme does not use a netloc.

    @ezio-melotti
    Copy link
    Member

    According to RFC 1808 0, the netloc must follow "//", so this doesn't seem to apply to 'tel' URIs.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 19, 2012

    New changeset ff0fd7b26219 by Ezio Melotti in branch '2.7':
    bpo-14072: Fix parsing of tel URIs in urlparse by making the check for ports stricter.
    http://hg.python.org/cpython/rev/ff0fd7b26219

    New changeset 9f6b7576c08c by Ezio Melotti in branch '3.2':
    bpo-14072: Fix parsing of tel URIs in urlparse by making the check for ports stricter.
    http://hg.python.org/cpython/rev/9f6b7576c08c

    New changeset b78c67665a7f by Ezio Melotti in branch 'default':
    bpo-14072: merge with 3.2.
    http://hg.python.org/cpython/rev/b78c67665a7f

    @pitrou
    Copy link
    Member

    pitrou commented Jan 7, 2013

    For the record, urlparse still doesn't handle bare "tel" URIs such as "tel:1234":

    >>> parse.urlparse("tel:1234")
    ParseResult(scheme='', netloc='', path='tel:1234', params='', query='', fragment='')

    This is not terribly important since these URLs are not RFC 3966-compliant (a tel URI must have either a global number starting with "+" - e.g. "tel:+1234" - or a local number with a phone-context parameter - e.g. "tel:1234;phone-context=python.org"). Yet, there actual telecom systems producing such non-compliant URIs, so they might be nice to support too.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants