urlparse.urlsplit() regression for paths consisting of digits #55676

Closed
calvin mannequin opened this issue Mar 11, 2011 · 12 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

calvin mannequin commented Mar 11, 2011

BPO 11467
Nosy @orsenthil, @bitdancer
Files
  • urlparse.patch

    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/orsenthil'
    closed_at = <Date 2011-04-15.10:23:51.648>
    created_at = <Date 2011-03-11.13:47:41.555>
    labels = ['type-bug', 'library']
    title = 'urlparse.urlsplit() regression for paths consisting of digits'
    updated_at = <Date 2011-04-15.10:23:51.647>
    user = 'https://bugs.python.org/calvin'

    bugs.python.org fields:

    activity = <Date 2011-04-15.10:23:51.647>
    actor = 'orsenthil'
    assignee = 'orsenthil'
    closed = True
    closed_date = <Date 2011-04-15.10:23:51.648>
    closer = 'orsenthil'
    components = ['Library (Lib)']
    creation = <Date 2011-03-11.13:47:41.555>
    creator = 'calvin'
    dependencies = []
    files = ['21111']
    hgrepos = []
    issue_num = 11467
    keywords = ['patch']
    message_count = 12.0
    messages = ['130570', '130575', '130594', '130599', '130663', '130795', '130796', '130797', '130799', '133801', '133802', '133803']
    nosy_count = 5.0
    nosy_names = ['calvin', 'orsenthil', 'r.david.murray', 'santoso.wijaya', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue11467'
    versions = ['Python 3.1', 'Python 2.7', 'Python 3.2', 'Python 3.3']

    calvin mannequin commented Mar 11, 2011

    When given a javascript: URL whose path consists only of digits, the urlsplit() function behaves differently in Python 2.7 than in Python 2.6:

    $ python2.6 -c "import urlparse; print urlparse.urlsplit('javascript:123')"
    SplitResult(scheme='javascript', netloc='', path='123', query='', fragment='')
    
    $ python2.7 -c "import urlparse; print urlparse.urlsplit('javascript:123')"
    SplitResult(scheme='', netloc='', path='javascript:123', query='', fragment='')

    Python 3.2 has the same regression:
    $ python3.2 -c "import urllib.parse; print(urllib.parse.urlsplit('javascript:123'))"
    SplitResult(scheme='', netloc='', path='javascript:123', query='', fragment='')

    I consider the Python 2.6 behaviour to be correct, i.e. the current behaviour is buggy.

    @calvin calvin mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Mar 11, 2011

    calvin mannequin commented Mar 11, 2011

    The behaviour change is caused by the fix for issue bpo-754016.

    @orsenthil (Member)

    What kind of URL is 'javascript:123', and on what basis do you (or we) say that the Python 2.6 behavior was correct?

    @orsenthil orsenthil self-assigned this Mar 11, 2011

    calvin mannequin commented Mar 11, 2011

    Regarding the correctness of the Python 2.6 implementation: RFC 1738 (http://www.faqs.org/rfcs/rfc1738.html) specifies URLs of the form <scheme>:<scheme-specific-part>, where the scheme-specific part is allowed to consist only of digits.

    I agree that the example URL is not a good one and it is artificially constructed.

    Some better examples demonstrating the same issue might be:
    clsid:85bbd920-42a0-1069-a2e4-08002b30309d
    or
    mailto:1337@example.org

    calvin mannequin commented Mar 12, 2011

    To make the previous comment more precise: URLs whose
    scheme-specific part begins with a digit are affected.
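    The observation above can be illustrated with two simplified scheme-splitting rules. This is a hypothetical sketch, not the CPython source: the function names are invented, and it assumes the colon check exists to recognize host:port strings that have no scheme (the patch discussion below mentions a port variable). The first rule gives up on the scheme whenever a digit follows the colon; the second does so only when the whole remainder is digits.

```python
import string

# Scheme characters per RFC 1738, section 2.1: letters, digits, "+", "." and "-".
SCHEME_CHARS = frozenset(string.ascii_letters + string.digits + "+.-")


def _scheme_candidate(url):
    """Return (scheme, rest) if the text before the first ':' is made up
    of scheme characters, otherwise None."""
    i = url.find(":")
    if i <= 0 or any(c not in SCHEME_CHARS for c in url[:i]):
        return None
    return url[:i], url[i + 1:]


def split_scheme_digit_prefix(url):
    """Hypothetical model of the regressed rule: if the scheme-specific part
    starts with a digit, assume "host:port" and split off no scheme."""
    candidate = _scheme_candidate(url)
    if candidate is None or candidate[1][:1].isdigit():
        return "", url
    scheme, rest = candidate
    return scheme.lower(), rest


def split_scheme_port_only(url):
    """Hypothetical model of a narrower rule: assume "host:port" only when
    the whole scheme-specific part is digits (a plausible port number)."""
    candidate = _scheme_candidate(url)
    if candidate is None or candidate[1].isdigit():
        return "", url
    scheme, rest = candidate
    return scheme.lower(), rest


for url in ("mailto:1337@example.org", "localhost:8080", "javascript:123"):
    print(url, split_scheme_digit_prefix(url), split_scheme_port_only(url))
```

    Both models keep treating 'localhost:8080' as a host and port, but only the narrower one recovers the mailto scheme from 'mailto:1337@example.org'. Under either model 'javascript:123' itself still gets no scheme, because its scheme-specific part is entirely digits; which rule CPython actually adopted should be checked against the committed changesets.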

    santosowijaya mannequin commented Mar 14, 2011

    I'm attaching a patch with a fix and a unit test using the mailto example. I put the test in a new test_RFC2368 method (RFC 2368 defines the mailto URL scheme). There seems to be no unit test for parsing the mailto scheme to begin with.
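    A test along these lines might look like the following. It is a hypothetical sketch written against Python 3's urllib.parse, not the test contained in urlparse.patch; only the test_RFC2368 method name comes from the comment above, and the MailtoSplitTest class name is made up.

```python
import unittest
from urllib.parse import urlsplit


class MailtoSplitTest(unittest.TestCase):
    """Hypothetical test case modelled on the test_RFC2368 idea from the patch."""

    def test_RFC2368(self):
        # A mailto URL whose scheme-specific part begins with a digit
        # must still be split into the "mailto" scheme and its path.
        parts = urlsplit("mailto:1337@example.org")
        self.assertEqual(parts.scheme, "mailto")
        self.assertEqual(parts.netloc, "")
        self.assertEqual(parts.path, "1337@example.org")
        self.assertEqual(parts.query, "")
        self.assertEqual(parts.fragment, "")
```

    Saved in a file matching the test*.py pattern, it can be run with `python -m unittest` from that file's directory.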

    santosowijaya mannequin commented Mar 14, 2011

    Oops, wrong revision base.

    @orsenthil (Member)

    Santoso, quick review comments:

    1. The patch looks good.
    2. I would use a temporary 'throw-away' variable name instead of _, but don't bother changing it; I will take care of that before committing.
    3. Important: did you find any regressions against earlier builds? Also, could you run the full test suite to make sure all tests pass?

    santosowijaya mannequin commented Mar 14, 2011

    Senthil,

    Thanks for the review! I was initially thinking of port = ... but opted for _, arbitrarily, instead.

    regrtest on Darwin-10.6.0-i386-64bit ran fine.

    python-dev mannequin commented Apr 15, 2011

    New changeset 7a693e283c68 by Senthil Kumaran in branch '2.7':
    Issue bpo-11467: Fix urlparse behavior when handling urls which contains scheme
    http://hg.python.org/cpython/rev/7a693e283c68

    python-dev mannequin commented Apr 15, 2011

    New changeset 495d12196487 by Senthil Kumaran in branch '3.1':
    Issue bpo-11467: Fix urlparse behavior when handling urls which contains scheme specific part only digits.
    http://hg.python.org/cpython/rev/495d12196487

    @orsenthil (Member)

    Fixed this in all codelines. Thanks Santoso.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022