Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urlparse of urllib returns wrong hostname #80519

Open
sanebow mannequin opened this issue Mar 18, 2019 · 14 comments
Open

urlparse of urllib returns wrong hostname #80519

sanebow mannequin opened this issue Mar 18, 2019 · 14 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-security A security issue

Comments

@sanebow
Copy link
Mannequin

sanebow mannequin commented Mar 18, 2019

BPO 36338
Nosy @ronaldoussoren, @orsenthil, @tiran, @vadmium, @matrixise, @tirkarthi, @jpic, @iritkatriel
PRs
  • bpo-36338: Reject hostname with [ at position > 0 #14896
  • bpo-36338: urllib.urlparse rejects invalid IPv6 addresses #16780
  • Files
  • test_bug_36338.py: Unittest for this issue.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-03-18.08:06:11.906>
    labels = ['type-security', '3.8', '3.9', '3.10', '3.11', '3.7', 'library']
    title = 'urlparse of urllib returns wrong hostname'
    updated_at = <Date 2021-12-03.00:01:27.917>
    user = 'https://bugs.python.org/sanebow'

    bugs.python.org fields:

    activity = <Date 2021-12-03.00:01:27.917>
    actor = 'vstinner'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2019-03-18.08:06:11.906>
    creator = 'sanebow'
    dependencies = []
    files = ['48215']
    hgrepos = []
    issue_num = 36338
    keywords = ['patch']
    message_count = 14.0
    messages = ['338171', '338172', '338173', '338599', '338960', '338972', '349153', '351511', '351589', '351590', '354634', '354746', '355322', '407558']
    nosy_count = 9.0
    nosy_names = ['ronaldoussoren', 'orsenthil', 'christian.heimes', 'martin.panter', 'matrixise', 'xtreak', 'sanebow', 'jpic', 'iritkatriel']
    pr_nums = ['14896', '16780']
    priority = 'high'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue36338'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10', 'Python 3.11']

    @sanebow
    Copy link
    Mannequin Author

    sanebow mannequin commented Mar 18, 2019

    The urlparse function in Python urllib returns the wrong hostname when parsing URL crafted by the malicious user. This may be caused by incorrect handling of IPv6 addresses. The bug could lead to open redirect in web applications which rely on urlparse to extract and validate the domain of redirection URL.

    The test case is as follows:

    >> from urllib.parse import urlparse
    >> urlparse(urlparse('http://benign.com\\[attacker.com]').hostname
    >> 'attacker.com'

    The correct behavior should be raising an invalid URL exception.

    @sanebow sanebow mannequin added 3.7 (EOL) end of life stdlib Python modules in the Lib dir type-security A security issue labels Mar 18, 2019
    @matrixise
    Copy link
    Member

    I can confirm with 3.7.2 on fedora 29

    @matrixise
    Copy link
    Member

    Here is a unittest where you can test this issue and the result on Python 3.8.0a2 and 3.7.2

    >>> 3.8.0a2
    ./python /tmp/test_bug_36338.py
    /tmp/test_bug_36338.py:8: SyntaxWarning: invalid escape sequence \[
      url = 'http://demo.com\[attacker.com]'
    3.8.0a2+ (heads/master:23581c018f, Mar 18 2019, 09:17:05) 
    [GCC 8.3.1 20190223 (Red Hat 8.3.1-2)]
    test_bad_url (__main__.TestUrlparse) ... FAIL

    ======================================================================
    FAIL: test_bad_url (main.TestUrlparse)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/tmp/test_bug_36338.py", line 13, in test_bad_url
        self.assertEqual(hostname, expected_hostname)
    AssertionError: 'attacker.com' != 'demo.com'
    - attacker.com
    + demo.com

    Ran 1 test in 0.001s

    FAILED (failures=1)

    >>> 3.7.2
    python /tmp/test_bug_36338.py
    3.7.2 (default, Jan 16 2019, 19:49:22) 
    [GCC 8.2.1 20181215 (Red Hat 8.2.1-6)]
    test_bad_url (__main__.TestUrlparse) ... FAIL

    ======================================================================
    FAIL: test_bad_url (main.TestUrlparse)
    ----------------------------------------------------------------------

    Traceback (most recent call last):
      File "/tmp/test_bug_36338.py", line 13, in test_bad_url
        self.assertEqual(hostname, expected_hostname)
    AssertionError: 'attacker.com' != 'demo.com'
    - attacker.com
    + demo.com

    Ran 1 test in 0.000s

    FAILED (failures=1)

    @matrixise matrixise added the 3.8 only security fixes label Mar 18, 2019
    @tirkarthi
    Copy link
    Member

    See also bpo-20271 that discusses the other format http://[::1]spam where ::1 is returned as hostname. urlparse tries to parse the hostname as IPV6 address when there is [ and parses till ] at [0] thus "benign.com\[attacker.com]" is treated as a URL where attacker.com is assumed as the IPV6 hostname. I am not sure of the correct behavior. FWIW at least Java and golang return "benign.com[attacker.com]" and Ruby raises an exception that this is a bad URL.

    Java

    (.getHost (java.net.URL. "http://benign.com\\\\[attacker.com]"))
    "benign.com\\[attacker.com]"

    golang: https://play.golang.org/p/q8pTo9ySLby

    [0]

    _, have_open_br, bracketed = hostinfo.partition('[')

    @ronaldoussoren
    Copy link
    Contributor

    Given a quick scan of RFC 3986[1] I'd say that the behaviour of Ruby seems to be the most correct. That said, I'd also check what the major browsers do in this case (FWIW both FF and Safari use 'benign.com' as the hostname in this case).

    [1] https://tools.ietf.org/html/rfc3986#page-17

    @tirkarthi
    Copy link
    Member

    I found this page to be uesful : https://url.spec.whatwg.org/#host-parsing and following the steps it seems that this should raise an error since at the 7th step it denotes that asciiDomain shouldn't contain forbidden host code point including "[]" . As another data point using 'new URL("http://benign.com[attacker.com]")' in browser's Javascript console also raises exception that this is a bad URL. Even if attacker.com is assumed to be the correct host by Python it's not validated to be an IPV6 address where it should fail.

    Ruby seems to use a regex : https://github.com/ruby/ruby/blob/trunk/lib/uri/rfc3986_parser.rb#L6
    Java parseurl : http://hg.openjdk.java.net/jdk/jdk/file/c4c225b49c5f/src/java.base/share/classes/java/net/URLStreamHandler.java#l124
    golang : https://github.com/golang/go/blob/50bd1c4d4eb4fac8ddeb5f063c099daccfb71b26/src/net/url/url.go#L587

    @sanebow
    Copy link
    Mannequin Author

    sanebow mannequin commented Aug 7, 2019

    Python2 urlparse.urlparse and urllib2.urlparse.urlparse have a similar IPv6 hostname parsing bug.

    >> urlparse.urlparse('http://nevil.com[]').hostname
    >> 'evil.com['

    This is less practical to exploit since the parsed domain contains a '[' in the end.

    Do I need to create a separate issue for this Python2 bug?

    I think the way PR 14896 fix the python3 bug can also be applied to this.

    Also, do we need a CVE ID for the python3 bug? As it may lead to some security issues in some Python apps, e.g., open-redirect. I have found such a case in a private bug bounty program.

    @tiran
    Copy link
    Member

    tiran commented Sep 9, 2019

    The guidelines https://url.spec.whatwg.org/#host-parsing make a lot of sense to me. Python should refuse hostnames with "[" unless

    • the hostname starts with "["
    • the hostname ends with "]"
    • the string between [] is a valid IPv6 address (full or shortened, without or with correctly quoted scope id)

    Python should refuse any hostname with forbidden chars "\x00\n\r #%/:?@", too.

    @tiran tiran added the 3.9 only security fixes label Sep 9, 2019
    @vstinner
    Copy link
    Member

    To be clear, the \ in 'http://benign.com\\[attacker.com]' is not needed to reproduce the bug:

    vstinner@apu$ python3 
    Python 3.7.4 (default, Jul  9 2019, 16:32:37) 
    >>> from urllib.parse import urlparse; print(urlparse('http://demo.com[attacker.com]').hostname)
    attacker.com

    @vstinner
    Copy link
    Member

    Python 3.5 and newer are impacted, but Python 2.7 behaves differently:

    vstinner@apu$ python2
    Python 2.7.16 (default, Apr 30 2019, 15:54:43) 
    >>> from urlparse import urlparse
    >>> urlparse('http://demo.com[attacker.com]').hostname
    'emo.com[attacker.com'

    @vstinner
    Copy link
    Member

    I proposed PR 16780 which makes the urllib.parse module way more stricter:

    • the IPv6 address is validated by ipaddress.IPv6Address() parser
    • invalid characters are rejected in the IPv6 zone: "%", "[" and "]"
    • the port number is now validated when parsing the URL: must be an integer in the [0; 65535] range

    Sadly, validating using ipaddress.IPv6Address() cannot be easily ported to Python 2 which doesn't have this module.

    @vstinner
    Copy link
    Member

    I modified my PR 16780 to also fix bpo-33342: "urllib IPv6 parsing fails with special characters in passwords".

    @vstinner
    Copy link
    Member

    OMG parsing an URL is a can of worms... There are so many open issues related to URL parsing!

    Related:

    There are 124 open issues with "urllib" in their title and 12 open issues with "urlparse" in their title.

    @iritkatriel
    Copy link
    Member

    It produces a deprecation warning on 3.11, but still does the same.

    >>> urlparse('http://benign.com\[attacker.com]').hostname
    <stdin>:1: DeprecationWarning: invalid escape sequence '\['
    'attacker.com'

    @iritkatriel iritkatriel added 3.10 only security fixes 3.11 only security fixes labels Dec 2, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes stdlib Python modules in the Lib dir type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants