Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CVE-2019-9636: urlsplit does not handle NFKC normalization #80397

Closed
zooba opened this issue Mar 6, 2019 · 22 comments
Closed

CVE-2019-9636: urlsplit does not handle NFKC normalization #80397

zooba opened this issue Mar 6, 2019 · 22 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes topic-unicode type-security A security issue

Comments

@zooba
Copy link
Member

zooba commented Mar 6, 2019

BPO 36216
Nosy @vstinner, @benjaminp, @jkloth, @ned-deily, @mcepl, @ezio-melotti, @vadmium, @koobs, @zooba, @tirkarthi
PRs
  • bpo-36216: Add check for characters in netloc that normalize to separators #12201
  • [3.7] bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) #12213
  • [3.6] bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) #12215
  • [2.7] bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) #12216
  • [3.5] bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) #12223
  • [3.4] bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) #12224
  • [2.7] bpo-36216: Only print test messages when verbose (GH-12291) #12291
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/zooba'
    closed_at = <Date 2019-03-12.21:25:47.134>
    created_at = <Date 2019-03-06.17:37:20.433>
    labels = ['type-security', '3.7', '3.8', 'expert-unicode']
    title = 'CVE-2019-9636: urlsplit does not handle NFKC normalization'
    updated_at = <Date 2021-05-11.14:28:00.197>
    user = 'https://github.com/zooba'

    bugs.python.org fields:

    activity = <Date 2021-05-11.14:28:00.197>
    actor = 'larry'
    assignee = 'steve.dower'
    closed = True
    closed_date = <Date 2019-03-12.21:25:47.134>
    closer = 'steve.dower'
    components = ['Unicode']
    creation = <Date 2019-03-06.17:37:20.433>
    creator = 'steve.dower'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36216
    keywords = ['patch', 'security_issue']
    message_count = 22.0
    messages = ['337336', '337383', '337398', '337411', '337412', '337532', '337566', '337645', '337646', '337704', '337711', '337720', '337725', '337753', '337771', '337773', '337806', '337811', '339391', '339428', '339433', '393450']
    nosy_count = 11.0
    nosy_names = ['vstinner', 'benjamin.peterson', 'jkloth', 'ned.deily', 'mcepl', 'ezio.melotti', 'jeremy.kloth', 'martin.panter', 'koobs', 'steve.dower', 'xtreak']
    pr_nums = ['12201', '12213', '12215', '12216', '12223', '12224', '12291']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue36216'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @zooba
    Copy link
    Member Author

    zooba commented Mar 6, 2019

    URLs encoded with Punycode/IDNA use NFKC normalization to decompose characters 1. This can result in some characters introducing new segments into a URL.

    For example, \uFF03 is not equal to '#' under direct comparison, but normalizes to '#' which changes the fragment part of the URL. Similarly \u2100 normalizes to 'a/c' which introduces a path segment.

    Currently, urlsplit() does not normalize, which may result in it returning a different netloc from what a browser would

    >>> u = "https://example.com\uFF03@bing.com"
    >>> urlsplit(u).netloc.rpartition("@")[2]
    bing.com
    
    >>> # Simulate
    >>> u = "https://example.com\uFF03@bing.com".encode("idna").decode("ascii")
    >>> urlsplit(u).netloc.rpartition("@")[2]
    example.com

    (Note that .netloc includes user/pass and .rpartition("@") is often used to remove it.)

    This may be used to steal cookies or authentication data from applications that use the netloc to cache or retrieve this information.

    The preferred fix for the urllib module is to detect and raise ValueError if NFKC-normalization of the netloc introduce any of '/?#@:'. Applications that want to avoid this error should perform their own decomposition using unicodedata or transcode to ASCII via IDNA.

    >>> # New behavior
    >>> u = "https://example.com\uFF03@bing.com"
    >>> urlsplit(u)
    ...
    ValueError: netloc 'example.com#@bing.com' contains invalid characters under NFKC normalization
    
    >>> # Workaround 1
    >>> u2 = unicodedata.normalize("NFKC", u)
    >>> urlsplit(u2)
    SplitResult(scheme='https', netloc='example.com', path='', query='', fragment='@bing.com')
    
    >>> # Workaround 2
    >>> u3 = u.encode("idna").decode("ascii")
    >>> urlsplit(u3)
    SplitResult(scheme='https', netloc='example.com', path='', query='', fragment='@bing.com')

    Note that we do not address other characters, such as those that convert into period. The error is only raised for changes that affect how urlsplit() locates the netloc and the very common next step of removing credentials from the netloc.

    This vulnerability was reported by Jonathan Birch of Microsoft Corporation and Panayiotis Panayiotou (p.panayiotou2@gmail.com) via the Python Security Response Team. A CVE number has been requested.

    @zooba zooba added 3.7 (EOL) end of life 3.8 only security fixes topic-unicode labels Mar 6, 2019
    @zooba zooba self-assigned this Mar 6, 2019
    @zooba zooba added the type-security A security issue label Mar 6, 2019
    @vstinner
    Copy link
    Member

    vstinner commented Mar 7, 2019

    FYI I added this vulnerability to:
    https://python-security.readthedocs.io/vuln/urlsplit-nfkc-normalization.html

    @zooba
    Copy link
    Member Author

    zooba commented Mar 7, 2019

    New changeset 16e6f7d by Steve Dower in branch 'master':
    bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201)
    16e6f7d

    @zooba
    Copy link
    Member Author

    zooba commented Mar 7, 2019

    New changeset daad2c4 by Steve Dower in branch '3.7':
    bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201)
    daad2c4

    @zooba
    Copy link
    Member Author

    zooba commented Mar 7, 2019

    New changeset e37ef41 by Steve Dower in branch '2.7':
    bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201)
    e37ef41

    @zooba
    Copy link
    Member Author

    zooba commented Mar 8, 2019

    @jkloth
    Copy link
    Contributor

    jkloth commented Mar 9, 2019

    A missed print statement in the 2.7 patch is causing the tests to fail.

    Line 647 of Lib/test/test_urlparse.py

    @larryhastings
    Copy link
    Contributor

    New changeset 62d3654 by larryhastings (Steve Dower) in branch '3.4':
    bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) (bpo-12224)
    62d3654

    @larryhastings
    Copy link
    Contributor

    New changeset c0d9511 by larryhastings (Steve Dower) in branch '3.5':
    bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) (bpo-12223)
    c0d9511

    @zooba
    Copy link
    Member Author

    zooba commented Mar 11, 2019

    A missed print statement in the 2.7 patch is causing the tests to fail.

    How does that cause tests to fail? Is it going to stderr? Or just causing an error.

    @jeremykloth
    Copy link
    Mannequin

    jeremykloth mannequin commented Mar 12, 2019

    How does that cause tests to fail? Is it going to stderr? Or just causing
    an error.

    It is causing an "unexpected output error". When the test is re-run at the
    end, it is run in verbose mode so the extra output is ignored and thus
    passes at that point.

    @ned-deily
    Copy link
    Member

    New changeset 23fc041 by Ned Deily (Steve Dower) in branch '3.6':
    [3.6] bpo-36216: Add check for characters in netloc that normalize to separators (GH-12201) (GH-12215)
    23fc041

    @tirkarthi
    Copy link
    Member

    Sample buildbot log of print statement in testcase causing rerun of test : https://buildbot.python.org/all/#/builders/101/builds/364/steps/4/logs/stdio

    @zooba
    Copy link
    Member Author

    zooba commented Mar 12, 2019

    Thanks. Just posted a PR that puts the print behind a verbose flag (on Python 3 it uses subtest so that we see which character caused the failure).

    If that doesn't work for whatever reason, I'll just leave it out and hope that we never have to debug it on Python 2 :)

    @vstinner
    Copy link
    Member

    Commit e37ef41 introduced a regression on AMD64 Windows8.1 Refleaks 2.7:

    test_urlparse leaked [114, 114, 114] references, sum=342

    https://buildbot.python.org/all/#/builders/33/builds/532

    It is fixed by PR 12291.

    You can test:

    python -m test -R 3:3 test_urlparse

    @zooba
    Copy link
    Member Author

    zooba commented Mar 12, 2019

    If it's fixed by not printing to the console, then it must be a refleak in print. Or perhaps in the test failure/repeat.

    That PR is going to be merged as soon as AppVeyor finishes, so nothing to worry about here.

    @zooba
    Copy link
    Member Author

    zooba commented Mar 12, 2019

    New changeset 507bd8c by Steve Dower in branch '2.7':
    [3.7] bpo-36216: Only print test messages when verbose (GH-12291)
    507bd8c

    @zooba
    Copy link
    Member Author

    zooba commented Mar 12, 2019

    Hah, I typo'd the commit message and marked it as 3.7 instead of 2.7. Oh well.

    @zooba zooba closed this as completed Mar 12, 2019
    @vstinner vstinner changed the title urlsplit does not handle NFKC normalization CVE-2019-9636: urlsplit does not handle NFKC normalization Mar 14, 2019
    @mcepl
    Copy link
    Mannequin

    mcepl mannequin commented Apr 3, 2019

    I am trying to investigate the impact of this bug on Python 2.6 (yes, it is for SLE), and I have hard to replicate the steps in the description even on 2.7:

    ~$ ipython2
    Python 2.7.15 (default, May 21 2018, 17:53:03) [GCC]
    Type "copyright", "credits" or "license" for more information.

    IPython 5.8.0 -- An enhanced Interactive Python.
    ? -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help -> Python's own help system.
    object? -> Details about 'object', use 'object??' for extra details.

    In [1]: from urlparse import urlsplit

    In [2]: u = "https://example.com\\uFF03@bing.com".encode("idna").decode("ascii")

    In [3]: u
    Out[3]: u'https://example.com\\\\uFF03@bing.com'

    In [4]: urlsplit(u).netloc.rpartition('@')[2]
    Out[4]: u'bing.com'

    In [5]: u = "https://example.com\\uFF03@bing.com"

    In [6]: urlsplit(u).netloc.rpartition('@')[2]
    Out[6]: 'bing.com'

    In [7]: u = u.encode("idna").decode("ascii")

    In [8]: urlsplit(u).netloc.rpartition('@')[2]
    Out[8]: u'bing.com'

    In [9]: import unicodedata

    In [10]: u2 = unicodedata.normalize('NFKC', u)

    In [11]: u2
    Out[11]: u'https://example.com\\\\uFF03@bing.com'

    In [12]: urlsplit(u2)
    Out[12]: SplitResult(scheme=u'https', netloc=u'example.com\\uFF03@bing.com', path=u'', query='', fragment='')

    In [13]:

    Yes, the results are weird, and most likely they would break any software relying on them, but I am not sure that it is a security issue.

    vstinner ? steve.dower ? What do you think?

    @zooba
    Copy link
    Member Author

    zooba commented Apr 4, 2019

    You need a "u" prefix on some of your strings or they're probably being immediately decomposed. The result of urlsplit should be unicode on Python 2 for a Unicode input, and yours are not.

    @mcepl
    Copy link
    Mannequin

    mcepl mannequin commented Apr 4, 2019

    You are right. Thank you.

    @dcockcn dcockcn mannequin removed 3.7 (EOL) end of life 3.8 only security fixes labels May 11, 2021
    @dcockcn dcockcn mannequin removed the topic-unicode label May 11, 2021
    @koobs
    Copy link

    koobs commented May 11, 2021

    Fix meta (not incl 2.7 which is no longer available to select).

    @koobs koobs added topic-unicode 3.7 (EOL) end of life 3.8 only security fixes labels May 11, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes topic-unicode type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants