Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reject newline character (U+000A) in URLs in urllib.parse #74898

Closed
vstinner opened this issue Jun 20, 2017 · 10 comments
Closed

Reject newline character (U+000A) in URLs in urllib.parse #74898

vstinner opened this issue Jun 20, 2017 · 10 comments
Labels
3.7 (EOL) end of life type-security A security issue

Comments

@vstinner
Copy link
Member

BPO 30713
Nosy @gpshead, @vstinner, @vadmium, @postmasters, @serhiy-storchaka
PRs
  • bpo-30713: Reject newline in urllib.parse #2301
  • [Security] bpo-30713: Reject newline in urllib.parse #2303
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-07-26.02:43:37.898>
    created_at = <Date 2017-06-20.15:20:33.115>
    labels = ['type-security', '3.7']
    title = 'Reject newline character (U+000A) in URLs in urllib.parse'
    updated_at = <Date 2021-04-23.19:36:58.331>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2021-04-23.19:36:58.331>
    actor = 'gregory.p.smith'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-07-26.02:43:37.898>
    closer = 'vstinner'
    components = []
    creation = <Date 2017-06-20.15:20:33.115>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30713
    keywords = []
    message_count = 10.0
    messages = ['296453', '296454', '296457', '296459', '296467', '296472', '297154', '297512', '298856', '299186']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'vstinner', 'martin.panter', 'Nam.Nguyen', 'serhiy.storchaka', 'ecbftw']
    pr_nums = ['2301', '2303']
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue30713'
    versions = ['Python 3.7']

    @vstinner
    Copy link
    Member Author

    Spin-off of the bpo-30500: modify the urllib.parse module to reject the newline character (U+000A) in URLS. Modify 3 functions:

    • splittype()
    • splithost()
    • splitport()

    @vstinner vstinner added 3.7 (EOL) end of life type-security A security issue labels Jun 20, 2017
    @vstinner
    Copy link
    Member Author

    I chose to not change how the \r newline character (U+000D) is handled: it is still accepted, even if it is used in SMTP and HTTP as newline separator.

    @serhiy-storchaka
    Copy link
    Member

    See also bpo-29606. I think that fixing the modules implementing Internet protocols is more appropriate way than fixing just a parsing utility.

    @vstinner
    Copy link
    Member Author

    See also bpo-29606. I think that fixing the modules implementing Internet protocols is more appropriate way than fixing just a parsing utility.

    IMHO we can/should fix ftplib (ftplib, httplib, etc.) *and* urllib.

    @vstinner
    Copy link
    Member Author

    I tried to be more strict, and I was bitten by tests: test_urllib fails on splittype("data:...") where (...) contains newlines characters. One example:

    ======================================================================

    ERROR: test_read_text_base64 (test.test_urllib.urlopen_DataTests)

    ----------------------------------------------------------------------

    Traceback (most recent call last):

    File "/home/travis/build/python/cpython/Lib/test/test_urllib.py", line 511, in setUp

        self.image_url_resp = urllib.request.urlopen(self.image_url)

    File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 222, in urlopen

        return opener.open(url, data, timeout)

    File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 510, in open

        req = Request(fullurl, data)

    File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 328, in __init__

        self.full_url = url

    File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 354, in full_url

        self._parse()

    File "/home/travis/build/python/cpython/Lib/urllib/request.py", line 383, in _parse

    raise ValueError("unknown url type: %r" % self.full_url)
    

    ValueError: unknown url type: 'data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAIAAAABCAIAAAB7\nQOjdAAAAAXNSR0IArs4c6QAAAA9JREFUCNdj%0AYGBg%2BP//PwAGAQL%2BCm8 vHgAAAABJRU5ErkJggg%3D%3D%0A%20'

    I modified splittype() to reject newlines before the type, but accept them after the type.

    @postmasters
    Copy link
    Mannequin

    postmasters mannequin commented Jun 20, 2017

    Just being nosy here that we should not continue down the path with regex. A proper grammar parser according to spec would be much more appropriate and eliminate these whack-a-mole issues.

    @serhiy-storchaka
    Copy link
    Member

    First, I think urllib.parse is not the best place for doing such checks. Even if add some checks in urllib.parse, they should be added also at lower level in urllib.request or concrete protocol implementations.

    Second, PR 2303 actually doesn't reject arguments with '\n'. splithost('example.org\n') will return a tuple ('example.org\n', None), etc.

    @vadmium
    Copy link
    Member

    vadmium commented Jul 2, 2017

    It might help if you explained why you want to make these changes. Otherwise I have to guess. Is a compromise between strictly rejecting all non-URL characters (not just control characters), versus leaving it up to user applications to validate their URLs?

    I guess it could partially prevent some newline injection problems like bpo-29606 (FTP) and bpo-30458 (HTTP). But how do we know it closes more security holes than it opens?

    I don’t understand the focus on these three functions. They are undocumented and more-or-less deprecated (bpo-27485). Why not focus on the “urlsplit” and “urlparse” functions first?

    Some of the changes seem to go too far, e.g. in the splithost("//hostname/u\nrl") test case, the hostname is fine, but it is not recognized. This would partially conflict the patch in bpo-13359, with proposes to percent-encode newlines after passing through “splithost”. And it would make the URL look like a relative URL, which is a potential security hole and reminds me of the open redirect bug report (bpo-23505).

    @ecbftw
    Copy link
    Mannequin

    ecbftw mannequin commented Jul 22, 2017

    I'm the guy that did the original security research on this issue. I've been a pentester for over 12 years, where I am regularly helping developers understand how to best correct their injection flaws. Please carefully consider what I'm trying to tell you. I've been trying to get this message through to Python for nearly 2 years now. =(

    Serhiy has the right idea. His email here shows a firm understanding of the issue and the correct remediation approach:
    https://mail.python.org/pipermail/python-dev/2017-July/148715.html

    It seems every developer initially assumes input validation is the way you deal with injection flaws. But this creates a false sense of a "trade off" between security and usability. Rejecting evil-looking things up front just creates more problems for you. Also, you can't be sure every code path leading from user input to the injection point is addressed. (What other ways can new lines reach the FTP library?)

    Instead, the problem needs to be dealt with in the lines of code just before the injection occurs. Ideally, any special character should be encoded/escaped, in which case you get the best of both worlds: security and full usability. However, when you're dealing with a syntax that was poorly designed and doesn't have a standard way to escape special characters, then you are forced to reject them. Still, it is better to do this rejection right before the commands are dropped onto the TCP stream. That way, if you later decide that there's an acceptable way to encode new lines for *specific* commands, then you can perform that encoding right there and relax the restrictions in specific cases.

    I like Serhiy's idea of optionally supporting escaping via the syntax defined in RFC 2640. This should be disabled by default, and a short-term security patch shouldn't try to tackle it. But in a new Python release, a switch could be added that causes new lines to be escaped for specific commands, such as the CWD and GET commands. It is likely that this escaping is only going to be useful in specific commands, based on what FTP servers actually support. Of course none of this is possible if you use a heavy-handed approach of blocking %0a in URLs up front.

    @vstinner
    Copy link
    Member Author

    https://bugs.python.org/issue29606 was fixed in ftplib. urllib is not the right place to reject invalid inputs.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants