Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[security][CVE-2019-9740][CVE-2019-9947] HTTP Header Injection (follow-up of CVE-2016-5699) #74643

Closed
orangetw mannequin opened this issue May 24, 2017 · 55 comments
Closed
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes deferred-blocker stdlib Python modules in the Lib dir type-security A security issue

Comments

@orangetw
Copy link
Mannequin

orangetw mannequin commented May 24, 2017

BPO 30458
Nosy @gpshead, @jaraco, @vstinner, @larryhastings, @benjaminp, @ned-deily, @ambv, @vadmium, @serhiy-storchaka, @zhangyangyu, @stratakis, @orangetw, @miss-islington, @tirkarthi, @ware, @ret2libc
PRs
  • bpo-30458: Disallow control chars in http URLs. #12755
  • bpo-30458: Disable https related urllib tests on a build without ssl #13032
  • bpo-30458: Use InvalidURL instead of ValueError. #13044
  • [3.7] bpo-30458: Disallow control chars in http URLs. (GH-12755) #13154
  • [3.6] bpo-30458: Disallow control chars in http URLs. (GH-12755) #13155
  • [3.5] bpo-30458: Disallow control chars in http URLs. (GH-12755) #13207
  • [2.7] bpo-30458: Disallow control chars in http URLs. (GH-12755) (GH-13154) #13315
  • bpo-35906: Fix CRLF injection in urllib #12524
  • bpo-35906: Avoid headers injections in urllib #11768
  • bpo-30458: Disallow control chars in http URLs. (GH-12755) #13771
  • bpo-38216, bpo-36274: Allow subclasses to override validation and encoding behavior #16321
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-12-09.03:10:03.575>
    created_at = <Date 2017-05-24.15:01:31.731>
    labels = ['type-security', 'deferred-blocker', '3.8', '3.9', '3.7', 'library']
    title = '[security][CVE-2019-9740][CVE-2019-9947] HTTP Header Injection (follow-up of CVE-2016-5699)'
    updated_at = <Date 2019-12-09.03:10:03.572>
    user = 'https://github.com/orangetw'

    bugs.python.org fields:

    activity = <Date 2019-12-09.03:10:03.572>
    actor = 'gregory.p.smith'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-12-09.03:10:03.575>
    closer = 'gregory.p.smith'
    components = ['Library (Lib)']
    creation = <Date 2017-05-24.15:01:31.731>
    creator = 'orange'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 30458
    keywords = ['patch', 'security_issue']
    message_count = 55.0
    messages = ['294360', '295026', '295067', '306981', '337970', '339754', '339840', '339846', '339848', '339850', '339851', '339852', '339853', '339857', '339858', '339861', '339884', '339894', '340405', '340407', '340408', '341174', '341175', '341176', '341178', '341192', '341234', '341286', '341290', '341291', '341724', '341750', '341906', '341932', '342470', '343045', '343104', '344826', '347282', '347285', '347290', '347897', '350003', '350028', '352451', '352596', '352727', '352731', '352751', '352760', '355246', '355261', '355298', '357988', '358050']
    nosy_count = 16.0
    nosy_names = ['gregory.p.smith', 'jaraco', 'vstinner', 'larry', 'benjamin.peterson', 'ned.deily', 'lukasz.langa', 'martin.panter', 'serhiy.storchaka', 'xiang.zhang', 'cstratak', 'orange', 'miss-islington', 'xtreak', 'ware', 'rschiron']
    pr_nums = ['12755', '13032', '13044', '13154', '13155', '13207', '13315', '12524', '11768', '13771', '16321']
    priority = 'deferred blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue30458'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9']

    @orangetw
    Copy link
    Mannequin Author

    orangetw mannequin commented May 24, 2017

    Hi, the patch in CVE-2016-5699 can be broke by an addition space.
    http://www.cvedetails.com/cve/CVE-2016-5699/
    https://hg.python.org/cpython/rev/bf3e1c9b80e9
    https://hg.python.org/cpython/rev/1c45047c5102

    import urllib, urllib2
    
    urllib.urlopen('http://127.0.0.1\r\n\x20hihi\r\n :11211')
    urllib2.urlopen('http://127.0.0.1\r\n\x20hihi\r\n :11211')

    @orangetw orangetw mannequin added the stdlib Python modules in the Lib dir label May 24, 2017
    @zhangyangyu
    Copy link
    Member

    Looking at the code and the previous issue bpo-22928, CRLF immediately followed by a tab or space (obs-fold: CRLF 1*( SP / HTAB )) is a valid part of a header value so the regex deliberately ignore them.

    So it looks right to me the url given doesn't raise the same exception as the url without spaces, though the given url seems malformed.

    @vadmium
    Copy link
    Member

    vadmium commented Jun 3, 2017

    You can also inject proper HTTP header fields (or do multiple requests) if you omit the space after the CRLF:

    urlopen("http://localhost:8000/ HTTP/1.1\r\nHEADER: INJECTED\r\nIgnore:")
    Data sent to the server:
    >>> server = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)
    >>> server.bind(("localhost", 8000))
    >>> server.listen()
    >>> [conn, addr] = server.accept()
    >>> pprint(conn.recv(300).splitlines(keepends=True))
    [b'GET / HTTP/1.1\r\n',
     b'HEADER: INJECTED\r\n',
     b'Ignore: HTTP/1.1\r\n',
     b'Accept-Encoding: identity\r\n',
     b'User-Agent: Python-urllib/3.5\r\n',
     b'Connection: close\r\n',
     b'Host: localhost:8000\r\n',
     b'\r\n']

    bpo-14826 is already open about how “urlopen” handles spaces, and there is a patch in bpo-13359 that proposes to also encode newline characters. But if the CRLF or header injection is a security problem, then 2.7 etc could be changed to raise an exception (like bpo-22928), or to do percent encoding.

    @vadmium vadmium added the type-security A security issue label Nov 26, 2017
    @vadmium
    Copy link
    Member

    vadmium commented Nov 26, 2017

    Actually, the CRLF + space can be injected via percent encoding, so just dealing with literal CRLFs and spaces wouldn’t be enough. You would have to validate the hostname after it is decoded.

    urlopen("http://127.0.0.1%0D%0A%20SLAVEOF . . . :6379/")
    >>> pprint(conn.recv(300).splitlines(keepends=True))
    [b'GET / HTTP/1.1\r\n',
     b'Accept-Encoding: identity\r\n',
     b'Host: 127.0.0.1\r\n',
     b' SLAVEOF . . . :6379\r\n',
     b'Connection: close\r\n',
     b'User-Agent: Python-urllib/2.7\r\n',
     b'\r\n']

    @tirkarthi
    Copy link
    Member

    See also https://bugs.python.org/issue36276 for a similar report. I think it's better to raise an error instead of encoding CRLF characters in URL similar to headers.

    I feel either of the issue and more preferably bpo-36276 closed as a duplicate of this one. Copy of msg337968 with reference to details about similar report in golang :

    For reference an exact report on golang repo : golang/go#30794 . This seemed to have been fixed in latest golang release 1.12 and commit golang/go@829c5df . The commit introduces a check for CTL characters and throws an error for URLs something similar to Python does for headers now at bf3e1c9b80e9.

    func isCTL(r rune) bool {
    return r < ' ' || 0x7f <= r && r <= 0x9f
    }

    if strings.IndexFunc(ruri, isCTL) != -1 {
    	return errors.New("net/http: can't write control character in Request.URL")
    }

    So below program used to work before go 1.12 setting a key on Redis but now it throws error :

    package main

    import "fmt"
    import "net/http"

    func main() {
    resp, err := http.Get("http://127.0.0.1:6379?\\r\\nSET test failure12\r\n:8080/test/?test=a")
    fmt.Println(resp)
    fmt.Println(err)
    }

    ➜ go version
    go version go1.12 darwin/amd64
    ➜ go run urllib_vulnerability.go
    <nil>
    parse http://127.0.0.1:6379?
    SET test failure12
    :8080/test/?test=a: net/url: invalid control character in URL

    Looking more into the commit there seemed to be a solution towards escaping characters with golang/go#22907 . The fix seemed to have broke Google's internal tests [0] and hence reverted to have the above commit where only CTL characters were checked and raises an error. I think this is a tricky bug upon reading code reviews in the golang repo that has around 2-3 reports with a fix committed to be reverted later for a more conservative fix and the issue was reopened to target go 1.13 .

    Thanks a lot for the report @ragdoll.guo

    [0] https://go-review.googlesource.com/c/go/+/159157/2#message-39c6be13a192bf760f6318ac641b432a6ab8fdc8

    @vstinner
    Copy link
    Member

    vstinner commented Apr 9, 2019

    The CVE-2019-9740 has been assigned to the bpo-36276:

    ... which has been marked as a duplicate of this issue.

    @vstinner vstinner changed the title CRLF Injection in httplib [CVE-2019-9740][security] CRLF Injection in httplib Apr 9, 2019
    @gpshead
    Copy link
    Member

    gpshead commented Apr 10, 2019

    Martin claimed "Actually, the CRLF + space can be injected via percent encoding"

    I am unable to reproduce that behavior using urllib.request.urlopen() or urllib.request.URLopener.open() in my master/3.8 tree.

    @gpshead gpshead self-assigned this Apr 10, 2019
    @vstinner
    Copy link
    Member

    Oh, I didn't recall that this issue (this class of security vulnerabilities) has a so old history. I found *A LOT* of similar open issues. Here are my notes. Maybe most open issues should be closed as duplicate of this one to clarify the status of urllib in Python? :-)

    Emails:

    Open issues:

    • 2011, bpo-13359: "urllib2 doesn't escape spaces in http requests"
      Not marked as a security issue.
    • 2012, bpo-14826: "urlopen URL with unescaped space"
      Fix using quote(self.__original, safe="%/:=&?~#+!$,;'@()*[]|")
      ... and the changed has then be reverted because it broke buildbots.
      Still open.
    • 2013, bpo-17322: "urllib.request add_header() currently allows trailing spaces (and other weird stuff)"
      Not marked as a security issue.
    • 2014, bpo-22928: "HTTP header injection in urrlib2/urllib/httplib/http.client (CVE-2016-5699)"
      Marked as fixed, but user Orange explained in the first comment of in
      bpo-30458 that the fix is incomplete.
    • 2017, bpo-30458: "[CVE-2019-9740][security] CRLF Injection in httplib" (this issue)
    • 2017, bpo-32085: "[Security] A New Era of SSRF - Exploiting URL Parser in Trending Programming Languages!"
    • 2019, bpo-35906: "[CVE-2019-9947] Header Injection in urllib" (another CVE!)

    Closed issues:

    • 2004, bpo-918368: "urllib doesn't correct server returned urls" (urllib)
      FIXED BY: commit 7c2867f
      Fix: fullurl = quote(fullurl, safe="%/:=&?~#+!$,;'@()*[]")
    • 2005, bpo-1353433: "Http redirection error in urllib2.py" (urllib2)
      FIXED BY: commit ddb84d7
      Fix: newurl = newurl.replace(' ', '%20')
    • 2005, bpo-1153027: "http_error_302() crashes with 'HTTP/1.1 400 Bad Request"
      FIXED BY: commit 690ce9b (Lib/urllib/request.py)
      Fix: fullurl = quote(fullurl, safe="%/:=&?~#+!$,;'@()*[]")
    • bpo-29606: "urllib FTP protocol stream injection"
      Duplicate of bpo-30119.
    • bpo-30119: "(ftplib) A remote attacker could possibly attack by containing the newline characters"
      FIXED BY: commmit 8c2d4cf
      Fix: reject "\r" and "\n" in FTP.putline() (Lib/ftplib.py)
    • bpo-36276: "[CVE-2019-9740] Python urllib CRLF injection vulnerability"
      Closed as duplicate of bpo-30458

    Rejected pull requests:

    @vstinner vstinner added 3.7 (EOL) end of life 3.8 only security fixes labels Apr 10, 2019
    @vstinner
    Copy link
    Member

    • 2019, bpo-35906: "[CVE-2019-9947] Header Injection in urllib" (another CVE!)

    Gregory P. Smith just marked bpo-35906 as a duplicate of this issue. Copy of his msg339842:

    """
    my fix proposed in bpo-30458 fixes this issue.

    i do not think this one deserved its own CVE; at least https://nvd.nist.gov/vuln/detail/CVE-2019-9947's current text also points to the other one.
    """

    Until the status of CVE-2019-9947 is clarified, I added CVE-2019-9947 in the title of this issue to help to better track all CVEs :-)

    Did someone contact the CVE organization to do something with CVE-2019-9947?

    @vstinner vstinner changed the title [CVE-2019-9740][security] CRLF Injection in httplib [CVE-2019-9740][CVE-2019-9947][security] CRLF Injection in httplib Apr 10, 2019
    @vstinner
    Copy link
    Member

    The CVE-2019-9740 has been assigned to the bpo-36276

    I don't know how CVE are assigned. Since this issue started with "the patch in CVE-2016-5699 can be broke by an addition space", would it make sense to reuse CVE-2016-5699 rather than using a new CVE?

    @vstinner
    Copy link
    Member

    Closed issues:

    I forgot:

    • 2017, bpo-30713: "Reject newline character (U+000A) in URLs in urllib.parse"
      Rejected: the ftplib vulnerabilty has been fixed by bpo-30119
      with commmit 8c2d4cf.

    @vstinner vstinner changed the title [CVE-2019-9740][CVE-2019-9947][security] CRLF Injection in httplib [security][CVE-2019-9740][CVE-2019-9947] HTTP Header Injection (follow-up of CVE-2016-5699) Apr 10, 2019
    @tirkarthi
    Copy link
    Member

    As @gregory.p.smith noted in GitHub [0] this fixes only protocol level bugs. There are some parsing ambiguities in urllib that are potential security issues still to be fixed.

    bpo-20271 - urllib.urlparse('http://benign.com\\[attacker.com]') returns attacker.com as hostname . A slightly related issue https://bugs.python.org/issue20271
    bpo-35748 - urllib.urlparse(r'http://spam\\eggs!cheese&aardvark@evil.com') returns evil.com as hostname
    bpo-23505 - Urlparse insufficient validation leads to open redirect
    bpo-33661 - urllib may leak sensitive HTTP headers to a third-party web site (Redirecting from https to http might also pass some headers in plain text. This behavior was changed in requests, golang, Curl that had their own respective CVEs)

    As a fun side note this vulnerability was used by one of our own tests as a feature from 2012 to test another security issue (bpo-14001) [1] :)

    [0] #12755 (comment)
    [1] #12755 (comment)

    @vadmium
    Copy link
    Member

    vadmium commented Apr 10, 2019

    Gregory, I haven’t tried recent Python code, but I expect the problem with percent decoding is still there. If you did try my example, what results did you see? Be aware that these techniques only work if the OS co-operates and connects to localhost when you give it the longer host string. At the moment I have glibc 2.26 on x86-64 Linux.

    In the Python 3 master branch, the percent-encoding should be decoded in “urllib.request.Request._parse”:

    def _parse(self):
        ...
        self.host, self.selector = _splithost(rest)
        if self.host:
            self.host = unquote(self.host)

    Then in “AbstractHTTPHandler.do_request_” the decoded host string becomes the “Host” header field value, without any encoding:

    def do_request_(self, request):
        host = request.host
        ...
        sel_host = host
        ...
        if not request.has_header('Host'):
            request.add_unredirected_header('Host', sel_host)

    Perhaps one solution to both my version and Orange’s original version is to encode the “Host” header field value properly. This might also apply to the “http.client” code.

    @vstinner
    Copy link
    Member

    bpo-36276 has been marked as a duplicate of this issue.

    According to the following message, urllib3 is also vulnerable to HTTP Header Injection:
    https://bugs.python.org/issue36276#msg337837

    Copy of Alvin Chang's msg337837:

    """
    I am also seeing the same issue with urllib3

    import urllib3
    
    pool_manager = urllib3.PoolManager()
    
    host = "localhost:7777?a=1 HTTP/1.1\r\nX-injected: header\r\nTEST: 123"
    url = "http://" + host + ":8080/test/?test=a"

    try:
    info = pool_manager.request('GET', url).info()
    print(info)
    except Exception:
    pass

    nc -l localhost 7777
    GET /?a=1 HTTP/1.1
    X-injected: header
    TEST: 123:8080/test/?test=a HTTP/1.1
    Host: localhost:7777
    Accept-Encoding: identity
    """

    @vstinner
    Copy link
    Member

    According to the following message, urllib3 is also vulnerable to HTTP Header Injection: (...)

    And the issue has been reported to urllib3:
    urllib3/urllib3#1553

    Copy of the first message:

    """
    At https://bugs.python.org/issue36276 there's an issue in Python's urllib that an attacker controlling the request parameter can inject headers by injecting CR/LF chars.

    A commenter mentions that the same bug is present in urllib3:
    https://bugs.python.org/issue36276#msg337837

    So reporting it here to make sure it gets attention.
    """

    @vstinner
    Copy link
    Member

    Since this issue has a long history and previously attempts to fix it failed, it seems like the Internet is a black or white world, more like a scale of gray... *Maybe* we need to provide a way to allow to pass junk characters in an URL? (disable URL validation)

    Idea: add an optional parameter to urllib, httplib, maybe also ftplib, to allow arbitrary "invalid" URLs / FTP commands. It would be a parameter *per request*, not a global option.

    I don't propose to have a global configuration option like an environment variable, urllib attribute or something else. A global option would be hard to control and would impact just too much code.

    My PEP-433 has been rejected because of the sys.setdefaultcloexec(cloexec: bool) function which allowed to change globally the behavior of Python. The PEP-446 has been accepted with no *global* option to opt-in for the old behavior, but only "local" *per file descriptor*: os.set_inheritable(fd, inheritable).

    @gpshead
    Copy link
    Member

    gpshead commented Apr 10, 2019

    *Maybe* we need to provide a way to allow to pass junk characters in an URL? (disable URL validation)

    We should not do this in our http protocol stack code. Anyone who _wants_ that is already intentionally violating the http protocol which defeats the entire purpose of our library and the parameter named "url".

    Will this break something in the world other than our own test_xmlrpc test? Probably. Do they have a right to complain about it? Not one we need listen to. Such code is doing something that was clearly an abuse of the API. The parameter was named url not raw_data_to_stuff_subversively_into_the_binary_protocol. Its intent was clear.

    @vstinner
    Copy link
    Member

    Will this break something in the world other than our own test_xmlrpc test? Probably. Do they have a right to complain about it? Not one we need listen to.

    I understand. But. Can we consider that for old Python versions like Python 2.7 and 3.5?

    This change will be applied to all supported Python versions.

    I recall that when Python 2.7 started to validate TLS certificate, the change broke some applications. Are these applications badly written? Yes! But well, "it worked well before". Sometimes, when you work in a private network, the security matters less, whereas it might be very expensive to fix a legacy application. At Red Hat, we developed a solution to let customers to opt-out from this fix (to no validate TLS certificates), because it is just too expensive for customers to fix their legacy code but they would like to be able to upgrade RHEL.

    One option to not validate URLs is to downgrade Python, but I'm not sure that it's the best compromise :-/

    @vstinner
    Copy link
    Member

    It seems like a change has been pushed into urllib3 to fix this issue, but that there is an issue with international URLs and that maybe RFC 3986 should be updated.

    RFC 3986: "Uniform Resource Identifier (URI): Generic Syntax" (January 2005)
    https://www.ietf.org/rfc/rfc3986.txt

    "Without bpo-1531 or IRI support in rfc3986 releasing master in it's current state will break backwards compatibility with international URLs."

    urllib3/urllib3#1553 (comment)

    => where 1531 means urllib3/urllib3#1531

    "wave Hi! I've noticed that CVE-2019-11236 has been assigned to the CRLF injection issue described here. It seems that the library has been patched in GitHub, but no new release has been made to pypi. Will a new release containing the fix be made to pypi soon? Based on @theacodes comment it seems like a release was going to be made, but I also see her status has her perhaps unavailable. Is someone else perhaps able to cut a new release into pypi?"

    urllib3/urllib3#1553 (comment)

    @vstinner
    Copy link
    Member

    "wave Hi! I've noticed that CVE-2019-11236 has been assigned to the CRLF injection issue described here. It seems that the library has been patched in GitHub, but no new release has been made to pypi. (...)"

    This urllib3 change:
    urllib3/urllib3@0aa3e24

    urllib3 now vendors a copy of the rfc3986 library:

    https://pypi.org/project/rfc3986/

    @vstinner
    Copy link
    Member

    urllib3 now vendors a copy of the rfc3986 library:
    https://pypi.org/project/rfc3986/

    There are multiple Python projects to validate URI:

    @gpshead
    Copy link
    Member

    gpshead commented May 1, 2019

    New changeset c4e671e by Gregory P. Smith in branch 'master':
    bpo-30458: Disallow control chars in http URLs. (GH-12755)
    c4e671e

    @gpshead
    Copy link
    Member

    gpshead commented May 21, 2019

    Assigning to Larry to decide if he wants to merge that PR into 3.5 or not.

    @vstinner
    Copy link
    Member

    vstinner commented Jun 6, 2019

    Note for myself: Python 2 urllib.urlopen(url) always quotes the URL and so is not vulnerable to HTTP Header Injection (at least, not to this issue ;-)).

    @vstinner
    Copy link
    Member

    vstinner commented Jul 4, 2019

    The commit b7378d7 is incomplete: it doesn't seem to check for control characters in the "host" part of the URL, only in the "path" part of the URL. Example:
    ---
    try:
    from urllib import request as urllib_request
    except ImportError:
    import urllib2 as urllib_request
    import socket
    def bug(*args):
    raise Exception(args)
    # urlopen() must not call create_connection()
    socket.create_connection = bug
    urllib_request.urlopen('http://127.0.0.1\\r\\n\\x20hihi\\r\\n :11211')
    ---

    The URL comes from the first message of this issue:
    https://bugs.python.org/issue30458#msg294360

    Development branches 2.7 and master produce a similar output:
    ---

    Traceback (most recent call last):
     ...
    Exception: (('127.0.0.1\r\n hihi\r\n ', 11211), ..., None)

    So urllib2/urllib.request actually does a real network connection (DNS query), whereas it should reject control characters in the "host" part of the URL.


    A second problem comes into the game. Some C libraries like glibc strip the end of the hostname (strip at the first newline character) and so HTTP Header injection is still possible is this case:
    https://bugzilla.redhat.com/show_bug.cgi?id=1673465


    According to the RFC 3986, the "host" grammar doesn't allow any control character, it looks like:

       host          = IP-literal / IPv4address / reg-name

    ALPHA (letters)
    DIGIT (decimal digits)
    unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    pct-encoded = "%" HEXDIG HEXDIG
    sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
    / "*" / "+" / "," / ";" / "="
    reg-name = *( unreserved / pct-encoded / sub-delims )

    IP-literal = "[" ( IPv6address / IPvFuture ) "]"
    IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
    IPv6address = 6( h16 ":" ) ls32
    / "::" 5( h16 ":" ) ls32
    / [ h16 ] "::" 4( h16 ":" ) ls32
    / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
    / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
    / [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32
    / [ *4( h16 ":" ) h16 ] "::" ls32
    / [ *5( h16 ":" ) h16 ] "::" h16
    / [ *6( h16 ":" ) h16 ] "::"
    h16 = 1*4HEXDIG
    ls32 = ( h16 ":" h16 ) / IPv4address
    IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet

    @vstinner vstinner added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes labels Jul 4, 2019
    @tirkarthi
    Copy link
    Member

    Okay, the url variable against which the regex check is made is not the full url but the path. The HTTPConnection class sets self.host [0] in the constructor which is used to send the Host header. Perhaps the regex check could be done for the host too given the path check is already done in the previous commit. With that the reported host also throws a http.client.InvalidURL exception.

    A second problem comes into the game. Some C libraries like glibc strip the end of the hostname (strip at the first newline character) and so HTTP Header injection is still possible is this case: https://bugzilla.redhat.com/show_bug.cgi?id=1673465

    The bug link raises permission error. Does fixing the host part fix this issue too since there won't be any socket connection made? Is it possible to have a Python reproducer of this issue?

    [0]

    (self.host, self.port) = self._get_hostport(host, port)

    @ret2libc
    Copy link
    Mannequin

    ret2libc mannequin commented Jul 4, 2019

    > A second problem comes into the game. Some C libraries like glibc strip the end of the hostname (strip at the first newline character) and so HTTP Header injection is still possible is this case: https://bugzilla.redhat.com/show_bug.cgi?id=1673465

    The bug link raises permission error. Does fixing the host part fix this issue too since there won't be any socket connection made? Is it possible to have a Python reproducer of this issue?

    I think this was supposed to refer to CVE-2016-10739 (https://bugzilla.redhat.com/show_bug.cgi?id=1347549)

    @larryhastings
    Copy link
    Contributor

    New changeset afe3a49 by larryhastings (Miro Hrončok) in branch '3.5':
    bpo-30458: Disallow control chars in http URLs. (GH-12755) (bpo-13207)
    afe3a49

    @ret2libc
    Copy link
    Mannequin

    ret2libc mannequin commented Aug 20, 2019

    Will the flaw outlined in https://bugs.python.org/issue30458#msg347282 be fixed in python itself? If so, I think a CVE for python should be requested to MITRE (I can request one, in that case).

    Moreover, does it make sense to create a new bug to track the new issue? This bug already references 3 CVEs and it would probably just create more confusion to reference a 4th. What do you think?

    @gpshead
    Copy link
    Member

    gpshead commented Aug 20, 2019

    I'm not a fan of CVE numbers in general, people have been creating too many of those. But that also means I just don't care if someone does. Having a CVE entry is not a way to claim something is important.

    This issue is still open and can be used to track dealing with the host.

    @jaraco
    Copy link
    Member

    jaraco commented Sep 14, 2019

    This change caused a regression or two captured in bpo-36274. Essentially, by blocking invalid requests, it's now not possible for a system intentionally to generate invalid requests for testing purposes. As these point releases of Python start making it into the wild, the impact of this change will likely increase.

    I think this patch was applied at too low a level. That is, instead of protecting the user inputs, the change protects the programmer's inputs.

    I mention this here so those interested can follow the mitigation work happening in bpo-36274.

    @ned-deily
    Copy link
    Member

    If I understand Jason's message correctly, the changes for bpo-30458 introduced a regression in 3.7.4 and will introduce the same regression in other branches as they are released, including 3.5.8 whose rc1 is now in testing. 3.7.5rc1 is scheduled to be tagged later today. Is this regression serious enough that we should hold 3.7.5 and/or 3.5.8 for a fix? If so, there should probably be a separate issue for it unless it is necessarily intertwined with the resolution of bpo-36274.

    I'm provisionally setting the status of this issue to "release blocker".

    @larryhastings
    Copy link
    Contributor

    Should we open a separate issue to track fixing the regression?

    @jaraco
    Copy link
    Member

    jaraco commented Sep 18, 2019

    Should we open a separate issue to track (fixing) the regression?

    Yes, I think so. The ticket I referenced mainly addresses an incompatibility that was introduced with Python 3.0, so is much less urgent than the one introduced more recently, so I believe it deserves a proper, independent description and discussion. I'll gladly file that ticket, tonight most likely.

    @jaraco
    Copy link
    Member

    jaraco commented Sep 18, 2019

    I've created bpo-38216 to address the (perceived) regression.

    @ned-deily
    Copy link
    Member

    With the breaking out of the portential and/or actual regression (e.g. invalid requests can no longer be crafted) into bpo-38216, itself a potential release blocker, we are still left here with the as-yet unresolved issue identified above in msg34728 (e.g. not checking for control characters in the "host" part of the URL, only the "path" part). Since this also affects so many branches/releases and has external components (CVE's, third-party impacts), it probably would have made sense to break it out into a separate issue (and maybe it still does). But since this problem has been present for many releases (apparently), I would rather not further hold the 3.7.5 release for a resolution (though that would be a good thing) so I'm going to change the priority for the moment to "deferred blocker".

    But we need someone (preferably a core dev already involved) to take charge of this and push it to a resolution. Thanks for everyone's help so far!

    @ret2libc
    Copy link
    Mannequin

    ret2libc mannequin commented Oct 23, 2019

    CVE-2019-18348 has been assigned to the issue explained in https://bugs.python.org/issue30458#msg347282 . Maybe a separate bug for it would be better though. CVE-2019-18348 is about injecting CRLF in HTTP requests through the *host* part of a URL.

    @gpshead
    Copy link
    Member

    gpshead commented Oct 23, 2019

    Can you please open a separate issue for CVE-2019-18348? It is easier to track that way.

    (META: In general I think the CVE process is being abused and that these really did not deserve that treatment. https://lwn.net/Articles/801157/ is good reading and food for thought.)

    @ret2libc
    Copy link
    Mannequin

    ret2libc mannequin commented Oct 24, 2019

    I have created https://bugs.python.org/issue38576 to address CVE-2019-18348.

    @gregory.p.smith if you have particular complains about these CVEs feel free to let me know (even privately). I think the security impact of these flaws is: an application that relies on urlopen/HTTPConnection/etc. where either the query part, the path part or the host part are user-controlled, could be exploited to send unintended HTTP headers to other hosts (maybe services that would not be directly reachable by the user).

    FYI, there were some good replies to that CVE talk, one of which is https://grsecurity.net/reports_of_cves_death_greatly_exaggerated .

    @ned-deily
    Copy link
    Member

    What is the status of this issue? Now that bpo-38576 has been opened to cover the host address part, can this issue be closed or downgraded? Should bpo-38576 be a release blocker?

    @gpshead
    Copy link
    Member

    gpshead commented Dec 9, 2019

    i believe new work will be done via the new issue. marking this closed. if there is something not covered by bpo-38576 that remains, please open a new issue for it. new discussion on this long issue is easy to get lost in.

    @gpshead gpshead closed this as completed Dec 9, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes deferred-blocker stdlib Python modules in the Lib dir type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    9 participants