Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP header injection in urrlib2/urllib/httplib/http.client (CVE-2016-5699) #67117

Closed
Guido mannequin opened this issue Nov 24, 2014 · 27 comments
Closed

HTTP header injection in urrlib2/urllib/httplib/http.client (CVE-2016-5699) #67117

Guido mannequin opened this issue Nov 24, 2014 · 27 comments
Labels
stdlib Python modules in the Lib dir type-security A security issue

Comments

@Guido
Copy link
Mannequin

Guido mannequin commented Nov 24, 2014

BPO 22928
Nosy @birkenfeld, @orsenthil, @vstinner, @larryhastings, @benjaminp, @ned-deily, @bitdancer, @berkerpeksag, @vadmium, @serhiy-storchaka, @koobs, @demianbrecht
PRs
  • [3.3] bpo-22928: Disabled HTTP header injections in http.client. #2817
  • [3.3][security] bpo-22928: Disabled HTTP header injections in http.client #2861
  • bpo-11671: add header validation from http.client to wsgiref.headers.Headers #15299
  • Files
  • disable_http_header_injection.patch: Patch that disables HTTP header injections in Lib/http/client.py
  • issue22928.patch
  • issue22928_1.patch
  • issue22928_2.patch
  • issue22928_3.patch
  • issue22928_4.patch
  • issue22928_5.patch
  • issue22928_6.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2017-07-26.03:58:32.449>
    created_at = <Date 2014-11-24.02:50:25.230>
    labels = ['type-security', 'library']
    title = 'HTTP header injection in urrlib2/urllib/httplib/http.client (CVE-2016-5699)'
    updated_at = <Date 2019-08-15.04:09:21.328>
    user = 'https://bugs.python.org/Guido'

    bugs.python.org fields:

    activity = <Date 2019-08-15.04:09:21.328>
    actor = 'epicfaace'
    assignee = 'none'
    closed = True
    closed_date = <Date 2017-07-26.03:58:32.449>
    closer = 'ned.deily'
    components = ['Library (Lib)']
    creation = <Date 2014-11-24.02:50:25.230>
    creator = 'Guido'
    dependencies = []
    files = ['37264', '38133', '38154', '38158', '38190', '38399', '38433', '38449']
    hgrepos = []
    issue_num = 22928
    keywords = ['patch', 'security_issue']
    message_count = 27.0
    messages = ['231590', '232696', '235938', '235942', '235944', '235945', '236106', '236123', '236125', '236137', '237450', '237478', '237523', '237593', '237828', '237832', '237915', '237918', '237919', '237957', '269210', '269660', '298814', '299049', '299053', '299071', '299202']
    nosy_count = 16.0
    nosy_names = ['georg.brandl', 'orsenthil', 'vstinner', 'larry', 'benjamin.peterson', 'ned.deily', 'Arfrever', 'r.david.murray', 'python-dev', 'berker.peksag', 'martin.panter', 'serhiy.storchaka', 'koobs', 'demian.brecht', 'Guido', 'vladk']
    pr_nums = ['2817', '2861', '15299']
    priority = None
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue22928'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4', 'Python 3.5']

    @Guido
    Copy link
    Mannequin Author

    Guido mannequin commented Nov 24, 2014

    Proof of concept:

    # Script for Python 2
    import urllib2
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0' + chr(0x0A) + "Location: header injection")]
    response = opener.open("http://localhost:9999")

    # Data sent is:
    """
    GET / HTTP/1.1
    Accept-Encoding: identity
    Host: localhost:9999
    Connection: close
    User-Agent: Mozilla/5.0
    Location: header injection

    """

    # End of script

    # Python 3
    from urllib.request import urlopen, build_opener
    opener = build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0' + chr(0x0A) + "Location: header injection")]
    opener.open("http://localhost:9999")

    # Data sent is:
    """
    GET / HTTP/1.1
    Accept-Encoding: identity
    Host: localhost:9999
    Connection: close
    User-Agent: Mozilla/5.0
    Location: header injection

    """

    # End of script

    It is the responsibility of the developer leveraging Python and its HTTP client libraries to ensure that their (web) application acts in accordance to official HTTP specifications and that no threats to security will arise from their code.
    However, newlines inside headers are arguably a special case of breaking the conformity with RFC's in regard to the allowed character set. No illegal character used inside a HTTP header is likely to have a compromising side effect on back-end clients and servers and the integrity of their communication, as a result of the leniency of most web servers. However, a newline character (0x0A) embedded in a HTTP header invariably has the semantic consequence of denoting the start of an additional header line. To put it differently, not sanitizing headers in complete accordance to RFC's could be seen as as virtue in that it gives the programmer a maximum amount of freedom, without having to trade it for any likely or severe security ramifications, so that they may use illegal characters in testing environments and environments that are outlined by an expliticly less strict interpretation of the HTTP protocol. Newlines are special in that they enable anyone who is able to influence the header content, to, in effect, perform additional invocations to add_header().

    In bpo-17322 ( http://bugs.python.org/issue17322 ) there is some discussion as to the general compliance to RFC's by the HTTP client libraries. I'd like to opt to begin with prohibiting newline characters to be present in HTTP headers. Although this issue is not a "hard vulnerability" such as a buffer overflow, it does translate to a potentially equal level of severity when considered from the perspective of a web-enabled application, for which purpose the HTTP libraries are typically used for. Lack of input validation on the application developer's end will faciliate header injections, for example if user-supplied data will end up as cookie content verbatim.
    Adding this proposed additional layer of validation inside Python minimizes the likelihood of a successful header injection while functionality is not notably affected.

    I'm inclined to add this validation to putheader() in the 'http' module rather than in urllib, as this will secure all invocations to 'http' regardless of intermediate libraries such as urllib.

    Included is a patch for the latest checkout of the default branch that will cause CannotSendHeader() to be raised if a newline character is detected in either a header name or its value. Aside from detecting "\n", it also breaks on "\r" as their respective implications can be similar. Feel free to adjust, rewrite and transpose this to other branches where you feel this is appropriate.

    Guido Vranken
    Intelworks

    @Guido Guido mannequin added stdlib Python modules in the Lib dir type-security A security issue labels Nov 24, 2014
    @vadmium
    Copy link
    Member

    vadmium commented Dec 16, 2014

    There could be potential for breaking compatibility if people are intentionally sending values with folded lines (obsoleted by the new HTTP RFC).

    Perhaps the same error should be raised for values that cannot be encoded in Latin-1? Also, maybe most control characters should trigger an error, not just CR and LF. Apart from line folding, the HTTP RFC only allows visible ASCII characters, spaces and tabs, and obsolete non-ASCII chars >= 0x80.

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Feb 14, 2015

    Here's a patch addressing the potential vulnerability as reported. The patch should also bring the implementation up to date with the most recent standards around header names and values.

    There could be potential for breaking compatibility if people are intentionally sending values with folded lines (obsoleted by the new HTTP RFC).

    I think I'm okay with this given line folding seems to have been implemented by passing multiple value parameters (folding was automatically taken care of by the library).

    I don't think that this should be merged into anything pre 3.5 as safeguarding /should/ be accounted for by the developer, so I don't think I'd regard this as a high risk security issue. I'm definitely open to debate on that though.

    @vadmium
    Copy link
    Member

    vadmium commented Feb 14, 2015

    If we’re in the realm of 3.5 only changes, it might make sense to remove the multi-argument mode of putheader() altogether, and document it only generates a single line. (Currently still says it generates multiple lines.)

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Feb 14, 2015

    I think that keeping the public API as-is is the better way to go, at
    least for the shorter term given it won't require users to have to make
    code changes. Thanks for the catch on the docs though, will update that.

    @vadmium
    Copy link
    Member

    vadmium commented Feb 14, 2015

    Good point. Maybe join them with tabs rather than spaces then, since it was previously "\r\n\t". This way it is even closer to before.

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Feb 16, 2015

    Maybe join them with tabs rather than spaces then, since it was previously "\r\n\t". This way it is even closer to before.

    After thinking about this a little more, I think I'd prefer to keep
    spaces rather than tabs. The reason being is that, in my mind, now that
    continuations have been made obsolete it's more natural to do something
    like this:

    putheader('Authorization', 'Bearer', 'my_token')

    than

    putheader('Authorization', 'Bearer my_token')

    I realize it's a semantic change from previous behavior, but it seems to
    me to be preferable given the latest RFCs. I'd think that at some point
    in the future, we'd want to remove \x20 from the valid header value
    range to entirely conform to the spec. This is the first step in
    allowing for graceful deprecation.

    @vadmium
    Copy link
    Member

    vadmium commented Feb 17, 2015

    But it is not natural to do things like this (based on headers sent by Firefox):

    putheader("User-Agent", "Mozilla/5.0", "(X11;", "Linux", "x86_64;", "rv:25.0)", "Gecko/20100101", "Firefox/25.0")
    putheader("Accept-Encoding", "gzip,", "deflate")

    A way to properly encode different kinds of header values would be nice, but I don’t think the current putheader() API is it.

    Also, I still think it would be good to allow non-ASCII characters in header values. At least when an already-encoded byte string is supplied. For instance, RTSP is based on HTTP but uses UTF-8 as a default encoding, rather than HTTP’s Latin-1. Otherwise, retaining the one_value.encode('latin-1') call is confusing when later on it rejects non-ASCII-encoded characters.

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Feb 17, 2015

    But it is not natural to do things like this (based on headers sent by Firefox)

    Good point.

    Otherwise, retaining the one_value.encode('latin-1') call is confusing when later on it rejects non-ASCII-encoded characters.

    I’m a little torn on this one given one of the SHOULD clauses in RFC 7230 about recipients treating headers with non-ASCII characters as opaque data. However, I’ve read a number of occasions where users are using latin-1 in practice (and it /is/ only a SHOULD clause), so I think it’s likely better to err on the side of caution and allow for the latin-1 charset at least.

    As for utf-8 though, I think that once we start getting into the realm of other application protocols, that’s something that should have to be extended by the client implementation and not something that should be changed in the base HTTP implementation.

    The odd part of the API though now is the fact that it’s variadic. I really have no strong opinion on whether elements should be tab or space delimited and the RFC doesn’t seem to lean either way. I think I’m still leaning towards space delimiting to give users the ability to write in either form (putheader(‘Authorization’, ‘Bearer’, ‘token’) or putheader(‘Authorization’, ‘Bearer token’)). As another minor argument for it, it’s also likely a little nicer for logging. I think that optimally, the API would be a single value as you’d suggested, but I’d be concerned about the extent of backwards compatibility issues if that were to be done.

    I’ll try to get some time tomorrow to make those changes, so it still leaves time for further debate :)

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Feb 17, 2015

    I’ve updated the patch to include the latin-1 charset in legal header values. It still uses a space as delimiter, but all other comments should now be addressed.

    @serhiy-storchaka
    Copy link
    Member

    Added comments on Rietveld.

    Is there a limit to the length of header line? Would not unfolding all header values exceed the limit?

    @serhiy-storchaka serhiy-storchaka self-assigned this Mar 7, 2015
    @vadmium
    Copy link
    Member

    vadmium commented Mar 7, 2015

    Folded header fields are deprecated as of RFC 7230; see <https://tools.ietf.org/html/rfc7230#section-3.2.4\>. The only reasons to fold them I can think of is for readability (debugging), when generating a messsage/http MIME message (which I don’t think the Python library supports), or maybe when dealing with a strange server limitation. Normally there is not meant to be a limit for lines in the HTTP header, although it is common to limit the total unfolded header field value.

    If we go ahead and drop folding support, perhaps we should deprecate the putheader() multi-argument mode, rather than just document the arguments are now joined by spaces. It seems a pointless API now with this change.

    @serhiy-storchaka
    Copy link
    Member

    May be not drop folding support, but just deprecate the putheader() multi-argument mode? And of course add sanity checks for separate putheader() arguments.

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Mar 9, 2015

    After a chat with David and getting my head wrapped more around backwards compatibility, I also agree that the changes in the patch are far too strict. It's much more important to preserve backwards compatibility than to strictly conform to the RFC.

    I've updated the patch to allow for (what should be) anything that was previously allowed as header name/value pairs minus carriage returns not immediately followed by a tab or space (obs-fold: CRLF 1*( SP / HTAB )). This change fixes the reported issue but should not otherwise affect backwards compatibility.

    Additionally, even though line folding is deprecated by RFC 7230, I don't think it's necessarily an issue to support line folding until proven to be a problem in practice. With the current implementation, users have the ability to conform to the target server/proxy requirements, based on errors (if obs-fold isn't transparently dealt with as suggested) yielded by each as defined in the RFC. In light of that, I don't think that it's even worthwhile to start deprecating multi-parameter putheader at this point (but I'm open to argument on that one).

    One note on the deprecation is that if we deprecate multi-parameter, we should also add a warning if an embedded line fold is detected in a single headers value.

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Mar 11, 2015

    Latest patch should now address all comments.

    @vadmium
    Copy link
    Member

    vadmium commented Mar 11, 2015

    I much prefer the new patch with better compatibility and flexibility :)

    If you want to strengthen the tests to reflect some of the decisions made here you could add the following tests:

    Positive tests:

    • putheader('C1-Control', b'next\x85line')
    • putheader('Embedded-Fold', 'is\r\n\tallowed')

    Negative tests:

    • putheader(' Leading-Space', 'value')
    • putheader('Embedded: colon', 'extra value')

    @serhiy-storchaka
    Copy link
    Member

    I'll return previous implementation of header value regex. All other LGTM.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 12, 2015

    New changeset 1c45047c5102 by Serhiy Storchaka in branch '2.7':
    Issue bpo-22928: Disabled HTTP header injections in httplib.
    https://hg.python.org/cpython/rev/1c45047c5102

    New changeset bf3e1c9b80e9 by Serhiy Storchaka in branch '3.4':
    Issue bpo-22928: Disabled HTTP header injections in http.client.
    https://hg.python.org/cpython/rev/bf3e1c9b80e9

    New changeset aa4c6992c126 by Serhiy Storchaka in branch 'default':
    Issue bpo-22928: Disabled HTTP header injections in http.client.
    https://hg.python.org/cpython/rev/aa4c6992c126

    @serhiy-storchaka
    Copy link
    Member

    Added new tests and tweaked regexpes. Thank you for your contribution Demian.

    Now we can start long-standing deprecation process for conforming to RFC.

    @demianbrecht
    Copy link
    Mannequin

    demianbrecht mannequin commented Mar 12, 2015

    Thanks for the tweaks Serhiy, those seem reasonable to me.

    @vladk
    Copy link
    Mannequin

    vladk mannequin commented Jun 24, 2016

    Doesn't this affect Python 3.3 as well, which is in security-only mode? Shouldn't that version be patched as well?

    @koobs
    Copy link

    koobs commented Jul 1, 2016

    3.3 is supported for security related fixes until September 2017 [1], but only 3.4, 3.5 and 2.7 have received the backport, reopen for outstanding merge

    [1] https://docs.python.org/devguide/#status-of-python-branches

    Update summary to reflect the RedHat CVE that was assigned to this issue.

    @koobs koobs reopened this Jul 1, 2016
    @koobs koobs changed the title HTTP header injection in urrlib2/urllib/httplib/http.client HTTP header injection in urrlib2/urllib/httplib/http.client (CVE-2016-5699) Jul 1, 2016
    @ned-deily
    Copy link
    Member

    Getting to be the last chance to backport this for 3.3.x.

    @serhiy-storchaka
    Copy link
    Member

    What is the difference between PR 2817 and PR 2861?

    @vstinner
    Copy link
    Member

    What is the difference between PR 2817 and PR 2861?

    Oh crap, I didn't know that you already created a PR. I compared the two PR:

    • My PR adds \A at the start of:
      _is_legal_header_name = re.compile(rb'\A[^:\\s][^:\\r\\n]*\Z').match
    • My PR uses blurb, yours modifies Misc/NEWS

    The \A was copied from the Python 2.7 commit, since Python 3.3 doesn't have fullmatch().

    If you are ok to add \A and convert the NEWS entry to blurb, I can abandon my duplicated PR.

    @serhiy-storchaka
    Copy link
    Member

    \A is not needed. match() always matches from the start.

    @ned-deily
    Copy link
    Member

    New changeset 8e88f6b by Ned Deily (Serhiy Storchaka) in branch '3.3':
    [3.3] bpo-22928: Disabled HTTP header injections in http.client. (bpo-2817)
    8e88f6b

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants