msg231590 - (view) |
Author: Guido Vranken (Guido) |
Date: 2014-11-24 02:50 |
Proof of concept:
# Script for Python 2
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0' + chr(0x0A) + "Location: header injection")]
response = opener.open("http://localhost:9999")
# Data sent is:
"""
GET / HTTP/1.1
Accept-Encoding: identity
Host: localhost:9999
Connection: close
User-Agent: Mozilla/5.0
Location: header injection
"""
# End of script
# Python 3
from urllib.request import urlopen, build_opener
opener = build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0' + chr(0x0A) + "Location: header injection")]
opener.open("http://localhost:9999")
# Data sent is:
"""
GET / HTTP/1.1
Accept-Encoding: identity
Host: localhost:9999
Connection: close
User-Agent: Mozilla/5.0
Location: header injection
"""
# End of script
It is the responsibility of the developer leveraging Python and its HTTP client libraries to ensure that their (web) application acts in accordance to official HTTP specifications and that no threats to security will arise from their code.
However, newlines inside headers are arguably a special case of breaking the conformity with RFC's in regard to the allowed character set. No illegal character used inside a HTTP header is likely to have a compromising side effect on back-end clients and servers and the integrity of their communication, as a result of the leniency of most web servers. However, a newline character (0x0A) embedded in a HTTP header invariably has the semantic consequence of denoting the start of an additional header line. To put it differently, not sanitizing headers in complete accordance to RFC's could be seen as as virtue in that it gives the programmer a maximum amount of freedom, without having to trade it for any likely or severe security ramifications, so that they may use illegal characters in testing environments and environments that are outlined by an expliticly less strict interpretation of the HTTP protocol. Newlines are special in that they enable anyone who is able to influence the header content, to, in effect, perform additional invocations to add_header().
In issue 17322 ( http://bugs.python.org/issue17322 ) there is some discussion as to the general compliance to RFC's by the HTTP client libraries. I'd like to opt to begin with prohibiting newline characters to be present in HTTP headers. Although this issue is not a "hard vulnerability" such as a buffer overflow, it does translate to a potentially equal level of severity when considered from the perspective of a web-enabled application, for which purpose the HTTP libraries are typically used for. Lack of input validation on the application developer's end will faciliate header injections, for example if user-supplied data will end up as cookie content verbatim.
Adding this proposed additional layer of validation inside Python minimizes the likelihood of a successful header injection while functionality is not notably affected.
I'm inclined to add this validation to putheader() in the 'http' module rather than in urllib, as this will secure all invocations to 'http' regardless of intermediate libraries such as urllib.
Included is a patch for the latest checkout of the default branch that will cause CannotSendHeader() to be raised if a newline character is detected in either a header name or its value. Aside from detecting "\n", it also breaks on "\r" as their respective implications can be similar. Feel free to adjust, rewrite and transpose this to other branches where you feel this is appropriate.
Guido Vranken
Intelworks
|
msg232696 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2014-12-16 01:43 |
There could be potential for breaking compatibility if people are intentionally sending values with folded lines (obsoleted by the new HTTP RFC).
Perhaps the same error should be raised for values that cannot be encoded in Latin-1? Also, maybe most control characters should trigger an error, not just CR and LF. Apart from line folding, the HTTP RFC only allows visible ASCII characters, spaces and tabs, and obsolete non-ASCII chars >= 0x80.
|
msg235938 - (view) |
Author: Demian Brecht (demian.brecht) * |
Date: 2015-02-14 01:07 |
Here's a patch addressing the potential vulnerability as reported. The patch should also bring the implementation up to date with the most recent standards around header names and values.
> There could be potential for breaking compatibility if people are intentionally sending values with folded lines (obsoleted by the new HTTP RFC).
I think I'm okay with this given line folding seems to have been implemented by passing multiple value parameters (folding was automatically taken care of by the library).
I don't think that this should be merged into anything pre 3.5 as safeguarding /should/ be accounted for by the developer, so I don't think I'd regard this as a high risk security issue. I'm definitely open to debate on that though.
|
msg235942 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-02-14 01:52 |
If we’re in the realm of 3.5 only changes, it might make sense to remove the multi-argument mode of putheader() altogether, and document it only generates a single line. (Currently still says it generates multiple lines.)
|
msg235944 - (view) |
Author: Demian Brecht (demian.brecht) * |
Date: 2015-02-14 02:04 |
I think that keeping the public API as-is is the better way to go, at
least for the shorter term given it won't require users to have to make
code changes. Thanks for the catch on the docs though, will update that.
|
msg235945 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-02-14 02:09 |
Good point. Maybe join them with tabs rather than spaces then, since it was previously "\r\n\t". This way it is even closer to before.
|
msg236106 - (view) |
Author: Demian Brecht (demian.brecht) * |
Date: 2015-02-16 17:44 |
> Maybe join them with tabs rather than spaces then, since it was previously "\r\n\t". This way it is even closer to before.
After thinking about this a little more, I think I'd prefer to keep
spaces rather than tabs. The reason being is that, in my mind, now that
continuations have been made obsolete it's more natural to do something
like this:
putheader('Authorization', 'Bearer', 'my_token')
than
putheader('Authorization', 'Bearer my_token')
I realize it's a semantic change from previous behavior, but it seems to
me to be preferable given the latest RFCs. I'd think that at some point
in the future, we'd want to remove \x20 from the valid header value
range to entirely conform to the spec. This is the first step in
allowing for graceful deprecation.
|
msg236123 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-02-17 03:48 |
But it is not natural to do things like this (based on headers sent by Firefox):
putheader("User-Agent", "Mozilla/5.0", "(X11;", "Linux", "x86_64;", "rv:25.0)", "Gecko/20100101", "Firefox/25.0")
putheader("Accept-Encoding", "gzip,", "deflate")
A way to properly encode different kinds of header values would be nice, but I don’t think the current putheader() API is it.
Also, I still think it would be good to allow non-ASCII characters in header values. At least when an already-encoded byte string is supplied. For instance, RTSP is based on HTTP but uses UTF-8 as a default encoding, rather than HTTP’s Latin-1. Otherwise, retaining the one_value.encode('latin-1') call is confusing when later on it rejects non-ASCII-encoded characters.
|
msg236125 - (view) |
Author: Demian Brecht (demian.brecht) * |
Date: 2015-02-17 05:04 |
> But it is not natural to do things like this (based on headers sent by Firefox)
Good point.
> Otherwise, retaining the one_value.encode('latin-1') call is confusing when later on it rejects non-ASCII-encoded characters.
I’m a little torn on this one given one of the SHOULD clauses in RFC 7230 about recipients treating headers with non-ASCII characters as opaque data. However, I’ve read a number of occasions where users are using latin-1 in practice (and it /is/ only a SHOULD clause), so I think it’s likely better to err on the side of caution and allow for the latin-1 charset at least.
As for utf-8 though, I think that once we start getting into the realm of other application protocols, that’s something that should have to be extended by the client implementation and not something that should be changed in the base HTTP implementation.
The odd part of the API though now is the fact that it’s variadic. I really have no strong opinion on whether elements should be tab or space delimited and the RFC doesn’t seem to lean either way. I think I’m still leaning towards space delimiting to give users the ability to write in either form (putheader(‘Authorization’, ‘Bearer’, ‘token’) or putheader(‘Authorization’, ‘Bearer token’)). As another minor argument for it, it’s also likely a little nicer for logging. I think that optimally, the API would be a single value as you’d suggested, but I’d be concerned about the extent of backwards compatibility issues if that were to be done.
I’ll try to get some time tomorrow to make those changes, so it still leaves time for further debate :)
|
msg236137 - (view) |
Author: Demian Brecht (demian.brecht) * |
Date: 2015-02-17 15:56 |
I’ve updated the patch to include the latin-1 charset in legal header values. It still uses a space as delimiter, but all other comments should now be addressed.
|
msg237450 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-03-07 14:43 |
Added comments on Rietveld.
Is there a limit to the length of header line? Would not unfolding all header values exceed the limit?
|
msg237478 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-03-07 22:01 |
Folded header fields are deprecated as of RFC 7230; see <https://tools.ietf.org/html/rfc7230#section-3.2.4>. The only reasons to fold them I can think of is for readability (debugging), when generating a messsage/http MIME message (which I don’t think the Python library supports), or maybe when dealing with a strange server limitation. Normally there is not meant to be a limit for lines in the HTTP header, although it is common to limit the total unfolded header field value.
If we go ahead and drop folding support, perhaps we should deprecate the putheader() multi-argument mode, rather than just document the arguments are now joined by spaces. It seems a pointless API now with this change.
|
msg237523 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-03-08 09:18 |
May be not drop folding support, but just deprecate the putheader() multi-argument mode? And of course add sanity checks for separate putheader() arguments.
|
msg237593 - (view) |
Author: Demian Brecht (demian.brecht) * |
Date: 2015-03-09 05:31 |
After a chat with David and getting my head wrapped more around backwards compatibility, I also agree that the changes in the patch are far too strict. It's much more important to preserve backwards compatibility than to strictly conform to the RFC.
I've updated the patch to allow for (what should be) anything that was previously allowed as header name/value pairs minus carriage returns not immediately followed by a tab or space (obs-fold: CRLF 1*( SP / HTAB )). This change fixes the reported issue but should not otherwise affect backwards compatibility.
Additionally, even though line folding is deprecated by RFC 7230, I don't think it's necessarily an issue to support line folding until proven to be a problem in practice. With the current implementation, users have the ability to conform to the target server/proxy requirements, based on errors (if obs-fold isn't transparently dealt with as suggested) yielded by each as defined in the RFC. In light of that, I don't think that it's even worthwhile to start deprecating multi-parameter putheader at this point (but I'm open to argument on that one).
One note on the deprecation is that if we deprecate multi-parameter, we should also add a warning if an embedded line fold is detected in a single headers value.
|
msg237828 - (view) |
Author: Demian Brecht (demian.brecht) * |
Date: 2015-03-11 00:23 |
Latest patch should now address all comments.
|
msg237832 - (view) |
Author: Martin Panter (martin.panter) * |
Date: 2015-03-11 01:05 |
I much prefer the new patch with better compatibility and flexibility :)
If you want to strengthen the tests to reflect some of the decisions made here you could add the following tests:
Positive tests:
* putheader('C1-Control', b'next\x85line')
* putheader('Embedded-Fold', 'is\r\n\tallowed')
Negative tests:
* putheader(' Leading-Space', 'value')
* putheader('Embedded: colon', 'extra value')
|
msg237915 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-03-12 07:05 |
I'll return previous implementation of header value regex. All other LGTM.
|
msg237918 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2015-03-12 09:16 |
New changeset 1c45047c5102 by Serhiy Storchaka in branch '2.7':
Issue #22928: Disabled HTTP header injections in httplib.
https://hg.python.org/cpython/rev/1c45047c5102
New changeset bf3e1c9b80e9 by Serhiy Storchaka in branch '3.4':
Issue #22928: Disabled HTTP header injections in http.client.
https://hg.python.org/cpython/rev/bf3e1c9b80e9
New changeset aa4c6992c126 by Serhiy Storchaka in branch 'default':
Issue #22928: Disabled HTTP header injections in http.client.
https://hg.python.org/cpython/rev/aa4c6992c126
|
msg237919 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-03-12 09:21 |
Added new tests and tweaked regexpes. Thank you for your contribution Demian.
Now we can start long-standing deprecation process for conforming to RFC.
|
msg237957 - (view) |
Author: Demian Brecht (demian.brecht) * |
Date: 2015-03-12 16:18 |
Thanks for the tweaks Serhiy, those seem reasonable to me.
|
msg269210 - (view) |
Author: Vlad K. (vladk) |
Date: 2016-06-24 20:15 |
Doesn't this affect Python 3.3 as well, which is in security-only mode? Shouldn't that version be patched as well?
|
msg269660 - (view) |
Author: Kubilay Kocak (koobs) |
Date: 2016-07-01 12:02 |
3.3 is supported for security related fixes until September 2017 [1], but only 3.4, 3.5 and 2.7 have received the backport, reopen for outstanding merge
[1] https://docs.python.org/devguide/#status-of-python-branches
Update summary to reflect the RedHat CVE that was assigned to this issue.
|
msg298814 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2017-07-21 18:16 |
Getting to be the last chance to backport this for 3.3.x.
|
msg299049 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-07-25 11:40 |
What is the difference between PR 2817 and PR 2861?
|
msg299053 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-25 11:52 |
> What is the difference between PR 2817 and PR 2861?
Oh crap, I didn't know that you already created a PR. I compared the two PR:
* My PR adds \A at the start of:
_is_legal_header_name = re.compile(rb'\A[^:\s][^:\r\n]*\Z').match
* My PR uses blurb, yours modifies Misc/NEWS
The \A was copied from the Python 2.7 commit, since Python 3.3 doesn't have fullmatch().
If you are ok to add \A and convert the NEWS entry to blurb, I can abandon my duplicated PR.
|
msg299071 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-07-25 12:53 |
\A is not needed. match() always matches from the start.
|
msg299202 - (view) |
Author: Ned Deily (ned.deily) * |
Date: 2017-07-26 03:54 |
New changeset 8e88f6b5e2a35ee458c161aa3f2b7f1f17fb45d1 by Ned Deily (Serhiy Storchaka) in branch '3.3':
[3.3] bpo-22928: Disabled HTTP header injections in http.client. (#2817)
https://github.com/python/cpython/commit/8e88f6b5e2a35ee458c161aa3f2b7f1f17fb45d1
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:10 | admin | set | github: 67117 |
2019-08-15 04:09:21 | epicfaace | set | pull_requests:
+ pull_request15023 |
2017-07-26 03:58:32 | ned.deily | set | priority: release blocker -> assignee: georg.brandl -> status: open -> closed resolution: fixed stage: backport needed -> resolved |
2017-07-26 03:54:34 | ned.deily | set | messages:
+ msg299202 |
2017-07-25 12:53:07 | serhiy.storchaka | set | messages:
+ msg299071 |
2017-07-25 11:52:07 | vstinner | set | nosy:
+ vstinner messages:
+ msg299053
|
2017-07-25 11:40:50 | serhiy.storchaka | set | messages:
+ msg299049 |
2017-07-25 10:47:53 | vstinner | set | pull_requests:
+ pull_request2912 |
2017-07-23 07:01:22 | serhiy.storchaka | set | priority: normal -> release blocker nosy:
+ benjamin.peterson, larry
|
2017-07-22 20:23:22 | serhiy.storchaka | set | pull_requests:
+ pull_request2870 |
2017-07-22 13:04:19 | koobs | set | stage: resolved -> backport needed |
2017-07-21 18:16:02 | ned.deily | set | nosy:
+ ned.deily messages:
+ msg298814
|
2016-07-01 12:08:23 | serhiy.storchaka | set | assignee: serhiy.storchaka -> georg.brandl |
2016-07-01 12:02:37 | koobs | set | versions:
+ Python 3.3 |
2016-07-01 12:02:04 | koobs | set | status: closed -> open
title: HTTP header injection in urrlib2/urllib/httplib/http.client -> HTTP header injection in urrlib2/urllib/httplib/http.client (CVE-2016-5699) keywords:
+ security_issue nosy:
+ koobs
messages:
+ msg269660 resolution: fixed -> (no value) |
2016-06-25 19:25:02 | serhiy.storchaka | set | nosy:
+ georg.brandl
|
2016-06-24 20:15:14 | vladk | set | nosy:
+ vladk messages:
+ msg269210
|
2015-03-15 15:35:28 | Arfrever | set | nosy:
+ Arfrever
|
2015-03-12 16:18:31 | demian.brecht | set | messages:
+ msg237957 |
2015-03-12 09:21:27 | serhiy.storchaka | set | status: open -> closed versions:
+ Python 2.7, Python 3.4 messages:
+ msg237919
resolution: fixed stage: commit review -> resolved |
2015-03-12 09:16:13 | python-dev | set | nosy:
+ python-dev messages:
+ msg237918
|
2015-03-12 07:05:01 | serhiy.storchaka | set | messages:
+ msg237915 stage: patch review -> commit review |
2015-03-12 06:21:17 | demian.brecht | set | files:
+ issue22928_6.patch |
2015-03-11 01:05:25 | martin.panter | set | messages:
+ msg237832 |
2015-03-11 00:23:54 | demian.brecht | set | files:
+ issue22928_5.patch
messages:
+ msg237828 |
2015-03-09 05:31:53 | demian.brecht | set | files:
+ issue22928_4.patch
messages:
+ msg237593 |
2015-03-08 09:19:00 | serhiy.storchaka | set | messages:
+ msg237523 |
2015-03-07 22:01:17 | martin.panter | set | messages:
+ msg237478 |
2015-03-07 14:43:56 | serhiy.storchaka | set | assignee: serhiy.storchaka
messages:
+ msg237450 nosy:
+ serhiy.storchaka |
2015-02-20 15:58:58 | demian.brecht | set | files:
+ issue22928_3.patch |
2015-02-20 12:25:03 | berker.peksag | set | nosy:
+ berker.peksag stage: patch review
versions:
- Python 3.6 |
2015-02-17 15:56:16 | demian.brecht | set | files:
+ issue22928_2.patch
messages:
+ msg236137 |
2015-02-17 05:04:05 | demian.brecht | set | messages:
+ msg236125 |
2015-02-17 03:48:08 | martin.panter | set | messages:
+ msg236123 |
2015-02-17 00:17:25 | demian.brecht | set | versions:
- Python 2.7, Python 3.2, Python 3.3, Python 3.4 |
2015-02-16 17:44:54 | demian.brecht | set | files:
+ issue22928_1.patch
messages:
+ msg236106 |
2015-02-14 02:09:03 | martin.panter | set | messages:
+ msg235945 |
2015-02-14 02:04:40 | demian.brecht | set | messages:
+ msg235944 |
2015-02-14 01:52:46 | martin.panter | set | messages:
+ msg235942 |
2015-02-14 01:07:08 | demian.brecht | set | files:
+ issue22928.patch
messages:
+ msg235938 |
2014-12-16 01:43:51 | martin.panter | set | nosy:
+ martin.panter messages:
+ msg232696
|
2014-12-15 05:52:57 | demian.brecht | set | nosy:
+ demian.brecht
|
2014-11-24 14:38:21 | r.david.murray | set | nosy:
+ orsenthil, r.david.murray
|
2014-11-24 02:50:25 | Guido | create | |