classification
Title: HTTP header injection in urrlib2/urllib/httplib/http.client (CVE-2016-5699)
Type: security Stage: resolved
Components: Library (Lib) Versions: Python 3.5, Python 3.4, Python 3.3, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: georg.brandl Nosy List: Arfrever, Guido, berker.peksag, demian.brecht, georg.brandl, koobs, martin.panter, orsenthil, python-dev, r.david.murray, serhiy.storchaka, vladk
Priority: normal Keywords: patch, security_issue

Created on 2014-11-24 02:50 by Guido, last changed 2016-07-01 12:08 by serhiy.storchaka.

Files
File name Uploaded Description Edit
disable_http_header_injection.patch Guido, 2014-11-24 02:50 Patch that disables HTTP header injections in Lib/http/client.py review
issue22928.patch demian.brecht, 2015-02-14 01:07 review
issue22928_1.patch demian.brecht, 2015-02-16 17:44 review
issue22928_2.patch demian.brecht, 2015-02-17 15:56 review
issue22928_3.patch demian.brecht, 2015-02-20 15:58 review
issue22928_4.patch demian.brecht, 2015-03-09 05:31 review
issue22928_5.patch demian.brecht, 2015-03-11 00:23 review
issue22928_6.patch demian.brecht, 2015-03-12 06:21 review
Messages (22)
msg231590 - (view) Author: Guido Vranken (Guido) Date: 2014-11-24 02:50
Proof of concept:

# Script for Python 2
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0' + chr(0x0A) + "Location: header injection")]
response = opener.open("http://localhost:9999")

# Data sent is:
"""
GET / HTTP/1.1
Accept-Encoding: identity
Host: localhost:9999
Connection: close
User-Agent: Mozilla/5.0
Location: header injection

"""

# End of script

# Python 3
from urllib.request import urlopen, build_opener
opener = build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0' + chr(0x0A) + "Location: header injection")]
opener.open("http://localhost:9999")

# Data sent is:
"""
GET / HTTP/1.1
Accept-Encoding: identity
Host: localhost:9999
Connection: close
User-Agent: Mozilla/5.0
Location: header injection

"""

# End of script

It is the responsibility of the developer leveraging Python and its HTTP client libraries to ensure that their (web) application acts in accordance to official HTTP specifications and that no threats to security will arise from their code.
However, newlines inside headers are arguably a special case of breaking the conformity with RFC's in regard to the allowed character set. No illegal character used inside a HTTP header is likely to have a compromising side effect on back-end clients and servers and the integrity of their communication, as a result of the leniency of most web servers. However, a newline character (0x0A) embedded in a HTTP header invariably has the semantic consequence of denoting the start of an additional header line. To put it differently, not sanitizing headers in complete accordance to RFC's could be seen as as virtue in that it gives the programmer a maximum amount of freedom, without having to trade it for any likely or severe security ramifications, so that they may use illegal characters in testing environments and environments that are outlined by an expliticly less strict interpretation of the HTTP protocol. Newlines are special in that they enable anyone who is able to influence the header content, to, in effect, perform additional invocations to add_header().

In issue 17322 ( http://bugs.python.org/issue17322 ) there is some discussion as to the general compliance to RFC's by the HTTP client libraries. I'd like to opt to begin with prohibiting newline characters to be present in HTTP headers. Although this issue is not a "hard vulnerability" such as a buffer overflow, it does translate to a potentially equal level of severity when considered from the perspective of a web-enabled application, for which purpose the HTTP libraries are typically used for. Lack of input validation on the application developer's end will faciliate header injections, for example if user-supplied data will end up as cookie content verbatim.
Adding this proposed additional layer of validation inside Python minimizes the likelihood of a successful header injection while functionality is not notably affected.

I'm inclined to add this validation to putheader() in the 'http' module rather than in urllib, as this will secure all invocations to 'http' regardless of intermediate libraries such as urllib.

Included is a patch for the latest checkout of the default branch that will cause CannotSendHeader() to be raised if a newline character is detected in either a header name or its value. Aside from detecting "\n", it also breaks on "\r" as their respective implications can be similar. Feel free to adjust, rewrite and transpose this to other branches where you feel this is appropriate.


Guido Vranken
Intelworks
msg232696 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-12-16 01:43
There could be potential for breaking compatibility if people are intentionally sending values with folded lines (obsoleted by the new HTTP RFC).

Perhaps the same error should be raised for values that cannot be encoded in Latin-1? Also, maybe most control characters should trigger an error, not just CR and LF. Apart from line folding, the HTTP RFC only allows visible ASCII characters, spaces and tabs, and obsolete non-ASCII chars >= 0x80.
msg235938 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-02-14 01:07
Here's a patch addressing the potential vulnerability as reported. The patch should also bring the implementation up to date with the most recent standards around header names and values.

> There could be potential for breaking compatibility if people are intentionally sending values with folded lines (obsoleted by the new HTTP RFC).

I think I'm okay with this given line folding seems to have been implemented by passing multiple value parameters (folding was automatically taken care of by the library).

I don't think that this should be merged into anything pre 3.5 as safeguarding /should/ be accounted for by the developer, so I don't think I'd regard this as a high risk security issue. I'm definitely open to debate on that though.
msg235942 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-14 01:52
If we’re in the realm of 3.5 only changes, it might make sense to remove the multi-argument mode of putheader() altogether, and document it only generates a single line. (Currently still says it generates multiple lines.)
msg235944 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-02-14 02:04
I think that keeping the public API as-is is the better way to go, at
least for the shorter term given it won't require users to have to make
code changes. Thanks for the catch on the docs though, will update that.
msg235945 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-14 02:09
Good point. Maybe join them with tabs rather than spaces then, since it was previously "\r\n\t". This way it is even closer to before.
msg236106 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-02-16 17:44
> Maybe join them with tabs rather than spaces then, since it was previously "\r\n\t". This way it is even closer to before.

After thinking about this a little more, I think I'd prefer to keep
spaces rather than tabs. The reason being is that, in my mind, now that
continuations have been made obsolete it's more natural to do something
like this:

putheader('Authorization', 'Bearer', 'my_token')

than

putheader('Authorization', 'Bearer my_token')

I realize it's a semantic change from previous behavior, but it seems to
me to be preferable given the latest RFCs. I'd think that at some point
in the future, we'd want to remove \x20 from the valid header value
range to entirely conform to the spec. This is the first step in
allowing for graceful deprecation.
msg236123 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-02-17 03:48
But it is not natural to do things like this (based on headers sent by Firefox):

putheader("User-Agent", "Mozilla/5.0", "(X11;", "Linux", "x86_64;", "rv:25.0)", "Gecko/20100101", "Firefox/25.0")
putheader("Accept-Encoding", "gzip,", "deflate")

A way to properly encode different kinds of header values would be nice, but I don’t think the current putheader() API is it.

Also, I still think it would be good to allow non-ASCII characters in header values. At least when an already-encoded byte string is supplied. For instance, RTSP is based on HTTP but uses UTF-8 as a default encoding, rather than HTTP’s Latin-1. Otherwise, retaining the one_value.encode('latin-1') call is confusing when later on it rejects non-ASCII-encoded characters.
msg236125 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-02-17 05:04
> But it is not natural to do things like this (based on headers sent by Firefox)

Good point.

> Otherwise, retaining the one_value.encode('latin-1') call is confusing when later on it rejects non-ASCII-encoded characters.

I’m a little torn on this one given one of the SHOULD clauses in RFC 7230 about recipients treating headers with non-ASCII characters as opaque data. However, I’ve read a number of occasions where users are using latin-1 in practice (and it /is/ only a SHOULD clause), so I think it’s likely better to err on the side of caution and allow for the latin-1 charset at least.

As for utf-8 though, I think that once we start getting into the realm of other application protocols, that’s something that should have to be extended by the client implementation and not something that should be changed in the base HTTP implementation.

The odd part of the API though now is the fact that it’s variadic. I really have no strong opinion on whether elements should be tab or space delimited and the RFC doesn’t seem to lean either way. I think I’m still leaning towards space delimiting to give users the ability to write in either form (putheader(‘Authorization’, ‘Bearer’, ‘token’) or putheader(‘Authorization’, ‘Bearer token’)). As another minor argument for it, it’s also likely a little nicer for logging. I think that optimally, the API would be a single value as you’d suggested, but I’d be concerned about the extent of backwards compatibility issues if that were to be done.

I’ll try to get some time tomorrow to make those changes, so it still leaves time for further debate :)
msg236137 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-02-17 15:56
I’ve updated the patch to include the latin-1 charset in legal header values. It still uses a space as delimiter, but all other comments should now be addressed.
msg237450 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-07 14:43
Added comments on Rietveld.

Is there a limit to the length of header line? Would not unfolding all header values exceed the limit?
msg237478 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-07 22:01
Folded header fields are deprecated as of RFC 7230; see <https://tools.ietf.org/html/rfc7230#section-3.2.4>. The only reasons to fold them I can think of is for readability (debugging), when generating a messsage/http MIME message (which I don’t think the Python library supports), or maybe when dealing with a strange server limitation. Normally there is not meant to be a limit for lines in the HTTP header, although it is common to limit the total unfolded header field value.

If we go ahead and drop folding support, perhaps we should deprecate the putheader() multi-argument mode, rather than just document the arguments are now joined by spaces. It seems a pointless API now with this change.
msg237523 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-08 09:18
May be not drop folding support, but just deprecate the putheader() multi-argument mode? And of course add sanity checks for separate putheader() arguments.
msg237593 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-03-09 05:31
After a chat with David and getting my head wrapped more around backwards compatibility, I also agree that the changes in the patch are far too strict. It's much more important to preserve backwards compatibility than to strictly conform to the RFC.

I've updated the patch to allow for (what should be) anything that was previously allowed as header name/value pairs minus carriage returns not immediately followed by a tab or space (obs-fold: CRLF 1*( SP / HTAB )). This change fixes the reported issue but should not otherwise affect backwards compatibility.

Additionally, even though line folding is deprecated by RFC 7230, I don't think it's necessarily an issue to support line folding until proven to be a problem in practice. With the current implementation, users have the ability to conform to the target server/proxy requirements, based on errors (if obs-fold isn't transparently dealt with as suggested) yielded by each as defined in the RFC. In light of that, I don't think that it's even worthwhile to start deprecating multi-parameter putheader at this point (but I'm open to argument on that one).

One note on the deprecation is that if we deprecate multi-parameter, we should also add a warning if an embedded line fold is detected in a single headers value.
msg237828 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-03-11 00:23
Latest patch should now address all comments.
msg237832 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-03-11 01:05
I much prefer the new patch with better compatibility and flexibility :)

If you want to strengthen the tests to reflect some of the decisions made here you could add the following tests:

Positive tests:
* putheader('C1-Control', b'next\x85line')
* putheader('Embedded-Fold', 'is\r\n\tallowed')

Negative tests:
* putheader(' Leading-Space', 'value')
* putheader('Embedded: colon', 'extra value')
msg237915 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-12 07:05
I'll return previous implementation of header value regex. All other LGTM.
msg237918 - (view) Author: Roundup Robot (python-dev) Date: 2015-03-12 09:16
New changeset 1c45047c5102 by Serhiy Storchaka in branch '2.7':
Issue #22928: Disabled HTTP header injections in httplib.
https://hg.python.org/cpython/rev/1c45047c5102

New changeset bf3e1c9b80e9 by Serhiy Storchaka in branch '3.4':
Issue #22928: Disabled HTTP header injections in http.client.
https://hg.python.org/cpython/rev/bf3e1c9b80e9

New changeset aa4c6992c126 by Serhiy Storchaka in branch 'default':
Issue #22928: Disabled HTTP header injections in http.client.
https://hg.python.org/cpython/rev/aa4c6992c126
msg237919 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-12 09:21
Added new tests and tweaked regexpes. Thank you for your contribution Demian.

Now we can start long-standing deprecation process for conforming to RFC.
msg237957 - (view) Author: Demian Brecht (demian.brecht) * Date: 2015-03-12 16:18
Thanks for the tweaks Serhiy, those seem reasonable to me.
msg269210 - (view) Author: Vlad K. (vladk) Date: 2016-06-24 20:15
Doesn't this affect Python 3.3 as well, which is in security-only mode? Shouldn't that version be patched as well?
msg269660 - (view) Author: Kubilay Kocak (koobs) Date: 2016-07-01 12:02
3.3 is supported for security related fixes until September 2017 [1], but only 3.4, 3.5 and 2.7 have received the backport, reopen for outstanding merge

[1] https://docs.python.org/devguide/#status-of-python-branches

Update summary to reflect the RedHat CVE that was assigned to this issue.
History
Date User Action Args
2016-07-01 12:08:23serhiy.storchakasetassignee: serhiy.storchaka -> georg.brandl
2016-07-01 12:02:37koobssetversions: + Python 3.3
2016-07-01 12:02:04koobssetstatus: closed -> open

title: HTTP header injection in urrlib2/urllib/httplib/http.client -> HTTP header injection in urrlib2/urllib/httplib/http.client (CVE-2016-5699)
keywords: + security_issue
nosy: + koobs

messages: + msg269660
resolution: fixed ->
2016-06-25 19:25:02serhiy.storchakasetnosy: + georg.brandl
2016-06-24 20:15:14vladksetnosy: + vladk
messages: + msg269210
2015-03-15 15:35:28Arfreversetnosy: + Arfrever
2015-03-12 16:18:31demian.brechtsetmessages: + msg237957
2015-03-12 09:21:27serhiy.storchakasetstatus: open -> closed
versions: + Python 2.7, Python 3.4
messages: + msg237919

resolution: fixed
stage: commit review -> resolved
2015-03-12 09:16:13python-devsetnosy: + python-dev
messages: + msg237918
2015-03-12 07:05:01serhiy.storchakasetmessages: + msg237915
stage: patch review -> commit review
2015-03-12 06:21:17demian.brechtsetfiles: + issue22928_6.patch
2015-03-11 01:05:25martin.pantersetmessages: + msg237832
2015-03-11 00:23:54demian.brechtsetfiles: + issue22928_5.patch

messages: + msg237828
2015-03-09 05:31:53demian.brechtsetfiles: + issue22928_4.patch

messages: + msg237593
2015-03-08 09:19:00serhiy.storchakasetmessages: + msg237523
2015-03-07 22:01:17martin.pantersetmessages: + msg237478
2015-03-07 14:43:56serhiy.storchakasetassignee: serhiy.storchaka

messages: + msg237450
nosy: + serhiy.storchaka
2015-02-20 15:58:58demian.brechtsetfiles: + issue22928_3.patch
2015-02-20 12:25:03berker.peksagsetnosy: + berker.peksag
stage: patch review

versions: - Python 3.6
2015-02-17 15:56:16demian.brechtsetfiles: + issue22928_2.patch

messages: + msg236137
2015-02-17 05:04:05demian.brechtsetmessages: + msg236125
2015-02-17 03:48:08martin.pantersetmessages: + msg236123
2015-02-17 00:17:25demian.brechtsetversions: - Python 2.7, Python 3.2, Python 3.3, Python 3.4
2015-02-16 17:44:54demian.brechtsetfiles: + issue22928_1.patch

messages: + msg236106
2015-02-14 02:09:03martin.pantersetmessages: + msg235945
2015-02-14 02:04:40demian.brechtsetmessages: + msg235944
2015-02-14 01:52:46martin.pantersetmessages: + msg235942
2015-02-14 01:07:08demian.brechtsetfiles: + issue22928.patch

messages: + msg235938
2014-12-16 01:43:51martin.pantersetnosy: + martin.panter
messages: + msg232696
2014-12-15 05:52:57demian.brechtsetnosy: + demian.brecht
2014-11-24 14:38:21r.david.murraysetnosy: + orsenthil, r.david.murray
2014-11-24 02:50:25Guidocreate