Author Guido
Recipients Guido
Date 2014-11-24.02:50:23
Message-id <1416797425.29.0.948172439575.issue22928@psf.upfronthosting.co.za>
In-reply-to
Content
Proof of concept:

# Script for Python 2
import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0' + chr(0x0A) + "Location: header injection")]
response = opener.open("http://localhost:9999")

# Data sent is:
"""
GET / HTTP/1.1
Accept-Encoding: identity
Host: localhost:9999
Connection: close
User-Agent: Mozilla/5.0
Location: header injection

"""

# End of script

# Python 3
from urllib.request import build_opener
opener = build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0' + chr(0x0A) + "Location: header injection")]
opener.open("http://localhost:9999")

# Data sent is:
"""
GET / HTTP/1.1
Accept-Encoding: identity
Host: localhost:9999
Connection: close
User-Agent: Mozilla/5.0
Location: header injection

"""

# End of script

It is the responsibility of the developer using Python and its HTTP client libraries to ensure that their (web) application conforms to the official HTTP specifications and that their code introduces no security threats.
However, newlines inside headers are arguably a special case of nonconformity with the character set the RFCs allow. Most other illegal characters in an HTTP header are unlikely to compromise back-end clients and servers or the integrity of their communication, because most web servers are lenient about them. A newline character (0x0A) embedded in an HTTP header, by contrast, invariably has the semantic consequence of starting an additional header line. To put it differently, not sanitizing headers in complete accordance with the RFCs could be seen as a virtue: it gives the programmer maximum freedom without any likely or severe security ramifications, so that they may use illegal characters in testing environments and in environments that explicitly adopt a less strict interpretation of the HTTP protocol. Newlines are special in that they enable anyone who can influence the header content to, in effect, perform additional invocations of add_header().
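
To illustrate why a newline behaves differently from other illegal characters, here is a minimal sketch of how a receiver would read the headers from the proof of concept. It is a simplified model and not the actual code in http.client; serialize_headers() and header_names_on_wire() are hypothetical helpers written only for this example.

```python
def serialize_headers(headers):
    # Simplified model of writing "Name: value" lines to the wire.
    # Hypothetical helper; the real library does more than this.
    return "".join("%s: %s\r\n" % (name, value) for name, value in headers)


def header_names_on_wire(raw):
    # A receiver splits the raw data on line breaks, so every line break
    # inside a value starts what looks like a new header.
    return [line.split(":", 1)[0] for line in raw.splitlines() if line]


# One header as the application sees it ...
headers = [("User-agent", "Mozilla/5.0" + chr(0x0A) + "Location: header injection")]
raw = serialize_headers(headers)

# ... but two header names as the receiver sees it.
names = header_names_on_wire(raw)
print(names)
```

The application made a single header assignment, yet the receiving side parses both a User-agent and a Location header.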

In issue 17322 ( http://bugs.python.org/issue17322 ) there is some discussion of the HTTP client libraries' general compliance with the RFCs. I propose to begin by prohibiting newline characters in HTTP headers. Although this issue is not a "hard vulnerability" such as a buffer overflow, it can be equally severe from the perspective of a web-enabled application, which is what the HTTP libraries are typically used for. Lack of input validation on the application developer's end will facilitate header injection, for example if user-supplied data ends up verbatim as cookie content.
Adding this proposed layer of validation inside Python minimizes the likelihood of a successful header injection without notably affecting functionality.

I'm inclined to add this validation to putheader() in the 'http' module rather than in urllib, as this will protect every caller of 'http' regardless of intermediate libraries such as urllib.

Included is a patch against the latest checkout of the default branch that causes CannotSendHeader() to be raised if a newline character is detected in either a header name or its value. Besides "\n", it also rejects "\r", as the implications of the two can be similar. Feel free to adjust, rewrite and port this to other branches where you feel this is appropriate.
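
For readers without the patch at hand, the kind of check described above could look roughly like the following. This is only a sketch and not the attached patch: reject_newlines() is a hypothetical standalone helper, and the assumption is that something equivalent would run at the top of putheader() before the header is queued.

```python
from http.client import CannotSendHeader


def reject_newlines(header, *values):
    # Hypothetical helper, not part of http.client: raise if the header
    # name or any of its values contains a line feed or carriage return.
    for part in (header,) + values:
        # Header parts may be str or bytes; compare on text either way.
        text = part.decode("latin-1") if isinstance(part, bytes) else str(part)
        if "\n" in text or "\r" in text:
            raise CannotSendHeader("newline in header: %r" % (part,))


# A well-formed header passes silently.
reject_newlines("User-agent", "Mozilla/5.0")

# The value from the proof of concept is rejected.
try:
    reject_newlines("User-agent", "Mozilla/5.0" + chr(0x0A) + "Location: header injection")
except CannotSendHeader as exc:
    print("rejected:", exc)
```

Checking both the name and the values in one place keeps the rule independent of which higher-level library built the header.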


Guido Vranken
Intelworks