Title: email.header.Header too lax with embeded newlines
Type: security Stage: resolved
Components: Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: Arfrever, barry, jwilk, loewis, pl, r.david.murray, terry.reedy, vvl, ysj.ray
Priority: high Keywords: patch

Created on 2009-04-28 21:25 by jwilk, last changed 2011-01-10 21:37 by r.david.murray. This issue is now closed.

File name Uploaded Description Edit
header_injection.diff r.david.murray, 2010-12-27 03:01
Messages (15)
msg86767 - (view) Author: Jakub Wilk (jwilk) Date: 2009-04-28 21:25
>>> from email.mime.text import MIMEText
>>> from email.header import Header
>>> msg = MIMEText('dummy')
>>> h = Header('dummy\nX-Injected-Header: yes')
>>> msg['Subject'] = h
>>> print msg.as_string()
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: dummy
X-Injected-Header: yes

msg112897 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-04 21:13
The example, which combines part of the subject line with another header line, strikes me as bizarre, confusing, and unnecessary. Can you provide some rationale, motivation, or use case? Without that, I would tend to think this should be closed.
msg112959 - (view) Author: Jakub Wilk (jwilk) Date: 2010-08-05 08:49
Sorry if that wasn't clear in the first place: this is not a feature request. I agree that current behavior for email.header.Header (as shown in the first message) is bizarre, confusing, unnecessary, and potentially leading to vulnerabilities - that's why I filed this bug.
msg112996 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-05 16:29
My apologies for misunderstanding. Feature removal requests are sufficiently rare that I thought you were posting a mockup of what you wanted to be able to do, as many others have done. So we definitely agree on that example. I changed the title to be a little clearer that stricter checking is desired.

Unless that example violates a prohibition in the docs, removing this bad feature is still a feature request. The point is moot, however, since RDM is working on a mail module overhaul (for 3.2, I hope) and I doubt previous versions will be separately patched. I will let him decide if examples like this can be sensibly prohibited without disabling something that should be allowed.
msg113065 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-06 03:20
Yeah, it's a good question whether or not this is enough of a behavior change that the fix can't be backported.  On the other hand, this is definitely a bug (the RFCs specifiy that header values may not contain newlines or carriage returns), so at the moment I'm inclined toward backporting.  I've got it on my list of things to worry about in the next version of email, but unfortunately that won't make it into 3.2.  I'm raising the priority because I think I should at least fix this in the existing package in 3.2.

Jakub, if you feel motivated to propose a patch that inclues unit tests, that would be very helpful.  I believe the correct thing to do would be to raise a ValueError if the argument to Header contains any /n or /r characters.
msg113066 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2010-08-06 03:35
Given this "The email package attempts to be as RFC-compliant as possible," and your reading, I agree this is a backport-worth bug.
This is the "prohibition in the docs" I was looking for.
msg113157 - (view) Author: Ville V. Lindholm (vvl) Date: 2010-08-07 09:23
I tried doing a naive implementation (just checking for \n or \r in the argument to Header) but that breaks a lot of unit tests. For example the test message msg_16.txt contains a header like this:

Received: from ( [])
	by (Postfix) with ESMTP id CCC2C51B84
	for <>; Sun, 23 Sep 2001 20:13:54 -0700 (PDT)

in other words the header is split up by \n\t. I'm not very familiar with the RFCs, is there some smart way to do this? (This is my first attempt to contribute to Python btw!) It seems many tests rely on this "bug" in Header.
msg113160 - (view) Author: ysj.ray (ysj.ray) Date: 2010-08-07 10:08
Besides, not only the header value, but also the header name can only be printable ascii characters according to RFC2822:

"A field name MUST be composed of printable US-ASCII characters (i.e., characters that have values between 33 and 126, inclusive), except   colon."

But the msg object's __getitem__() seems to accept all ascii characters.
msg113175 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-07 14:32
Yeah, after I posted that suggestion it occurred to me that the newline stuff *is* embedded in the way email5 works, but I hadn't gotten back here to post a followup yet.  I don't like it, but it is what it is.  (I believe the theory is to preserve the formatting of the text that was parsed from an input email6 I'm planning on handling that a different and safer way, albeit at the cost of using extra memory.)

So to fix the issue in email5 I we'd have to make sure there is whitespace following the newlines when *emitting* headers...which could then break roundtripping badly formed emails, which is not likely to be acceptable, and certainly would not be backportable. :(

In other words, it's not obvious to me that we *can* fix this in email5. :( :(

Thanks for working on this, and if you come up with any clever ideas let me know, and I'll do the same :)
msg113177 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-08-07 14:36
Re __getitem__: yes, the email package allows you to work with messages that are *not* RFC conformant (which you do encounter when processing actual email :), and this is an important feature.  The plan in email6 is to detect and report additional RFC violations, with an optional "strict" mode to raise errors if any violations are encountered.
msg124702 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-12-27 03:01
I've considered this a bit more deeply, and it turns out to be simpler to fix than I originally thought, assuming the fix is acceptable.

When a message is parsed we obviously wind up with headers that don't have any embedding issues.  So, if we check for embedded headers at message production time and reject them, we can't be breaking any round-trip properties of the package.  The only way for a header malformed in this way to get produced is through it being added to a Message object via one of the header adding APIs.  Further, for this specific issue we are only worried about things that look like header labels that follow a newline and don't have whitespace in front of them.  We don't have to worry about the other RFC restrictions on headers in order to fix this.

I tried a patch that checked at header add time, and while that is potentially workable it got fairly complicated and is a bit fragile (did I get *all* the places a header can be added?)  Instead, the attached patch takes the approach of throwing an error for an embedded header at message serialization time.  The advantage here is that all headers are run through Header.encode on serialization, so there's only one bit of code that needs to be modified to pretty much guarantee that header injection can't happen.

There are code paths for producing individual headers that don't go through encode, but these don't produce complete messages, so it shouldn't be a problem (and may even be a feature).

Barry, do you think this is indeed enough of a security issue that this fix (if acceptable) should be backported to 2.6?

I should note that this patch produces a situation unusual[*] for the email package, where serialization may throw an error (but only on a Message object that has been modified).  Also, I'm reusing HeaderParseError, which may not be optimal, although it does seem at least semi-logical.

[*] Generator currently only throws an error itself in one place, if it encounters a bytes payload for a text Message.  Again, something that can only happen in a modified Message, so this seems analogous.
msg125820 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-01-09 02:59
Receiving no negative votes :), I've committed this to py3k in r87873, 3.1 in r87874, and 2.7 in r87875.

Barry, Martin, do you think this should be backported as a security fix?  I'm thinking it should be.
msg125929 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2011-01-10 21:01
I'm inclined not to support backporting to Python 2.6.  It seems like a fairly rare and narrow hole for security problem, because it would require a program to add the bogus header explicitly, as opposed to getting it after parsing some data.  To me, that smacks of SQL-injection or XSS type bug, where it's really the application that's the problem.

Or in other words, assuming you don't have a program that is deliberately adding such headers (and then it should be considered a feature, i.e. they know what they're doing), then you'd have to trick a header-adding program to add some unvalidated text.

I dunno, it doesn't seem like a serious enough threat to backport.
msg125932 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-01-10 21:32
Well, imagine a web form that has a 'subject' text entry field, and the application does Message['Subject'] = subject_from_form as it builds a Message to hand off to smtp.sendmail.  If the application didn't sanitize the subject for newlines (and as a programmer I doubt I would have thought of doing that), then we can have header injection.  So, yes, it is analogous to an sql injection attack.

Since we don't have a report of an exploit, I'm fine with not backporting it.
msg125933 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2011-01-10 21:37
Ah, I should clarify.  A sensible web application should be dealing with any multiline input it allows by turning it into a newline-less single line before using it as a subject, so the probability that there are exploitable applications out there is, I think, sufficiently low that a backport isn't needed.
Date User Action Args
2011-01-10 21:37:09r.david.murraysetnosy: loewis, barry, terry.reedy, jwilk, pl, Arfrever, r.david.murray, ysj.ray, vvl
messages: + msg125933
2011-01-10 21:32:34r.david.murraysetstatus: open -> closed
nosy: loewis, barry, terry.reedy, jwilk, pl, Arfrever, r.david.murray, ysj.ray, vvl
messages: + msg125932
2011-01-10 21:01:23barrysetnosy: loewis, barry, terry.reedy, jwilk, pl, Arfrever, r.david.murray, ysj.ray, vvl
messages: + msg125929
2011-01-09 17:14:07Arfreversetnosy: + Arfrever
2011-01-09 02:59:50r.david.murraysetnosy: + loewis
messages: + msg125820
resolution: fixed

type: behavior -> security
stage: patch review -> resolved
2010-12-27 03:01:59r.david.murraysetfiles: + header_injection.diff
versions: + Python 3.1, Python 2.7
nosy: barry, terry.reedy, jwilk, pl, r.david.murray, ysj.ray, vvl
messages: + msg124702

keywords: + patch
stage: test needed -> patch review
2010-11-20 16:03:35r.david.murraysetkeywords: - easy
2010-08-07 14:36:16r.david.murraysetmessages: + msg113177
2010-08-07 14:32:20r.david.murraysetmessages: + msg113175
2010-08-07 10:08:00ysj.raysetnosy: + ysj.ray
messages: + msg113160
2010-08-07 09:23:56vvlsetnosy: + vvl
messages: + msg113157
2010-08-06 03:35:01terry.reedysetmessages: + msg113066
2010-08-06 03:20:55r.david.murraysetpriority: normal -> high
keywords: + easy
messages: + msg113065
2010-08-05 16:29:31terry.reedysetmessages: + msg112996
title: email.header.Header allows to embed raw newlines into a message -> email.header.Header too lax with embeded newlines
2010-08-05 08:49:51jwilksetstatus: pending -> open
type: enhancement -> behavior
messages: + msg112959

title: email.header.Header allow to embed raw newlines into a message -> email.header.Header allows to embed raw newlines into a message
2010-08-04 21:13:58terry.reedysetstatus: open -> pending

type: enhancement
versions: + Python 3.2, - Python 2.6, Python 2.5
nosy: + terry.reedy

messages: + msg112897
stage: test needed
2010-05-05 13:41:50barrysetassignee: barry -> r.david.murray

nosy: + r.david.murray
2009-04-28 21:43:19benjamin.petersonsetassignee: barry

nosy: + barry
2009-04-28 21:25:35jwilkcreate