This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: email.Message.as_string infinite loop
Type: security Stage: resolved
Components: email Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, epicfaace, maxking, miss-islington, mytran, r.david.murray
Priority: normal Keywords: patch

Created on 2019-08-05 16:23 by mytran, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 15239 merged epicfaace, 2019-08-12 23:04
PR 15654 merged epicfaace, 2019-09-03 04:35
PR 15686 merged maxking, 2019-09-05 00:41
Messages (16)
msg349055 - (view) Author: My Tran (mytran) Date: 2019-08-05 16:23
The following will hang the system until it runs out of memory. 

import email
import email.policy

text = """From: user@host.com
To: user@host.com
Bad-Header:
 =?us-ascii?Q?LCSwrV11+IB0rSbSker+M9vWR7wEDSuGqmHD89Gt=ea0nJFSaiz4vX3XMJPT4vrE?=
 =?us-ascii?Q?xGUZeOnp0o22pLBB7CYLH74Js=wOlK6Tfru2U47qR?=
 =?us-ascii?Q?72OfyEY2p2=2FrA9xNFyvH+fBTCmazxwzF8nGkK6D?=

Hello!
"""

eml = email.message_from_string(text, policy=email.policy.SMTPUTF8)
eml.as_string()
msg349196 - (view) Author: Ashwin Ramaswami (epicfaace) * Date: 2019-08-07 20:28
I can't reproduce this on 3.9: https://github.com/epicfaace/cpython/runs/187997615
msg349199 - (view) Author: Ashwin Ramaswami (epicfaace) * Date: 2019-08-07 20:37
I also can't reproduce this on 3.7: https://github.com/epicfaace/cpython/runs/188005822
msg349200 - (view) Author: My Tran (mytran) Date: 2019-08-07 20:51
Reproduced on 3.7.4

Looks like this started happening after this commit: https://github.com/python/cpython/commit/dc20fc4311dece19488299a7cd11317ffbe4d3c3#diff-19171ae20182f6759204a3436475ddd1
msg349202 - (view) Author: My Tran (mytran) Date: 2019-08-07 22:00
I looked at the job at https://travis-ci.com/epicfaace/cpython/jobs/223345147 and its running py3.6.
msg349277 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-08-09 08:23
This does look like a side-effect of the commit mentioned by mytran.

The issues seems to be that email._header_value_parser.get_unstructured wrongfully assumes that anything leading with '=?' would be a valid rfc 2047 encoded word.

This is a smaller test case to reproduce this bug:

from email._header_value_parser import get_unstructured
get_unstructured('=?utf-8?q?FSaiz4vX3XMJPT4vrExGUZeOnp0o22pLBB7CYLH74Js=3DwOlK6Tfru2U47qR?=72OfyEY2p2/rA9xNFyvH+fBTCmazxwzF8nGkK6D')
msg349358 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-08-10 20:32
Adding security label since this can cause DOS.
msg349505 - (view) Author: Ashwin Ramaswami (epicfaace) * Date: 2019-08-12 23:09
Oh, both the Travis links I sent actually ended up reproducing the bug.

I've made a PR that fixes with an even smaller test case:

get_unstructured('=?utf-8?q?somevalue?=aa')

It looks like this is caused because "aa" is thought to be an encoded word escape in https://github.com/python/cpython/blob/fd5a82a7685d1599aab12e722a383cb0a2adfd8a/Lib/email/_header_value_parser.py#L1042 -- thus, get_encoded_word fails, which ends up making get_unstructured go in an infinite loop.

My PR makes the parser parse "=?utf-8?q?somevalue?=aa" as "=?utf-8?q?somevalue?=aa". However, the existing test cases make sure it parses "=?utf-8?q?somevalue?=nowhitespace" as "somevaluenowhitespace". I'm not too familiar with RFC 2047, but why are "aa" and "nowhitespace" treated differently? Should they be?
msg349847 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-08-16 05:55
You have correctly identified that "=aa" is detected as a encoded word and causes the get_encoded_word to fail.

However, "=?utf-8?q?somevalue?=aa" should ideally get parsed as "somevalueaa" and not "=?utf-8?q?somevalue?=aa". This is because "=?utf-8?q?somevalue?=" is a valid encoded word, it is just not followed by an empty whitespace. 

modified   Lib/email/_header_value_parser.py
@@ -1037,7 +1037,10 @@ def get_encoded_word(value):
         raise errors.HeaderParseError(
             "expected encoded word but found {}".format(value))
     remstr = ''.join(remainder)
-    if len(remstr) > 1 and remstr[0] in hexdigits and remstr[1] in hexdigits:
+    if (len(remstr) > 1 and
+        remstr[0] in hexdigits and
+        remstr[1] in hexdigits and
+        tok.count('?') < 2):
         # The ? after the CTE was followed by an encoded word escape (=XX).
         rest, *remainder = remstr.split('?=', 1)

This can be avoided by checking `?` occurs twice in the `tok`.

The 2nd bug, which needs a better test case, is that if the encoded_word is invalid, you will keep running into infinite loop, which you correctly fixed in your PR. However, the test case you used is more appropriate for the first issue.

You can fix both the issues, for which, you need to add a test case for 2nd issue and fix for the first issue.

Looking into the PR now.
msg349848 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-08-16 05:58
I meant, =aa is identified as encoded word escape
msg349849 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-08-16 06:05
Although, the 2nd bug I spoke of is kind of speculative, I haven't been able to find a test case which matches rfc2047_matcher but raises exception with get_encoded_word (after, ofcourse, the first bug is fixed), which the only way to cause an infinite loop.
msg349962 - (view) Author: Ashwin Ramaswami (epicfaace) * Date: 2019-08-19 18:12
Thanks, I've fixed the first case as you suggested.

I found an example of the 2nd case -- '=?utf-8?q?=somevalue?=' -- which causes the method to hang. I've added a fix, though I'm not sure if it treats the string properly -- it parses it as '=?utf-8?q?=somevalue?=' and doesn't raise any defects. Is that the behavior we would want?
msg350919 - (view) Author: miss-islington (miss-islington) Date: 2019-08-31 15:25
New changeset c5b242f87f31286ad38991bc3868cf4cfbf2b681 by Miss Islington (bot) (Ashwin Ramaswami) in branch 'master':
bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239)
https://github.com/python/cpython/commit/c5b242f87f31286ad38991bc3868cf4cfbf2b681
msg351089 - (view) Author: miss-islington (miss-islington) Date: 2019-09-03 16:42
New changeset ea21389dda401457198fb214aa2c981a45ed9528 by Miss Islington (bot) (Ashwin Ramaswami) in branch '3.7':
[3.7] bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239) (GH-15654)
https://github.com/python/cpython/commit/ea21389dda401457198fb214aa2c981a45ed9528
msg351160 - (view) Author: Abhilash Raj (maxking) * (Python committer) Date: 2019-09-05 01:20
New changeset 6ad0a2c45f78020f7994e47620c1cf7b225f8197 by Abhilash Raj in branch '3.8':
[3.8] bpo-37764: Fix infinite loop when parsing unstructured email headers. (GH-15239) (GH-15686)
https://github.com/python/cpython/commit/6ad0a2c45f78020f7994e47620c1cf7b225f8197
msg351177 - (view) Author: Ashwin Ramaswami (epicfaace) * Date: 2019-09-05 05:18
Should we get a CVE for this because this is a security issue?
History
Date User Action Args
2022-04-11 14:59:18adminsetgithub: 81945
2019-09-05 05:18:05epicfaacesetmessages: + msg351177
2019-09-05 02:05:57maxkingsetresolution: fixed
2019-09-05 01:21:40maxkingsetstatus: open -> closed
stage: patch review -> resolved
2019-09-05 01:20:44maxkingsetmessages: + msg351160
2019-09-05 00:41:23maxkingsetpull_requests: + pull_request15344
2019-09-03 16:42:58miss-islingtonsetmessages: + msg351089
2019-09-03 04:35:34epicfaacesetpull_requests: + pull_request15321
2019-08-31 15:25:38miss-islingtonsetnosy: + miss-islington
messages: + msg350919
2019-08-19 18:12:10epicfaacesetmessages: + msg349962
2019-08-16 06:05:33maxkingsetmessages: + msg349849
2019-08-16 05:58:50maxkingsetmessages: + msg349848
2019-08-16 05:55:46maxkingsetmessages: + msg349847
2019-08-15 04:24:05epicfaacesetversions: + Python 3.9
2019-08-12 23:09:39epicfaacesetmessages: + msg349505
2019-08-12 23:04:05epicfaacesetkeywords: + patch
stage: patch review
pull_requests: + pull_request14960
2019-08-10 20:32:47maxkingsettype: security
messages: + msg349358
2019-08-09 08:23:33maxkingsetversions: + Python 3.8
2019-08-09 08:23:24maxkingsetmessages: + msg349277
2019-08-07 22:00:47mytransetmessages: + msg349202
2019-08-07 20:51:53mytransetmessages: + msg349200
2019-08-07 20:37:06epicfaacesetmessages: + msg349199
2019-08-07 20:28:29epicfaacesetnosy: + epicfaace
messages: + msg349196
2019-08-05 16:30:21xtreaksetnosy: + maxking
2019-08-05 16:23:23mytrancreate