Issue 27397: email.message.Message.get_payload(decode=True) raises AssertionError that "should never happen"

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/71584

classification

Title:	email.message.Message.get_payload(decode=True) raises AssertionError that "should never happen"
Type:	behavior	Stage:	resolved
Components:	Library (Lib)	Versions:	Python 3.8, Python 3.7, Python 3.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	r.david.murray	Nosy List:	Claudiu Saftoiu, CryptidVulpes, ned.deily, r.david.murray, taleinat
Priority:	high	Keywords:	patch

Created on 2016-06-27 04:52 by Claudiu Saftoiu, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
bugreport.py	Claudiu Saftoiu, 2016-06-27 04:52	File to reproduce the bug
bugreport_moretests.py	Claudiu Saftoiu, 2016-06-27 05:09	more test cases
issue27397_poc.py	CryptidVulpes, 2016-07-28 13:22	Script that will reproduce the bug
issue27397_poc_minimal.py	CryptidVulpes, 2016-07-28 13:23	Minimal script to reproduce the bug

Pull Requests
URL	Status	Linked	Edit
PR 7583	merged	taleinat, 2018-06-10 07:00
PR 7664	merged	miss-islington, 2018-06-12 12:47
PR 7665	merged	miss-islington, 2018-06-12 12:48

Messages (17)
msg269346 - (view)	Author: Claudiu Saftoiu (Claudiu Saftoiu)	Date: 2016-06-27 04:52
I'm processing Yahoo! Groups backup archives, and came across an email message which causes the `.get_payload(decode=True)` step to raise an AssertionError. Particularly, the following exception is raised in `lib/python3.5/email/_encoded_words.py`, line 124: # This should never happen. raise AssertionError("unexpected binascii.Error") Attached is the file which, when run under Python 3.5.1, causes the exception to be raised.
msg269348 - (view)	Author: Claudiu Saftoiu (Claudiu Saftoiu)	Date: 2016-06-27 05:09
See attached another file with more test cases.
msg269375 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2016-06-27 13:20
This appears to be a bug in b64decode. It should not be raising that error if validate=False, as it is in the code being executed during the test case. It's not that b64decode is ignoring the validate argument; the error appears to be data-dependent. I don't have time to investigate this further at the moment, perhaps someone else can pick it up.
msg271545 - (view)	Author: Jami Lindh (CryptidVulpes)	Date: 2016-07-28 13:22
I stumbled upon this bug as well while fuzzing with AFL. The curious thing is that email.message_from_string still accepts that garbled message as a valid email.
msg271546 - (view)	Author: Jami Lindh (CryptidVulpes)	Date: 2016-07-28 13:23
I also attached a minimal script containing only the decode call and the garbage payload.
msg271594 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2016-07-29 01:46
Thanks for narrowing this down. There are two problems: the first is that the base64 module is raising "Incorrect padding" when the issue is actually that the input string cannot be decoded using the base64 algorithm no matter what padding might be supplied. The second is that the else clause in _encoded_words can happen, in this circumstance. The circumstance is that the input string contains n4+1 characters. No encoding of source text will ever produce that number of output characters. (I also seem to have used an index one greater than needed for the loop; the maximum number of padding characters is 2.) The interesting question is what we want to do for error recovery in the n4+1 case. Do we declare the part invalid and return an empty string? Do we try chopping off the last character and see if we can decode it? The latter is complicated by the possible presence of invalid alphabet characters, but it is what I lean toward. I'm open to alternate suggestions. And to a patch if someone wants to write one :)
msg271595 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2016-07-29 01:53
It might be better to add a character: that will produce garbage as the last byte, but that could serve as an alert that something is wrong...and we don't know if the extra character is at the end or at the beginning of the base64 encoded string, or if, as in this case, it is multiple extra characters which will produce trailing garbage anyway. The example in hand is, I would guess, by far the most likely scenario.
msg271607 - (view)	Author: Jami Lindh (CryptidVulpes)	Date: 2016-07-29 05:55
In my opinion, this is still a clear case of "invalid input, raise an error", but I don't think AssertionError is the right one. Maybe just don't catch the unexpected binascii.Error and let it fly towards the user? I might go even one step further and just let base64 library handle and raise all the necessary errors here, similarly to how your "encode_b" function works. If you want more error resilience (meaning: email lib would not crash with invalid inputs), I would think returning an empty string is okay, since that base64 input is clearly not a valid base64 that is going to decode into anything.
msg271608 - (view)	Author: Jami Lindh (CryptidVulpes)	Date: 2016-07-29 05:57
Also, I'm not sure what is the development status in Python 3.4 but in my case this bug happened using the Debian Jessie Python version 3.4.2-2
msg271611 - (view)	Author: Jami Lindh (CryptidVulpes)	Date: 2016-07-29 06:33
It might also make sense to return the payload undecoded. The documentation for get_payload() function says: "[...] the payload will be decoded if this header’s value is quoted-printable or base64. If some other encoding is used, or Content-Transfer-Encoding header is missing, the payload is returned as-is (undecoded)." Even though the header's value tries to convince you "base64" is the encoding, it is - in this case - either broken base64 or not. Hence it might fall into the category "some other encoding is used", justifying the "payload is returned as-is (undecoded)". As to the original payload Claudiu posted, in that the mailserver has truncated the email. This already provides the user with non-base64 string that they try to convince you to decode as base64. My argument is still valid in this case.
msg271625 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2016-07-29 14:09
Returning the undecoded payload is a good idea. Let's go with that. The email module, unlike most stdlib packages, has a mandate that the parser should never raise an error. Instead we do our best to guess (very unlike everything else in python!) and note 'defects' in the message. The reason this is the case is Postel's Law, which has become one of the guiding principles in dealing with email over the years: "be conservative in what you do, and generous in what you accept". So, the generator will raise errors (if the original input didn't come from the parser), but the parser will not, if at all possible. (For those who want aggressive error checking, python3 the 'raise_on_defect' policy setting.) For the versions, we use that field to indicate which versions the bug will get fixed in, which is why I removed 3.4.
msg318763 - (view)	Author: Tal Einat (taleinat) *	Date: 2018-06-05 15:36
See #33770, which will make recognizing and handling this case straightforward.
msg319338 - (view)	Author: Ned Deily (ned.deily) *	Date: 2018-06-11 20:12
> See #33770, which will make recognizing and handling this case straightforward. The fix for #33770 is only in 3.7.0 and later, so does this mean this change should not be backported to 3.6? Updating the the versions accordingly. If it is appropriate for 3.6, we can add that back in.
msg319364 - (view)	Author: Tal Einat (taleinat) *	Date: 2018-06-12 05:47
I ended up fixing this independently of #33770. AFAIK this should just work with 3.6.
msg319377 - (view)	Author: Tal Einat (taleinat) *	Date: 2018-06-12 12:46
New changeset c3f55be7dd012b7e92901627d0b31c21e983ccb4 by Tal Einat in branch 'master': bpo-27397: Make email module properly handle invalid-length base64 strings (#7583) https://github.com/python/cpython/commit/c3f55be7dd012b7e92901627d0b31c21e983ccb4
msg319381 - (view)	Author: Tal Einat (taleinat) *	Date: 2018-06-12 13:46
New changeset 7b82281c80d0064559866afe92f19cae5978c841 by Tal Einat (Miss Islington (bot)) in branch '3.7': bpo-27397: Make email module properly handle invalid-length base64 strings (GH-7583) (GH-7664) https://github.com/python/cpython/commit/7b82281c80d0064559866afe92f19cae5978c841
msg319382 - (view)	Author: Tal Einat (taleinat) *	Date: 2018-06-12 13:46
New changeset 63dd1f70c1aa57494802c9b68f9db4918620fc8f by Tal Einat (Miss Islington (bot)) in branch '3.6': bpo-27397: Make email module properly handle invalid-length base64 strings (GH-7583) (GH-7665) https://github.com/python/cpython/commit/63dd1f70c1aa57494802c9b68f9db4918620fc8f

History
Date	User	Action	Args
2022-04-11 14:58:33	admin	set	github: 71584
2018-06-12 13:47:42	taleinat	set	status: open -> closed stage: patch review -> resolved resolution: fixed versions: + Python 3.6
2018-06-12 13:46:29	taleinat	set	messages: + msg319382
2018-06-12 13:46:14	taleinat	set	messages: + msg319381
2018-06-12 12:48:33	miss-islington	set	pull_requests: + pull_request7280
2018-06-12 12:47:40	miss-islington	set	pull_requests: + pull_request7279
2018-06-12 12:46:24	taleinat	set	messages: + msg319377
2018-06-12 05:47:13	taleinat	set	messages: + msg319364
2018-06-11 20:12:38	ned.deily	set	nosy: + ned.deily messages: + msg319338 versions: + Python 3.7, Python 3.8, - Python 3.5, Python 3.6
2018-06-10 07:00:27	taleinat	set	keywords: + patch stage: needs patch -> patch review pull_requests: + pull_request7205
2018-06-05 15:36:11	taleinat	set	nosy: + taleinat messages: + msg318763
2016-07-29 14:09:15	r.david.murray	set	messages: + msg271625 versions: - Python 3.4
2016-07-29 06:33:37	CryptidVulpes	set	messages: + msg271611
2016-07-29 05:57:14	CryptidVulpes	set	messages: + msg271608 versions: + Python 3.4
2016-07-29 05:55:18	CryptidVulpes	set	messages: + msg271607
2016-07-29 01:53:47	r.david.murray	set	messages: + msg271595
2016-07-29 01:46:40	r.david.murray	set	type: crash -> behavior stage: needs patch messages: + msg271594 versions: - Python 3.4
2016-07-28 13:23:41	CryptidVulpes	set	files: + issue27397_poc_minimal.py messages: + msg271546
2016-07-28 13:22:50	CryptidVulpes	set	files: + issue27397_poc.py versions: + Python 3.4 nosy: + CryptidVulpes messages: + msg271545
2016-06-27 13:20:06	r.david.murray	set	messages: + msg269375 versions: + Python 3.6
2016-06-27 05:09:46	Claudiu Saftoiu	set	files: + bugreport_moretests.py messages: + msg269348
2016-06-27 04:58:54	serhiy.storchaka	set	priority: normal -> high assignee: r.david.murray nosy: + r.david.murray
2016-06-27 04:52:48	Claudiu Saftoiu	create