classification
Title: email.message.Message.get_payload(decode=True) raises AssertionError that "should never happen"
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.8, Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: r.david.murray Nosy List: Claudiu Saftoiu, CryptidVulpes, ned.deily, r.david.murray, taleinat
Priority: high Keywords: patch

Created on 2016-06-27 04:52 by Claudiu Saftoiu, last changed 2018-06-12 13:47 by taleinat. This issue is now closed.

Files
File name Uploaded Description Edit
bugreport.py Claudiu Saftoiu, 2016-06-27 04:52 File to reproduce the bug
bugreport_moretests.py Claudiu Saftoiu, 2016-06-27 05:09 more test cases
issue27397_poc.py CryptidVulpes, 2016-07-28 13:22 Script that will reproduce the bug
issue27397_poc_minimal.py CryptidVulpes, 2016-07-28 13:23 Minimal script to reproduce the bug
Pull Requests
URL Status Linked Edit
PR 7583 merged taleinat, 2018-06-10 07:00
PR 7664 merged miss-islington, 2018-06-12 12:47
PR 7665 merged miss-islington, 2018-06-12 12:48
Messages (17)
msg269346 - (view) Author: Claudiu Saftoiu (Claudiu Saftoiu) Date: 2016-06-27 04:52
I'm processing Yahoo! Groups backup archives, and came across an email message which causes the `.get_payload(decode=True)` step to raise an AssertionError. Particularly, the following exception is raised in `lib/python3.5/email/_encoded_words.py`, line 124:

            # This should never happen.
            raise AssertionError("unexpected binascii.Error")

Attached is the file which, when run under Python 3.5.1, causes the exception to be raised.
msg269348 - (view) Author: Claudiu Saftoiu (Claudiu Saftoiu) Date: 2016-06-27 05:09
See attached another file with more test cases.
msg269375 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-06-27 13:20
This appears to be a bug in b64decode.  It should not be raising that error if validate=False, as it is in the code being executed during the test case.  It's not that b64decode is ignoring the validate argument; the error appears to be data-dependent.  I don't have time to investigate this further at the moment, perhaps someone else can pick it up.
msg271545 - (view) Author: Jami Lindh (CryptidVulpes) Date: 2016-07-28 13:22
I stumbled upon this bug as well while fuzzing with AFL. The curious thing is that email.message_from_string still accepts that garbled message as a valid email.
msg271546 - (view) Author: Jami Lindh (CryptidVulpes) Date: 2016-07-28 13:23
I also attached a minimal script containing only the decode call and the garbage payload.
msg271594 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-29 01:46
Thanks for narrowing this down.  There are two problems: the first is that the base64 module is raising "Incorrect padding" when the issue is actually that the input string cannot be decoded using the base64 algorithm no matter what padding might be supplied.  The second is that the else clause in _encoded_words *can* happen, in this circumstance.

The circumstance is that the input string contains n*4+1 characters.  No encoding of source text will ever produce that number of output characters.  (I also seem to have used an index one greater than needed for the loop; the maximum number of padding characters is 2.)

The interesting question is what we want to do for error recovery in the n*4+1 case.  Do we declare the part invalid and return an empty string?  Do we try chopping off the last character and see if we can decode it?  The latter is complicated by the possible presence of invalid alphabet characters, but it is what I lean toward.

I'm open to alternate suggestions.  And to a patch if someone wants to write one :)
msg271595 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-29 01:53
It might be better to add a character: that will produce garbage as the last byte, but that could serve as an alert that something is wrong...and we don't know if the extra character is at the end or at the beginning of the base64 encoded string, or if, as in this case, it is multiple extra characters which will produce trailing garbage anyway.  The example in hand is, I would guess, by far the most likely scenario.
msg271607 - (view) Author: Jami Lindh (CryptidVulpes) Date: 2016-07-29 05:55
In my opinion, this is still a clear case of "invalid input, raise an error", but I don't think AssertionError is the right one. Maybe just don't catch the unexpected binascii.Error and let it fly towards the user?

I might go even one step further and just let base64 library handle and raise all the necessary errors here, similarly to how your "encode_b" function works.

If you want more error resilience (meaning: email lib would not crash with invalid inputs), I would think returning an empty string is okay, since that base64 input is clearly not a valid base64 that is going to decode into anything.
msg271608 - (view) Author: Jami Lindh (CryptidVulpes) Date: 2016-07-29 05:57
Also, I'm not sure what is the development status in Python 3.4 but in my case this bug happened using the Debian Jessie Python version 3.4.2-2
msg271611 - (view) Author: Jami Lindh (CryptidVulpes) Date: 2016-07-29 06:33
It might also make sense to return the payload undecoded. The documentation for get_payload() function says:

"[...] the payload will be decoded if this header’s value is quoted-printable or base64. If some other encoding is used, or Content-Transfer-Encoding header is missing, the payload is returned as-is (undecoded)."

Even though the header's value tries to convince you "base64" is the encoding, it is - in this case - either broken base64 or not. Hence it might fall into the category "some other encoding is used", justifying the "payload is returned as-is (undecoded)".

As to the original payload Claudiu posted, in that the mailserver has truncated the email. This already provides the user with non-base64 string that they try to convince you to decode as base64. My argument is still valid in this case.
msg271625 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-29 14:09
Returning the undecoded payload is a good idea.  Let's go with that.

The email module, unlike most stdlib packages, has a mandate that the parser should never raise an error.  Instead we do our best to guess (very unlike everything else in python!) and note 'defects' in the message.  The reason this is the case is Postel's Law, which has become one of the guiding principles in dealing with email over the years: "be conservative in what you do, and generous in what you accept".  So, the generator will raise errors (if the original input didn't come from the parser), but the parser will not, if at all possible.

(For those who want aggressive error checking, python3 the 'raise_on_defect' policy setting.)

For the versions, we use that field to indicate which versions the bug will get fixed in, which is why I removed 3.4.
msg318763 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-05 15:36
See #33770, which will make recognizing and handling this case straightforward.
msg319338 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-06-11 20:12
> See #33770, which will make recognizing and handling this case straightforward.

The fix for #33770 is only in 3.7.0 and later, so does this mean this change should not be backported to 3.6?  Updating the the versions accordingly.  If it is appropriate for 3.6, we can add that back in.
msg319364 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-12 05:47
I ended up fixing this independently of #33770.

AFAIK this should just work with 3.6.
msg319377 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-12 12:46
New changeset c3f55be7dd012b7e92901627d0b31c21e983ccb4 by Tal Einat in branch 'master':
bpo-27397: Make email module properly handle invalid-length base64 strings (#7583)
https://github.com/python/cpython/commit/c3f55be7dd012b7e92901627d0b31c21e983ccb4
msg319381 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-12 13:46
New changeset 7b82281c80d0064559866afe92f19cae5978c841 by Tal Einat (Miss Islington (bot)) in branch '3.7':
bpo-27397: Make email module properly handle invalid-length base64 strings (GH-7583) (GH-7664)
https://github.com/python/cpython/commit/7b82281c80d0064559866afe92f19cae5978c841
msg319382 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-12 13:46
New changeset 63dd1f70c1aa57494802c9b68f9db4918620fc8f by Tal Einat (Miss Islington (bot)) in branch '3.6':
bpo-27397: Make email module properly handle invalid-length base64 strings (GH-7583) (GH-7665)
https://github.com/python/cpython/commit/63dd1f70c1aa57494802c9b68f9db4918620fc8f
History
Date User Action Args
2018-06-12 13:47:42taleinatsetstatus: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: + Python 3.6
2018-06-12 13:46:29taleinatsetmessages: + msg319382
2018-06-12 13:46:14taleinatsetmessages: + msg319381
2018-06-12 12:48:33miss-islingtonsetpull_requests: + pull_request7280
2018-06-12 12:47:40miss-islingtonsetpull_requests: + pull_request7279
2018-06-12 12:46:24taleinatsetmessages: + msg319377
2018-06-12 05:47:13taleinatsetmessages: + msg319364
2018-06-11 20:12:38ned.deilysetnosy: + ned.deily

messages: + msg319338
versions: + Python 3.7, Python 3.8, - Python 3.5, Python 3.6
2018-06-10 07:00:27taleinatsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request7205
2018-06-05 15:36:11taleinatsetnosy: + taleinat
messages: + msg318763
2016-07-29 14:09:15r.david.murraysetmessages: + msg271625
versions: - Python 3.4
2016-07-29 06:33:37CryptidVulpessetmessages: + msg271611
2016-07-29 05:57:14CryptidVulpessetmessages: + msg271608
versions: + Python 3.4
2016-07-29 05:55:18CryptidVulpessetmessages: + msg271607
2016-07-29 01:53:47r.david.murraysetmessages: + msg271595
2016-07-29 01:46:40r.david.murraysettype: crash -> behavior
stage: needs patch
messages: + msg271594
versions: - Python 3.4
2016-07-28 13:23:41CryptidVulpessetfiles: + issue27397_poc_minimal.py

messages: + msg271546
2016-07-28 13:22:50CryptidVulpessetfiles: + issue27397_poc.py
versions: + Python 3.4
nosy: + CryptidVulpes

messages: + msg271545
2016-06-27 13:20:06r.david.murraysetmessages: + msg269375
versions: + Python 3.6
2016-06-27 05:09:46Claudiu Saftoiusetfiles: + bugreport_moretests.py

messages: + msg269348
2016-06-27 04:58:54serhiy.storchakasetpriority: normal -> high
assignee: r.david.murray

nosy: + r.david.murray
2016-06-27 04:52:48Claudiu Saftoiucreate