msg269346 - (view) |
Author: Claudiu Saftoiu (Claudiu Saftoiu) |
Date: 2016-06-27 04:52 |
I'm processing Yahoo! Groups backup archives, and came across an email message which causes the `.get_payload(decode=True)` step to raise an AssertionError. Particularly, the following exception is raised in `lib/python3.5/email/_encoded_words.py`, line 124:
# This should never happen.
raise AssertionError("unexpected binascii.Error")
Attached is the file which, when run under Python 3.5.1, causes the exception to be raised.
|
msg269348 - (view) |
Author: Claudiu Saftoiu (Claudiu Saftoiu) |
Date: 2016-06-27 05:09 |
See attached another file with more test cases.
|
msg269375 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2016-06-27 13:20 |
This appears to be a bug in b64decode. It should not be raising that error if validate=False, as it is in the code being executed during the test case. It's not that b64decode is ignoring the validate argument; the error appears to be data-dependent. I don't have time to investigate this further at the moment, perhaps someone else can pick it up.
|
msg271545 - (view) |
Author: Jami Lindh (CryptidVulpes) |
Date: 2016-07-28 13:22 |
I stumbled upon this bug as well while fuzzing with AFL. The curious thing is that email.message_from_string still accepts that garbled message as a valid email.
|
msg271546 - (view) |
Author: Jami Lindh (CryptidVulpes) |
Date: 2016-07-28 13:23 |
I also attached a minimal script containing only the decode call and the garbage payload.
|
msg271594 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2016-07-29 01:46 |
Thanks for narrowing this down. There are two problems: the first is that the base64 module is raising "Incorrect padding" when the issue is actually that the input string cannot be decoded using the base64 algorithm no matter what padding might be supplied. The second is that the else clause in _encoded_words *can* happen, in this circumstance.
The circumstance is that the input string contains n*4+1 characters. No encoding of source text will ever produce that number of output characters. (I also seem to have used an index one greater than needed for the loop; the maximum number of padding characters is 2.)
The interesting question is what we want to do for error recovery in the n*4+1 case. Do we declare the part invalid and return an empty string? Do we try chopping off the last character and see if we can decode it? The latter is complicated by the possible presence of invalid alphabet characters, but it is what I lean toward.
I'm open to alternate suggestions. And to a patch if someone wants to write one :)
|
msg271595 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2016-07-29 01:53 |
It might be better to add a character: that will produce garbage as the last byte, but that could serve as an alert that something is wrong...and we don't know if the extra character is at the end or at the beginning of the base64 encoded string, or if, as in this case, it is multiple extra characters which will produce trailing garbage anyway. The example in hand is, I would guess, by far the most likely scenario.
|
msg271607 - (view) |
Author: Jami Lindh (CryptidVulpes) |
Date: 2016-07-29 05:55 |
In my opinion, this is still a clear case of "invalid input, raise an error", but I don't think AssertionError is the right one. Maybe just don't catch the unexpected binascii.Error and let it fly towards the user?
I might go even one step further and just let base64 library handle and raise all the necessary errors here, similarly to how your "encode_b" function works.
If you want more error resilience (meaning: email lib would not crash with invalid inputs), I would think returning an empty string is okay, since that base64 input is clearly not a valid base64 that is going to decode into anything.
|
msg271608 - (view) |
Author: Jami Lindh (CryptidVulpes) |
Date: 2016-07-29 05:57 |
Also, I'm not sure what is the development status in Python 3.4 but in my case this bug happened using the Debian Jessie Python version 3.4.2-2
|
msg271611 - (view) |
Author: Jami Lindh (CryptidVulpes) |
Date: 2016-07-29 06:33 |
It might also make sense to return the payload undecoded. The documentation for get_payload() function says:
"[...] the payload will be decoded if this header’s value is quoted-printable or base64. If some other encoding is used, or Content-Transfer-Encoding header is missing, the payload is returned as-is (undecoded)."
Even though the header's value tries to convince you "base64" is the encoding, it is - in this case - either broken base64 or not. Hence it might fall into the category "some other encoding is used", justifying the "payload is returned as-is (undecoded)".
As to the original payload Claudiu posted, in that the mailserver has truncated the email. This already provides the user with non-base64 string that they try to convince you to decode as base64. My argument is still valid in this case.
|
msg271625 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2016-07-29 14:09 |
Returning the undecoded payload is a good idea. Let's go with that.
The email module, unlike most stdlib packages, has a mandate that the parser should never raise an error. Instead we do our best to guess (very unlike everything else in python!) and note 'defects' in the message. The reason this is the case is Postel's Law, which has become one of the guiding principles in dealing with email over the years: "be conservative in what you do, and generous in what you accept". So, the generator will raise errors (if the original input didn't come from the parser), but the parser will not, if at all possible.
(For those who want aggressive error checking, python3 the 'raise_on_defect' policy setting.)
For the versions, we use that field to indicate which versions the bug will get fixed in, which is why I removed 3.4.
|
msg318763 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-05 15:36 |
See #33770, which will make recognizing and handling this case straightforward.
|
msg319338 - (view) |
Author: Ned Deily (ned.deily) *  |
Date: 2018-06-11 20:12 |
> See #33770, which will make recognizing and handling this case straightforward.
The fix for #33770 is only in 3.7.0 and later, so does this mean this change should not be backported to 3.6? Updating the the versions accordingly. If it is appropriate for 3.6, we can add that back in.
|
msg319364 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-12 05:47 |
I ended up fixing this independently of #33770.
AFAIK this should just work with 3.6.
|
msg319377 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-12 12:46 |
New changeset c3f55be7dd012b7e92901627d0b31c21e983ccb4 by Tal Einat in branch 'master':
bpo-27397: Make email module properly handle invalid-length base64 strings (#7583)
https://github.com/python/cpython/commit/c3f55be7dd012b7e92901627d0b31c21e983ccb4
|
msg319381 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-12 13:46 |
New changeset 7b82281c80d0064559866afe92f19cae5978c841 by Tal Einat (Miss Islington (bot)) in branch '3.7':
bpo-27397: Make email module properly handle invalid-length base64 strings (GH-7583) (GH-7664)
https://github.com/python/cpython/commit/7b82281c80d0064559866afe92f19cae5978c841
|
msg319382 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2018-06-12 13:46 |
New changeset 63dd1f70c1aa57494802c9b68f9db4918620fc8f by Tal Einat (Miss Islington (bot)) in branch '3.6':
bpo-27397: Make email module properly handle invalid-length base64 strings (GH-7583) (GH-7665)
https://github.com/python/cpython/commit/63dd1f70c1aa57494802c9b68f9db4918620fc8f
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:33 | admin | set | github: 71584 |
2018-06-12 13:47:42 | taleinat | set | status: open -> closed stage: patch review -> resolved resolution: fixed versions:
+ Python 3.6 |
2018-06-12 13:46:29 | taleinat | set | messages:
+ msg319382 |
2018-06-12 13:46:14 | taleinat | set | messages:
+ msg319381 |
2018-06-12 12:48:33 | miss-islington | set | pull_requests:
+ pull_request7280 |
2018-06-12 12:47:40 | miss-islington | set | pull_requests:
+ pull_request7279 |
2018-06-12 12:46:24 | taleinat | set | messages:
+ msg319377 |
2018-06-12 05:47:13 | taleinat | set | messages:
+ msg319364 |
2018-06-11 20:12:38 | ned.deily | set | nosy:
+ ned.deily
messages:
+ msg319338 versions:
+ Python 3.7, Python 3.8, - Python 3.5, Python 3.6 |
2018-06-10 07:00:27 | taleinat | set | keywords:
+ patch stage: needs patch -> patch review pull_requests:
+ pull_request7205 |
2018-06-05 15:36:11 | taleinat | set | nosy:
+ taleinat messages:
+ msg318763
|
2016-07-29 14:09:15 | r.david.murray | set | messages:
+ msg271625 versions:
- Python 3.4 |
2016-07-29 06:33:37 | CryptidVulpes | set | messages:
+ msg271611 |
2016-07-29 05:57:14 | CryptidVulpes | set | messages:
+ msg271608 versions:
+ Python 3.4 |
2016-07-29 05:55:18 | CryptidVulpes | set | messages:
+ msg271607 |
2016-07-29 01:53:47 | r.david.murray | set | messages:
+ msg271595 |
2016-07-29 01:46:40 | r.david.murray | set | type: crash -> behavior stage: needs patch messages:
+ msg271594 versions:
- Python 3.4 |
2016-07-28 13:23:41 | CryptidVulpes | set | files:
+ issue27397_poc_minimal.py
messages:
+ msg271546 |
2016-07-28 13:22:50 | CryptidVulpes | set | files:
+ issue27397_poc.py versions:
+ Python 3.4 nosy:
+ CryptidVulpes
messages:
+ msg271545
|
2016-06-27 13:20:06 | r.david.murray | set | messages:
+ msg269375 versions:
+ Python 3.6 |
2016-06-27 05:09:46 | Claudiu Saftoiu | set | files:
+ bugreport_moretests.py
messages:
+ msg269348 |
2016-06-27 04:58:54 | serhiy.storchaka | set | priority: normal -> high assignee: r.david.murray
nosy:
+ r.david.murray |
2016-06-27 04:52:48 | Claudiu Saftoiu | create | |