Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

email.message.Message.get_payload(decode=True) raises AssertionError that "should never happen" #71584

Closed
ClaudiuSaftoiu mannequin opened this issue Jun 27, 2016 · 17 comments
Assignees
Labels
3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@ClaudiuSaftoiu
Copy link
Mannequin

ClaudiuSaftoiu mannequin commented Jun 27, 2016

BPO 27397
Nosy @taleinat, @ned-deily, @bitdancer
PRs
  • bpo-27397: Make email module properly handle invalid-length base64 strings #7583
  • [3.7] bpo-27397: Make email module properly handle invalid-length base64 strings (GH-7583) #7664
  • [3.6] bpo-27397: Make email module properly handle invalid-length base64 strings (GH-7583) #7665
  • Files
  • bugreport.py: File to reproduce the bug
  • bugreport_moretests.py: more test cases
  • issue27397_poc.py: Script that will reproduce the bug
  • issue27397_poc_minimal.py: Minimal script to reproduce the bug
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/bitdancer'
    closed_at = <Date 2018-06-12.13:47:42.399>
    created_at = <Date 2016-06-27.04:52:48.741>
    labels = ['3.7', '3.8', 'type-bug', 'library']
    title = 'email.message.Message.get_payload(decode=True) raises AssertionError that "should never happen"'
    updated_at = <Date 2018-06-12.13:47:42.397>
    user = 'https://bugs.python.org/ClaudiuSaftoiu'

    bugs.python.org fields:

    activity = <Date 2018-06-12.13:47:42.397>
    actor = 'taleinat'
    assignee = 'r.david.murray'
    closed = True
    closed_date = <Date 2018-06-12.13:47:42.399>
    closer = 'taleinat'
    components = ['Library (Lib)']
    creation = <Date 2016-06-27.04:52:48.741>
    creator = 'Claudiu Saftoiu'
    dependencies = []
    files = ['43556', '43557', '43921', '43922']
    hgrepos = []
    issue_num = 27397
    keywords = ['patch']
    message_count = 17.0
    messages = ['269346', '269348', '269375', '271545', '271546', '271594', '271595', '271607', '271608', '271611', '271625', '318763', '319338', '319364', '319377', '319381', '319382']
    nosy_count = 5.0
    nosy_names = ['taleinat', 'ned.deily', 'r.david.murray', 'Claudiu Saftoiu', 'CryptidVulpes']
    pr_nums = ['7583', '7664', '7665']
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue27397'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @ClaudiuSaftoiu
    Copy link
    Mannequin Author

    ClaudiuSaftoiu mannequin commented Jun 27, 2016

    I'm processing Yahoo! Groups backup archives, and came across an email message which causes the .get_payload(decode=True) step to raise an AssertionError. Particularly, the following exception is raised in lib/python3.5/email/_encoded_words.py, line 124:

            # This should never happen.
            raise AssertionError("unexpected binascii.Error")
    

    Attached is the file which, when run under Python 3.5.1, causes the exception to be raised.

    @ClaudiuSaftoiu ClaudiuSaftoiu mannequin added type-crash A hard crash of the interpreter, possibly with a core dump stdlib Python modules in the Lib dir labels Jun 27, 2016
    @ClaudiuSaftoiu
    Copy link
    Mannequin Author

    ClaudiuSaftoiu mannequin commented Jun 27, 2016

    See attached another file with more test cases.

    @bitdancer
    Copy link
    Member

    This appears to be a bug in b64decode. It should not be raising that error if validate=False, as it is in the code being executed during the test case. It's not that b64decode is ignoring the validate argument; the error appears to be data-dependent. I don't have time to investigate this further at the moment, perhaps someone else can pick it up.

    @CryptidVulpes
    Copy link
    Mannequin

    CryptidVulpes mannequin commented Jul 28, 2016

    I stumbled upon this bug as well while fuzzing with AFL. The curious thing is that email.message_from_string still accepts that garbled message as a valid email.

    @CryptidVulpes
    Copy link
    Mannequin

    CryptidVulpes mannequin commented Jul 28, 2016

    I also attached a minimal script containing only the decode call and the garbage payload.

    @bitdancer
    Copy link
    Member

    Thanks for narrowing this down. There are two problems: the first is that the base64 module is raising "Incorrect padding" when the issue is actually that the input string cannot be decoded using the base64 algorithm no matter what padding might be supplied. The second is that the else clause in _encoded_words *can* happen, in this circumstance.

    The circumstance is that the input string contains n*4+1 characters. No encoding of source text will ever produce that number of output characters. (I also seem to have used an index one greater than needed for the loop; the maximum number of padding characters is 2.)

    The interesting question is what we want to do for error recovery in the n*4+1 case. Do we declare the part invalid and return an empty string? Do we try chopping off the last character and see if we can decode it? The latter is complicated by the possible presence of invalid alphabet characters, but it is what I lean toward.

    I'm open to alternate suggestions. And to a patch if someone wants to write one :)

    @bitdancer bitdancer added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Jul 29, 2016
    @bitdancer
    Copy link
    Member

    It might be better to add a character: that will produce garbage as the last byte, but that could serve as an alert that something is wrong...and we don't know if the extra character is at the end or at the beginning of the base64 encoded string, or if, as in this case, it is multiple extra characters which will produce trailing garbage anyway. The example in hand is, I would guess, by far the most likely scenario.

    @CryptidVulpes
    Copy link
    Mannequin

    CryptidVulpes mannequin commented Jul 29, 2016

    In my opinion, this is still a clear case of "invalid input, raise an error", but I don't think AssertionError is the right one. Maybe just don't catch the unexpected binascii.Error and let it fly towards the user?

    I might go even one step further and just let base64 library handle and raise all the necessary errors here, similarly to how your "encode_b" function works.

    If you want more error resilience (meaning: email lib would not crash with invalid inputs), I would think returning an empty string is okay, since that base64 input is clearly not a valid base64 that is going to decode into anything.

    @CryptidVulpes
    Copy link
    Mannequin

    CryptidVulpes mannequin commented Jul 29, 2016

    Also, I'm not sure what is the development status in Python 3.4 but in my case this bug happened using the Debian Jessie Python version 3.4.2-2

    @CryptidVulpes
    Copy link
    Mannequin

    CryptidVulpes mannequin commented Jul 29, 2016

    It might also make sense to return the payload undecoded. The documentation for get_payload() function says:

    "[...] the payload will be decoded if this header’s value is quoted-printable or base64. If some other encoding is used, or Content-Transfer-Encoding header is missing, the payload is returned as-is (undecoded)."

    Even though the header's value tries to convince you "base64" is the encoding, it is - in this case - either broken base64 or not. Hence it might fall into the category "some other encoding is used", justifying the "payload is returned as-is (undecoded)".

    As to the original payload Claudiu posted, in that the mailserver has truncated the email. This already provides the user with non-base64 string that they try to convince you to decode as base64. My argument is still valid in this case.

    @bitdancer
    Copy link
    Member

    Returning the undecoded payload is a good idea. Let's go with that.

    The email module, unlike most stdlib packages, has a mandate that the parser should never raise an error. Instead we do our best to guess (very unlike everything else in python!) and note 'defects' in the message. The reason this is the case is Postel's Law, which has become one of the guiding principles in dealing with email over the years: "be conservative in what you do, and generous in what you accept". So, the generator will raise errors (if the original input didn't come from the parser), but the parser will not, if at all possible.

    (For those who want aggressive error checking, python3 the 'raise_on_defect' policy setting.)

    For the versions, we use that field to indicate which versions the bug will get fixed in, which is why I removed 3.4.

    @taleinat
    Copy link
    Contributor

    taleinat commented Jun 5, 2018

    See bpo-33770, which will make recognizing and handling this case straightforward.

    @ned-deily
    Copy link
    Member

    See bpo-33770, which will make recognizing and handling this case straightforward.

    The fix for bpo-33770 is only in 3.7.0 and later, so does this mean this change should not be backported to 3.6? Updating the the versions accordingly. If it is appropriate for 3.6, we can add that back in.

    @ned-deily ned-deily added 3.7 (EOL) end of life 3.8 only security fixes labels Jun 11, 2018
    @taleinat
    Copy link
    Contributor

    I ended up fixing this independently of bpo-33770.

    AFAIK this should just work with 3.6.

    @taleinat
    Copy link
    Contributor

    New changeset c3f55be by Tal Einat in branch 'master':
    bpo-27397: Make email module properly handle invalid-length base64 strings (bpo-7583)
    c3f55be

    @taleinat
    Copy link
    Contributor

    New changeset 7b82281 by Tal Einat (Miss Islington (bot)) in branch '3.7':
    bpo-27397: Make email module properly handle invalid-length base64 strings (GH-7583) (GH-7664)
    7b82281

    @taleinat
    Copy link
    Contributor

    New changeset 63dd1f7 by Tal Einat (Miss Islington (bot)) in branch '3.6':
    bpo-27397: Make email module properly handle invalid-length base64 strings (GH-7583) (GH-7665)
    63dd1f7

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants