Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base64 throws 'incorrect padding' exception when the issue is NOT with the padding #77951

Closed
dniq mannequin opened this issue Jun 4, 2018 · 23 comments
Closed

base64 throws 'incorrect padding' exception when the issue is NOT with the padding #77951

dniq mannequin opened this issue Jun 4, 2018 · 23 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@dniq
Copy link
Mannequin

dniq mannequin commented Jun 4, 2018

BPO 33770
Nosy @terryjreedy, @mdickinson, @taleinat, @ned-deily, @bitdancer, @serhiy-storchaka, @dniq
PRs
  • bpo-33770: improve base64 exception message for encoded inputs of invalid length #7416
  • [3.7] bpo-33770: improve base64 exception message for encoded inputs of invalid length (GH-7416) #7602
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-06-11.05:31:13.374>
    created_at = <Date 2018-06-04.19:53:31.414>
    labels = ['extension-modules', '3.8', 'type-bug', '3.7']
    title = "base64 throws 'incorrect padding' exception when the issue is NOT with the padding"
    updated_at = <Date 2018-06-11.05:31:13.373>
    user = 'https://github.com/dniq'

    bugs.python.org fields:

    activity = <Date 2018-06-11.05:31:13.373>
    actor = 'ned.deily'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-06-11.05:31:13.374>
    closer = 'ned.deily'
    components = ['Extension Modules']
    creation = <Date 2018-06-04.19:53:31.414>
    creator = 'dniq'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 33770
    keywords = ['patch']
    message_count = 23.0
    messages = ['318691', '318697', '318701', '318704', '318716', '318721', '318727', '318728', '318729', '318733', '318752', '318755', '318762', '318789', '319093', '319094', '319101', '319102', '319104', '319202', '319248', '319249', '319272']
    nosy_count = 7.0
    nosy_names = ['terry.reedy', 'mark.dickinson', 'taleinat', 'ned.deily', 'r.david.murray', 'serhiy.storchaka', 'dniq']
    pr_nums = ['7416', '7602']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue33770'
    versions = ['Python 3.7', 'Python 3.8']

    @dniq
    Copy link
    Mannequin Author

    dniq mannequin commented Jun 4, 2018

    All base64 decoding methods fail to decode a valid base64 string, throwing 'incorrect padding' regardless of the string padding.

    Here's an example:

    >>> base64.urlsafe_b64decode('AQAAQDhAAMAAQAAAAAAAAthAAAAJDczODFmZDM2LTNiOTYtNDVmYS04MjQ2LWRkYzJkMmViYjQ2YQ===')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/export/apps/python/3.6/lib/python3.6/base64.py", line 133, in urlsafe_b64decode
        return b64decode(s)
      File "/export/apps/python/3.6/lib/python3.6/base64.py", line 87, in b64decode
        return binascii.a2b_base64(s)
    binascii.Error: Incorrect padding

    The same string gets decoded without any issues using Perl's MIME::Base64 module or Java. So far Python's base64 module is the only one that fails to decode it.

    @dniq dniq mannequin added the extension-modules C modules in the Modules dir label Jun 4, 2018
    @mdickinson
    Copy link
    Member

    This doesn't look like a valid base64 string to me: the padding (if present) at the end of the string should be either "=" or "==", never "===".

    Is the length of the decoded string equal to 58? If so, what's the last character of that decoded string? Whatever it is, it should end up being encoded as "xx==" (for some values of "x"), not as "Q===".

    @dniq
    Copy link
    Mannequin Author

    dniq mannequin commented Jun 4, 2018

    It doesn’t matter how many “=“ characters are appended - it’s always throwing the same exception :(

    @dniq
    Copy link
    Mannequin Author

    dniq mannequin commented Jun 4, 2018

    Apologies: apparently this particular string was given with one character missing in the beginning - should be "AAQAAQDhAAMAAQAAAAAAAAthAAAAJDczODFmZDM2LTNiOTYtNDVmYS04MjQ2LWRkYzJkMmViYjQ2YQ".

    In this case, the problem is the exception itself: it's not an "incorrect padding" issue.

    @bitdancer
    Copy link
    Member

    I always assumed that any string composed of valid base64 characters could be decoded to *something* if you added some padding characters, but apparently that was an invalid assumption. I agree that the message is incorrect for this case.

    @bitdancer bitdancer added 3.7 (EOL) end of life 3.8 only security fixes type-bug An unexpected behavior, bug, or error labels Jun 4, 2018
    @dniq
    Copy link
    Mannequin Author

    dniq mannequin commented Jun 4, 2018

    Ideally there needs to be an option to ignore non-fatal errors. Indeed, pretty much any string should be base64-decodeable. Take a look at Perl's MIME::Base64 - it can decode pretty much any random string (so long as it only contains valid base64 characters).

    @dniq dniq mannequin changed the title base64 throws 'incorrect padding' exception even though the string length is a multiple of 4 base64 throws 'incorrect padding' exception when the issue is NOT with the padding Jun 4, 2018
    @taleinat
    Copy link
    Contributor

    taleinat commented Jun 5, 2018

    A base64-encoded string's length can only have a remainder of 0, 2 or 3, but never 1. This is why the padding you get when encoding can only be '=' or '==' but never '==='.

    @taleinat
    Copy link
    Contributor

    taleinat commented Jun 5, 2018

    Oops: when I wrote "length can only have a remainder of ..." I meant remainder upon division by 4.

    @serhiy-storchaka
    Copy link
    Member

    I'm not sure that "Invalid length" is better than "Invalid padding". Doesn't it mean the same?

    @taleinat
    Copy link
    Contributor

    taleinat commented Jun 5, 2018

    They're not the same.

    When the encoded string's length modulu 4 is 2 or 3, there just need to be (at least) 2 or 1 padding characters ('=') for decoding to be successful, due to our decoder being rather strict. Less strict decoders may ignore the missing padding and successfully decode the encoded string.

    When the remainder is 0, no padding is needed and everything is fine.

    When the remainder is 1, the encoded string is simply invalid. It is not a padding issue. There is no valid way to decode the encoded string.

    @dniq
    Copy link
    Mannequin Author

    dniq mannequin commented Jun 5, 2018

    I think something like “invalid length” message does make more sense in this case than the misleading “incorrect padding”.

    It would also be great if there was a way to force it to try its best at decoding the string. :)

    @bitdancer
    Copy link
    Member

    I agree, but that would be a separate enhancement PR. The email library would use that capability if it existed...currently it adds padding in a loop trying to do the decode if it gets the invalid padding message. Which of course is going to fail for this case.

    @taleinat
    Copy link
    Contributor

    taleinat commented Jun 5, 2018

    Would an exception message as following be acceptable? "Invalid base64-encoded string: length cannot be 1 more than a multiple of 4"

    @dniq
    Copy link
    Mannequin Author

    dniq mannequin commented Jun 5, 2018

    @taleinat - yes, that does look much better!

    @terryjreedy
    Copy link
    Member

    Is the new message a clarification, in which case we would typically not backport, or a correction, which we usually would?

    @taleinat
    Copy link
    Contributor

    taleinat commented Jun 8, 2018

    It is more a correction than a clarification.

    After looking through the module some more, I'm considering using binascii.Incomplete for this case, since it seems appropriate enough that it's worth using it rather than adding another, separate exception.

    @bitdancer
    Copy link
    Member

    I think that would move it even more into the realm of a bugfix, then, since code that cared about specific binascii exceptions could be looking for that one already.

    @taleinat
    Copy link
    Contributor

    taleinat commented Jun 8, 2018

    Code using only a2b_base64() would likely not be expecting this exception.

    We have at least one such example in the stdlib: decode_b() in Lib/email/_encoded_words.py (which needs to be fixed regardless; see bpo-27397).

    @taleinat
    Copy link
    Contributor

    taleinat commented Jun 8, 2018

    PR is ready with an improved exception message and raising binascii.Incomplete rather than binascii.Error. A final review would be welcome!

    @taleinat
    Copy link
    Contributor

    New changeset 1b85c71 by Tal Einat in branch 'master':
    bpo-33770: improve base64 exception message for encoded inputs of invalid length (bpo-7416)
    1b85c71

    @ned-deily
    Copy link
    Member

    New changeset 053d6c5 by Ned Deily (Miss Islington (bot)) in branch '3.7':
    bpo-33770: improve base64 exception message for encoded inputs of invalid length (GH-7416) (GH-7602)
    053d6c5

    @ned-deily
    Copy link
    Member

    I backported Tal's fix for 3.7.0rc1. I am less certain about backporting to 3.6 and 2.7 at this stage of their lives but I don't have a strong feeling about it so I'll leave the issue open for that.

    @taleinat
    Copy link
    Contributor

    The change is not entirely backward-compatible, so not back-porting before 3.7 seems good to me. IMO this should be closed.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants