classification
Title: base64 throws 'incorrect padding' exception when the issue is NOT with the padding
Type: behavior Stage: resolved
Components: Extension Modules Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: dniq, mark.dickinson, ned.deily, r.david.murray, serhiy.storchaka, taleinat, terry.reedy
Priority: normal Keywords: patch

Created on 2018-06-04 19:53 by dniq, last changed 2018-06-11 05:31 by ned.deily. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 7416 merged taleinat, 2018-06-05 08:36
PR 7602 merged miss-islington, 2018-06-10 21:15
Messages (23)
msg318691 - (view) Author: Dmitry (dniq) Date: 2018-06-04 19:53
All base64 decoding methods fail to decode a valid base64 string, throwing 'incorrect padding' regardless of the string padding.

Here's an example:

>>> base64.urlsafe_b64decode('AQAAQDhAAMAAQAAAAAAAAthAAAAJDczODFmZDM2LTNiOTYtNDVmYS04MjQ2LWRkYzJkMmViYjQ2YQ===')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/export/apps/python/3.6/lib/python3.6/base64.py", line 133, in urlsafe_b64decode
    return b64decode(s)
  File "/export/apps/python/3.6/lib/python3.6/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

The same string gets decoded without any issues using Perl's MIME::Base64 module or Java. So far Python's base64 module is the only one that fails to decode it.
msg318697 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-06-04 20:29
This doesn't look like a valid base64 string to me: the padding (if present) at the end of the string should be either "=" or "==", never "===".

Is the length of the decoded string equal to 58? If so, what's the last character of that decoded string? Whatever it is, it should end up being encoded as "xx==" (for some values of "x"), not as "Q===".
msg318701 - (view) Author: Dmitry (dniq) Date: 2018-06-04 20:45
It doesn’t matter how many “=“ characters are appended - it’s always throwing the same exception :(
msg318704 - (view) Author: Dmitry (dniq) Date: 2018-06-04 21:04
Apologies: apparently this particular string was given with one character missing in the beginning - should be "AAQAAQDhAAMAAQAAAAAAAAthAAAAJDczODFmZDM2LTNiOTYtNDVmYS04MjQ2LWRkYzJkMmViYjQ2YQ".

In this case, the problem is the exception itself: it's not an "incorrect padding" issue.
msg318716 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-04 22:15
I always assumed that any string composed of valid base64 characters could be decoded to *something* if you added some padding characters, but apparently that was an invalid assumption.  I agree that the message is incorrect for this case.
msg318721 - (view) Author: Dmitry (dniq) Date: 2018-06-04 22:45
Ideally there needs to be an option to ignore non-fatal errors. Indeed, pretty much any string should be base64-decodeable. Take a look at Perl's MIME::Base64 - it can decode pretty much any random string (so long as it only contains valid base64 characters).
msg318727 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-05 08:41
A base64-encoded string's length can only have a remainder of 0, 2 or 3, but never 1. This is why the padding you get when encoding can only be '=' or '==' but never '==='.
msg318728 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-05 08:43
Oops: when I wrote "length can only have a remainder of ..." I meant remainder upon division by 4.
msg318729 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-06-05 08:55
I'm not sure that "Invalid length" is better than "Invalid padding". Doesn't it mean the same?
msg318733 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-05 10:43
They're not the same.

When the encoded string's length modulu 4 is 2 or 3, there just need to be (at least) 2 or 1 padding characters ('=') for decoding to be successful, due to our decoder being rather strict. Less strict decoders may ignore the missing padding and successfully decode the encoded string.

When the remainder is 0, no padding is needed and everything is fine.

When the remainder is 1, the encoded string is simply invalid. It is not a padding issue. There is no valid way to decode the encoded string.
msg318752 - (view) Author: Dmitry (dniq) Date: 2018-06-05 13:07
I think something like “invalid length” message does make more sense in this case than the misleading “incorrect padding”.

It would also be great if there was a way to force it to try its best at decoding the string. :)
msg318755 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-05 13:30
I agree, but that would be a separate enhancement PR.  The email library would use that capability if it existed...currently it adds padding in a loop trying to do the decode if it gets the invalid padding message.  Which of course is going to fail for this case.
msg318762 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-05 15:00
Would an exception message as following be acceptable? "Invalid base64-encoded string: length cannot be 1 more than a multiple of 4"
msg318789 - (view) Author: Dmitry (dniq) Date: 2018-06-05 23:12
@taleinat - yes, that does look much better!
msg319093 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-06-08 18:52
Is the new message a clarification, in which case we would typically not backport, or a correction, which we usually would?
msg319094 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-08 18:58
It is more a correction than a clarification.

After looking through the module some more, I'm considering using binascii.Incomplete for this case, since it seems appropriate enough that it's worth using it rather than adding another, separate exception.
msg319101 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-06-08 19:50
I think that would move it even more into the realm of a bugfix, then, since code that cared about specific binascii exceptions could be looking for that one already.
msg319102 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-08 19:55
Code using only a2b_base64() would likely not be expecting this exception.

We have at least one such example in the stdlib: decode_b() in Lib/email/_encoded_words.py (which needs to be fixed regardless; see #27397).
msg319104 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-08 20:09
PR is ready with an improved exception message and raising binascii.Incomplete rather than binascii.Error. A final review would be welcome!
msg319202 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-10 07:01
New changeset 1b85c71a2136d3fa6a1da05b27b1fe4e4b8ee45e by Tal Einat in branch 'master':
bpo-33770: improve base64 exception message for encoded inputs of invalid length (#7416)
https://github.com/python/cpython/commit/1b85c71a2136d3fa6a1da05b27b1fe4e4b8ee45e
msg319248 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-06-10 21:37
New changeset 053d6c5ce246e6ba9c046467b02a0b6ba4abb8bf by Ned Deily (Miss Islington (bot)) in branch '3.7':
bpo-33770: improve base64 exception message for encoded inputs of invalid length (GH-7416) (GH-7602)
https://github.com/python/cpython/commit/053d6c5ce246e6ba9c046467b02a0b6ba4abb8bf
msg319249 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-06-10 21:43
I backported Tal's fix for 3.7.0rc1.  I am less certain about backporting to 3.6 and 2.7 at this stage of their lives but I don't have a strong feeling about it so I'll leave the issue open for that.
msg319272 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-06-11 05:26
The change is not entirely backward-compatible, so not back-porting before 3.7 seems good to me. IMO this should be closed.
History
Date User Action Args
2018-06-11 05:31:13ned.deilysetstatus: open -> closed
stage: patch review -> resolved
resolution: fixed
versions: - Python 2.7, Python 3.6
2018-06-11 05:26:01taleinatsetmessages: + msg319272
2018-06-10 21:43:05ned.deilysetmessages: + msg319249
2018-06-10 21:37:16ned.deilysetnosy: + ned.deily
messages: + msg319248
2018-06-10 21:15:46miss-islingtonsetpull_requests: + pull_request7224
2018-06-10 07:01:53taleinatsetmessages: + msg319202
2018-06-08 20:09:47taleinatsetmessages: + msg319104
2018-06-08 19:55:57taleinatsetmessages: + msg319102
2018-06-08 19:50:54r.david.murraysetmessages: + msg319101
2018-06-08 18:58:37taleinatsetmessages: + msg319094
2018-06-08 18:52:08terry.reedysetnosy: + terry.reedy
messages: + msg319093
2018-06-05 23:12:32dniqsetmessages: + msg318789
2018-06-05 15:00:43taleinatsetmessages: + msg318762
2018-06-05 13:30:49r.david.murraysetmessages: + msg318755
2018-06-05 13:07:28dniqsetmessages: + msg318752
2018-06-05 10:43:09taleinatsetmessages: + msg318733
2018-06-05 08:55:30serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg318729
2018-06-05 08:43:31taleinatsetmessages: + msg318728
2018-06-05 08:41:52taleinatsetnosy: + taleinat
messages: + msg318727
2018-06-05 08:36:40taleinatsetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request7040
2018-06-04 22:46:13dniqsettitle: base64 throws 'incorrect padding' exception even though the string length is a multiple of 4 -> base64 throws 'incorrect padding' exception when the issue is NOT with the padding
2018-06-04 22:45:21dniqsetmessages: + msg318721
2018-06-04 22:15:34r.david.murraysetversions: + Python 2.7, Python 3.7, Python 3.8
nosy: + r.david.murray

messages: + msg318716

type: behavior
stage: needs patch
2018-06-04 21:04:58dniqsetmessages: + msg318704
2018-06-04 20:45:03dniqsetmessages: + msg318701
2018-06-04 20:29:52mark.dickinsonsetnosy: + mark.dickinson
messages: + msg318697
2018-06-04 19:53:31dniqcreate