classification
Title: Confusing base64.b64decode output
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: mark.dickinson, miss-islington, ned.deily, taleinat
Priority: normal Keywords: patch

Created on 2018-09-19 13:00 by mark.dickinson, last changed 2018-09-28 06:14 by taleinat. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 9563 merged taleinat, 2018-09-25 13:43
PR 9614 merged miss-islington, 2018-09-28 05:57
Messages (9)
msg325757 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-09-19 13:00
The following error message in Python 3.7 is confusing and unhelpful:

    >>> s = "FS9qzW_oliGH_Yo="
    >>> base64.b64decode(s)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/base64.py", line 87, in b64decode
        return binascii.a2b_base64(s)
    binascii.Error: Invalid base64-encoded string: length cannot be 1 more than a multiple of 4
    >>> len(s) % 4
    0

This had me staring at the string s and scratching my head, thinking: "s doesn't have length 1 more than a multiple of 4, either with or without the "=" padding byte. What's this talking about?"

The actual user error here is a failure to use the "altchars" argument correctly:

>>> base64.b64decode(s, altchars="-_")
b'\x15/j\xcdo\xe8\x96!\x87\xfd\x8a'

The error message in versions prior to Python 3.7 is no more helpful, but it's not quite as misleading, either.

Python 3.6.6 (default, Jun 28 2018, 05:43:53) 
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import base64
>>> s = "FS9qzW_oliGH_Yo="
>>> base64.b64decode(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/base64.py", line 87, in b64decode
    return binascii.a2b_base64(s)
binascii.Error: Incorrect padding

Related: #1466065, #33770
msg325844 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-20 05:42
I welcome suggestions for improvement!

We could either find a better error message covering all cases, or emit a more specific message when non-base64 characters have been skipped.

Unfortunately, skipping non-base64 characters is a basic assumption of a2b_base64 which is used under the hood, and it currently provides no indication whether such characters are skipped, and if so how many.  Also, it is common in some contexts to have line breaks in base64-encoded data, which are skipped when decoding.
msg325845 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-20 06:22
Perhaps something such as "number of base64 data characters cannot be 1 more than a multiple of 4"?
msg326106 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-09-22 18:25
Yes, I'm having trouble thinking of an alternative message that's both accurate and helpful. I think what I _really_ want as a user is for b64decode to reject strings containing "_" and/or "-" as invalid (assuming altchars has been provided), but that would be a backwards-incompatible change, and has already been discussed in #1466065. Not sure it's worth revisiting that discussion.

Maybe just something like "invalid number of base64 characters"? We could even embed the actual number of base64 characters in the error message, which would give the user a clue that some characters are not being considered base64 characters.

I find the "1 more than a multiple of 4" wording a bit clunky, and potentially misleading.
msg326111 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-22 20:08
> I think what I _really_ want as a user is for b64decode to reject strings containing...

Do you mean you'd like to have this behavior by default?  One can already use validate=True to have invalid characters cause an exception.  I too find it surprising the False is the default, but changing this would be backwards incompatible.

> I find the "1 more than a multiple of 4" wording a bit clunky, and potentially misleading.

I chose that to avoid mentioning "modulu" or "remainder".  I find it straightforward and clear, though admittedly a bit long and clumsy.  I don't believe it is inherently misleading, though.

I like your idea of including the number of base64 characters in the error message.  I find the phrase "base64 characters" ambiguous, though.  I suggest: "Invalid base64-encoded string: number of data characters (13) cannot be 1 more than a multiple of 4"
msg326243 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2018-09-24 15:03
> Do you mean you'd like to have this behavior by default?

Ideally, yes, but I think that's out of scope for this bug report.

> I don't believe it is inherently misleading, though.

Yes, the "1 more than a multiple of 4" bit is clunky, but not misleading per se. It becomes misleading in combination with "length cannot be", since the obvious meaning of "length" here is the length of the string passed to the "b64decode" function, and that's not the correct interpretation in this context.

Your suggested rewording gets rid of the "length" reference, and looks like an improvement to me. "data characters" seems fairly clear, too.
msg326606 - (view) Author: miss-islington (miss-islington) Date: 2018-09-28 05:57
New changeset 1fba2ffc37da52c08db51fe4360459990b0311c9 by Miss Islington (bot) (Tal Einat) in branch 'master':
bpo-34736: improve error message for invalid length b64decode inputs (GH-9563)
https://github.com/python/cpython/commit/1fba2ffc37da52c08db51fe4360459990b0311c9
msg326608 - (view) Author: miss-islington (miss-islington) Date: 2018-09-28 06:12
New changeset 7e35081bc828291da5793db49ab45dee4fda5043 by Miss Islington (bot) in branch '3.7':
bpo-34736: improve error message for invalid length b64decode inputs (GH-9563)
https://github.com/python/cpython/commit/7e35081bc828291da5793db49ab45dee4fda5043
msg326609 - (view) Author: Tal Einat (taleinat) * (Python committer) Date: 2018-09-28 06:14
Thanks, Mark!
History
Date User Action Args
2018-09-28 06:14:40taleinatsetstatus: open -> closed
resolution: fixed
messages: + msg326609

stage: patch review -> resolved
2018-09-28 06:12:58miss-islingtonsetmessages: + msg326608
2018-09-28 05:57:36miss-islingtonsetpull_requests: + pull_request9014
2018-09-28 05:57:33miss-islingtonsetnosy: + miss-islington
messages: + msg326606
2018-09-25 13:43:16taleinatsetkeywords: + patch
stage: patch review
pull_requests: + pull_request8965
2018-09-24 15:03:03mark.dickinsonsetmessages: + msg326243
2018-09-22 20:08:14taleinatsetmessages: + msg326111
2018-09-22 18:25:20mark.dickinsonsetmessages: + msg326106
2018-09-22 17:28:08taleinatsetnosy: + ned.deily
2018-09-20 06:22:39taleinatsetmessages: + msg325845
2018-09-20 05:42:17taleinatsetmessages: + msg325844
versions: + Python 3.8
2018-09-19 13:58:18serhiy.storchakasetnosy: + taleinat
components: + Library (Lib)
2018-09-19 13:00:53mark.dickinsoncreate