b64decode should accept strings or bytes #49019

beazley · 2008-12-29T17:35:52Z

BPO	4769
Nosy	@vstinner, @merwok, @bitdancer
Files	b64-decode-str-bytes-typeerror.txt: running `python -m base64` with various options and inputs; base64_str.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2010-10-27.12:16:16.685>
created_at = <Date 2008-12-29.17:35:52.476>
labels = ['type-feature', 'library']
title = 'b64decode should accept strings or bytes'
updated_at = <Date 2010-10-27.12:16:16.684>
user = 'https://bugs.python.org/beazley'

bugs.python.org fields:

activity = <Date 2010-10-27.12:16:16.684>
actor = 'eric.araujo'
assignee = 'none'
closed = True
closed_date = <Date 2010-10-27.12:16:16.685>
closer = 'eric.araujo'
components = ['Library (Lib)']
creation = <Date 2008-12-29.17:35:52.476>
creator = 'beazley'
dependencies = []
files = ['17412', '17463']
hgrepos = []
issue_num = 4769
keywords = ['patch']
message_count = 14.0
messages = ['78466', '78468', '78554', '78746', '78747', '106127', '106266', '106314', '106315', '106477', '106486', '106488', '106832', '115690']
nosy_count = 7.0
nosy_names = ['beazley', 'vstinner', 'kawai', 'eric.araujo', 'r.david.murray', 'brotchie', 'meatballhat']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue4769'
versions = ['Python 3.2']

beazley · 2008-12-29T17:35:52Z

The whole point of base64 encoding is to safely encode binary data into
text characters. Thus, the base64.b64decode() function should equally
accept text strings or binary strings as input. For example, there is a
reasonable expectation that something like this should work:

>>> x = 'SGVsbG8='
>>> base64.b64decode(x)
b'Hello'
>>>

In Python 3, you get this exception however:

>>> base64.b64decode(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/lib/python3.0/base64.py", line 80, in b64decode
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str
>>>

I realize that there are encoding issues with Unicode strings, but
base64 encodes everything into the first 127 ASCII characters. If the
input to b64decode is a str, just do an encode('ascii') operation on it
and proceed. If that fails, it wasn't valid Base64 to begin with.

I can't think of any real negative impact to making this change as long
as the result is still always bytes. The main benefit is just
simplifying the decoding process for end-users.
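
A minimal sketch of the behaviour proposed above. The wrapper name `b64decode_any` is hypothetical and is not part of the base64 module; it only illustrates the "encode to ASCII, then decode" step.

```python
import base64


def b64decode_any(data):
    """Decode Base64 given either str or bytes; the result is always bytes.

    Hypothetical helper for illustration only.
    """
    if isinstance(data, str):
        # Non-ASCII text cannot be valid Base64, so a strict ASCII
        # encode doubles as input validation (raises UnicodeEncodeError).
        data = data.encode('ascii')
    return base64.b64decode(data)


from_text = b64decode_any('SGVsbG8=')    # str input
from_bytes = b64decode_any(b'SGVsbG8=')  # bytes input
```

Both calls return the same bytes object, so callers would not need to care which type they hold.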

See bpo-4768.

beazley · 2008-12-29T17:39:55Z

Note: This problem applies to all of the other decoders/encoders in the
base64 module too (b16, b32, etc.).

beazley · 2008-12-30T18:30:32Z

One more followup. The quopri module (which is highly related to
base64 in that quopri and base64 are often used together within MIME)
does accept both unicode and byte strings when decoding. For example,
this works:

>>> quopri.decodestring('Hello World')
b'Hello World'
>>> quopri.decodestring(b'Hello World')
b'Hello World'
>>>

However, the quopri module, like base64, uses byte strings almost
everywhere else. For example, encoding a byte string with quopri still
produces bytes (just like base64):

>>> quopri.encodestring(b'Hello World')
b'Hello World'
>>>

vstinner · 2009-01-02T01:31:56Z

About quoted printable, there are two implementations:

- binascii.a2b_qp() (Modules/binascii.c): C implementation, uses
  PyArg_ParseTupleAndKeywords(args, kwargs, "s*|i", ...) to parse the
  data
- quopri.decode() (Lib/quopri.py): Python implementation
  => quopri.decodestring() uses io.BytesIO() to parse the data

But quopri.decodestring() reuses binascii.a2b_qp() if the binascii
module is present. So the behaviour of quopri.decodestring() depends on
the presence of the binascii module:

- binascii present: accepts bytes or unicode
- binascii missing: accepts only bytes!

binascii.a2b_qp() encodes a unicode string to a UTF-8 byte string.

vstinner · 2009-01-02T02:10:05Z

> If the input to b64decode is a str, just do an encode('ascii')
> operation on it and proceed. If that fails, it wasn't valid
> Base64 to begin with.

On a unicode encode error, should we raise a UnicodeEncodeError or a
binascii.Error?

And there is also the problem of base64.b64decode()
alternate "characters". Should we accept non-ASCII alternate
characters?

base64.b64decode('01a\xfeb\xffcd', altchars=b'\xfe\xff')

For this example, the result depends on the chosen charset:

- ASCII (strict): encoding the input text raises a UnicodeEncodeError
- ISO-8859-1 (ignore): works as expected
- UTF-8 (strict): unexpected result

The only valid choice is ASCII because ISO-8859-1 or UTF-8 would
reintroduce the bytes/character mixture.
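
To make the three cases concrete, here is a small sketch that encodes the example string under each charset. The helper name `encode_with` is made up for this illustration; it only calls `str.encode()`.

```python
def encode_with(text, charset):
    """Return the encoded bytes, or the exception class name on failure."""
    try:
        return text.encode(charset)
    except UnicodeError as exc:
        return type(exc).__name__


text = '01a\xfeb\xffcd'
results = {cs: encode_with(text, cs)
           for cs in ('ascii', 'iso-8859-1', 'utf-8')}

# ascii      -> 'UnicodeEncodeError' (strict encoding rejects U+00FE/U+00FF)
# iso-8859-1 -> b'01a\xfeb\xffcd'    (one byte per character, matches altchars)
# utf-8      -> b'01a\xc3\xbeb\xc3\xbfcd' (two bytes per character, so the
#               bytes no longer match altchars=b'\xfe\xff')
```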

meatballhat · 2010-05-20T02:35:10Z

This appears to still be an issue in py3k. I've attached the command and output when running python3 -m base64 with various options and inputs. If there's consensus on a solution, I'd be happy to take a crack at making a patch.

vstinner · 2010-05-21T21:30:52Z

Attached base64_main.patch fixes errors described in b64-decode-str-bytes-typeerror.txt.

meatballhat · 2010-05-22T16:31:03Z

@Haypo - what patch? :)

vstinner · 2010-05-22T17:03:04Z

This one!

vstinner · 2010-05-25T21:33:19Z

I committed base64_main.patch (+ tests): 3.2 (r81533) and 3.1 (r81534).

vstinner · 2010-05-25T22:08:54Z

Accepting unicode strings is not "pure", but I agree that it's convenient. Here is a patch:

- base64.b(16|32|64)encode and base64.encodebytes accept unicode strings
- unicode is first encoded to UTF-8 to get a byte string
- Update the docstrings and the documentation
- Fix tests

vstinner · 2010-05-25T22:19:54Z

> I committed base64_main.patch (+ tests): 3.2 (r81533) and 3.1 (r81534).

Hum, the test fails on Windows: fixed by r81535 (3.2) and r81536 (3.1).

bitdancer · 2010-06-01T12:18:42Z

The patch appears to be fixing the wrong functions. It is decode that needs to accept unicode. Encode should still be restricted to bytes.

bitdancer · 2010-09-06T03:40:40Z

After thinking about it, I'm inclined to reject this and say that quopri should be fixed to reject string input to decode. On python-dev Guido opined that a kind of polymorphism in the stdlib was good (bytes in --> bytes out, string in --> string out). string in --> bytes out and bytes in --> string out was considered bad, to my understanding (except for unicode encode/decode, of course).

As you say, all one has to do is encode the string as ascii to get the bytes to pass in. It is better, I think, to maintain the clear distinction between bytes and strings in the programmer's mind. That's what Python 3 is all about, really.

As for "the whole point of base64 is to safely encode binary data into text characters", that is not true. The point is to encode binary data into a subset of *ascii*, which is *not* text, it is bytes. The fact that this is also useful for transferring binary data through unicode is pretty much an unintended consequence of the way unicode is designed.
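
As a rough illustration of the explicit style argued for here, the round trip below keeps base64 itself bytes-to-bytes and makes each str/bytes conversion a visible step. The variable names are arbitrary.

```python
import base64

raw = b'Hello'

# bytes in --> bytes out: the encoded form is ASCII-only bytes
encoded = base64.b64encode(raw)

# Explicit, caller-chosen step to carry the data inside text
as_text = encoded.decode('ascii')

# Explicit step back to bytes before decoding
decoded = base64.b64decode(as_text.encode('ascii'))
```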

beazley mannequin added the type-bug and stdlib labels Dec 29, 2008

bitdancer added the type-feature label and removed the type-bug label Sep 6, 2010

merwok closed this as completed Oct 27, 2010

ezio-melotti transferred this issue from another repository Apr 10, 2022
