classification
Title: b64decode should accept strings or bytes
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: beazley, brotchie, eric.araujo, kawai, meatballhat, r.david.murray, vstinner
Priority: normal
Keywords: patch

Created on 2008-12-29 17:35 by beazley, last changed 2010-10-27 12:16 by eric.araujo. This issue is now closed.

Files
File name: b64-decode-str-bytes-typeerror.txt
Uploaded: meatballhat, 2010-05-20 02:35
Description: running ``python -m base64`` with various options and inputs

File name: base64_str.patch
Uploaded: vstinner, 2010-05-25 22:08
Messages (14)
msg78466 - (view) Author: David M. Beazley (beazley) Date: 2008-12-29 17:35
The whole point of base64 encoding is to safely encode binary data into 
text characters.  Thus, the base64.b64decode() function should equally 
accept text strings or binary strings as input. For example, there is a 
reasonable expectation that something like this should work:

>>> x = 'SGVsbG8='
>>> base64.b64decode(x)
b'Hello'
>>>

In Python 3, however, you get this exception:

>>> base64.b64decode(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/lib/python3.0/base64.py", line 80, in b64decode
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str
>>> 

I realize that there are encoding issues with Unicode strings, but 
base64 encodes everything into the first 127 ASCII characters.  If the 
input to b64decode is a str, just do an encode('ascii') operation on it 
and proceed.  If that fails, it wasn't valid Base64 to begin with.

I can't think of any real negative impact to making this change as long 
as the result is still always bytes.   The main benefit is just 
simplifying the decoding process for end-users.

See issue 4768.
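
A minimal sketch of the behaviour proposed above, assuming a wrapper around the existing function rather than a change to the stdlib. The name `b64decode_any` is hypothetical and used only for illustration:

```python
import base64


def b64decode_any(data):
    """Decode Base64 given either str or bytes; always return bytes.

    Hypothetical helper: str input is encoded as ASCII first, so any
    non-ASCII character raises UnicodeEncodeError before decoding.
    """
    if isinstance(data, str):
        data = data.encode("ascii")
    return base64.b64decode(data)


print(b64decode_any("SGVsbG8="))   # str input
print(b64decode_any(b"SGVsbG8="))  # bytes input
```

Both calls return the same bytes object, so the result type does not depend on the input type.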
msg78468 - (view) Author: David M. Beazley (beazley) Date: 2008-12-29 17:39
Note: This problem applies to all of the other decoders/encoders in the 
base64 module too (b16, b32, etc.)
msg78554 - (view) Author: David M. Beazley (beazley) Date: 2008-12-30 18:30
One more followup.   The quopri module (which is highly related to 
base64 in that quopri and base64 are often used together within MIME) 
does accept both unicode and byte strings when decoding.  For example, 
this works:

>>> quopri.decodestring('Hello World')
b'Hello World'
>>> quopri.decodestring(b'Hello World')
b'Hello World'
>>>

However, the quopri module, like base64, uses byte strings almost 
everywhere else.  For example, encoding a byte string with quopri still 
produces bytes (just like base64):

>>> quopri.encodestring(b'Hello World')
b'Hello World'
>>>
msg78746 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-02 01:31
About quoted printable, there are two implementations:
 - binascii.a2b_qp() (Modules/binascii.c): C implementation, uses 
PyArg_ParseTupleAndKeywords(args, kwargs, "s*|i", ...) to parse the 
data
 - quopri.decode() (Lib/quopri.py): Python implementation
   => quopri.decodestring() uses io.BytesIO() to parse the data

But quopri.decodestring() reuses binascii.a2b_qp() if the binascii 
module is present. So the behaviour of quopri.decodestring() depends on 
the presence of the binascii module:
 - binascii present: accepts bytes or unicode
 - binascii missing: accepts only bytes!

binascii.a2b_qp() encodes a unicode string to a UTF-8 byte string.
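
One way to avoid depending on which implementation is in use is to check the input type before decoding. The following is only a sketch of that idea; the helper name `decode_qp_strict` is hypothetical and is not part of quopri:

```python
import io
import quopri


def decode_qp_strict(data):
    """Decode quoted-printable data, accepting bytes-like input only.

    Hypothetical helper: rejecting str up front gives the same behaviour
    whether or not the binascii accelerator is available.
    """
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected bytes, not %s" % type(data).__name__)
    infp = io.BytesIO(data)
    outfp = io.BytesIO()
    quopri.decode(infp, outfp)  # file-object interface of the quopri module
    return outfp.getvalue()


print(decode_qp_strict(b"Hello=20World"))
```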
msg78747 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-02 02:10
> If the input to b64decode is a str, just do an encode('ascii')
> operation on it and proceed.  If that fails, it wasn't valid 
> Base64 to begin with.

On unicode encode error, should we raise a UnicodeEncodeError or a 
binascii.Error?

And there is also the problem of the alternate characters (altchars) 
of base64.b64decode(). Should we accept non-ASCII alternate 
characters?
   base64.b64decode('01a\xfeb\xffcd', altchars=b'\xfe\xff')

For this example, the result depends on the chosen charset:
 - ASCII (strict): encoding the input text raises a UnicodeEncodeError
 - ISO-8859-1 (ignore): works as expected
 - UTF-8 (strict): unexpected result

The only valid choice is ASCII because ISO-8859-1 or UTF-8 would 
reintroduce a bytes/character mixture.
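
The difference between the three charsets can be checked directly. This is an illustrative sketch only; the helper `try_encode` is hypothetical and simply returns the exception instead of raising it:

```python
import base64

text = '01a\xfeb\xffcd'
altchars = b'\xfe\xff'


def try_encode(s, charset):
    """Return the encoded bytes, or the exception if encoding fails."""
    try:
        return s.encode(charset)
    except UnicodeEncodeError as exc:
        return exc


ascii_result = try_encode(text, 'ascii')        # fails: U+00FE is not ASCII
latin1_result = try_encode(text, 'iso-8859-1')  # one byte per character
utf8_result = try_encode(text, 'utf-8')         # two bytes for U+00FE and U+00FF

# Only the ISO-8859-1 bytes still contain the altchars as single bytes,
# so only they decode as if '+' and '/' had been used.
decoded = base64.b64decode(latin1_result, altchars=altchars)

print(repr(ascii_result))
print(latin1_result)
print(utf8_result)
```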
msg106127 - (view) Author: Dan Buch (meatballhat) Date: 2010-05-20 02:35
This appears to still be an issue in py3k.  I've attached the command and output when running ``python3 -m base64`` with various options and inputs.  If there's consensus on a solution, I'd be happy to take a crack at making a patch.
msg106266 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-21 21:30
The attached base64_main.patch fixes the errors described in b64-decode-str-bytes-typeerror.txt.
msg106314 - (view) Author: Dan Buch (meatballhat) Date: 2010-05-22 16:31
@haypo - what patch? :)
msg106315 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-22 17:03
This one!
msg106477 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-25 21:33
I committed base64_main.patch (+ tests): 3.2 (r81533) and 3.1 (r81534).
msg106486 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-25 22:08
Accepting unicode strings is not "pure", but I agree that it's convenient. Here is a patch:
 * base64.b(16|32|64)encode and base64.encodebytes accept unicode strings
 * unicode is first encoded to utf-8 to get a byte string
 * Update the docstrings and the documentation
 * Fix tests
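
The behaviour described in the list above can be approximated as follows. This is a sketch under the stated assumption (str is encoded as UTF-8 first), not the content of base64_str.patch, and the name `b64encode_text` is hypothetical:

```python
import base64


def b64encode_text(data):
    """Base64-encode bytes, or a str after encoding it as UTF-8.

    Hypothetical helper illustrating the described behaviour; the
    result is always bytes.
    """
    if isinstance(data, str):
        data = data.encode("utf-8")
    return base64.b64encode(data)


print(b64encode_text("Hello"))
print(b64encode_text(b"Hello"))
```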
msg106488 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-05-25 22:19
> I committed base64_main.patch (+ tests): 3.2 (r81533) and 3.1 (r81534).

Hum, the test fails on Windows: fixed by r81535 (3.2) and r81536 (3.1).
msg106832 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-06-01 12:18
The patch appears to be fixing the wrong functions.  It is decode that needs to accept unicode.  Encode should still be restricted to bytes.
msg115690 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-09-06 03:40
After thinking about it, I'm inclined to reject this and say that quopri should be fixed to reject string input to decode.  On python-dev Guido opined that a kind of polymorphism in the stdlib was good (bytes in --> bytes out, string in --> string out).  string in --> bytes out and bytes in --> string out was considered bad, to my understanding (except for unicode encode/decode, of course).

As you say, all one has to do is encode the string as ascii to get the bytes to pass in.  It is better, I think, to maintain the clear distinction between bytes and strings in the programmer's mind.  That's what Python 3 is all about, really.

As for "the whole point of base64 is to safely encode binary data into text characters", that is not true.  The point is to encode binary data into a subset of *ascii*, which is *not* text, it is bytes.  The fact that this is also useful for transferring binary data through unicode is pretty much an unintended consequence of the way unicode is designed.
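
For illustration, the explicit conversion at the call site looks like this. It is a minimal sketch using only base64.b64decode and base64.b64encode; the variable names are arbitrary:

```python
import base64

text = 'SGVsbG8='

# str -> bytes is an explicit step taken by the caller before decoding.
raw = base64.b64decode(text.encode('ascii'))

# Encoding is bytes in, bytes out; converting the result to str is again explicit.
encoded_text = base64.b64encode(raw).decode('ascii')

print(raw)
print(encoded_text)
```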
History
Date  User  Action  Args
2010-10-27 12:16:16  eric.araujo  set  status: pending -> closed
2010-10-17 11:37:23  georg.brandl  unlink  issue9730 dependencies
2010-09-06 03:40:40  r.david.murray  set  status: open -> pending; type: behavior -> enhancement; messages: + msg115690; resolution: rejected; stage: resolved
2010-09-02 20:20:50  r.david.murray  link  issue9730 dependencies
2010-06-14 22:40:10  vstinner  set  versions: + Python 3.2, - Python 3.0
2010-06-07 17:15:27  eric.araujo  set  nosy: + eric.araujo
2010-06-06 18:49:48  vstinner  set  files: - base64_main.patch
2010-06-01 12:18:42  r.david.murray  set  nosy: + r.david.murray; messages: + msg106832
2010-05-25 22:19:54  vstinner  set  messages: + msg106488
2010-05-25 22:08:55  vstinner  set  files: + base64_str.patch; messages: + msg106486
2010-05-25 21:33:19  vstinner  set  messages: + msg106477
2010-05-22 17:03:04  vstinner  set  files: + base64_main.patch; keywords: + patch; messages: + msg106315
2010-05-22 16:31:03  meatballhat  set  messages: + msg106314
2010-05-21 21:30:52  vstinner  set  messages: + msg106266
2010-05-20 02:35:11  meatballhat  set  files: + b64-decode-str-bytes-typeerror.txt; nosy: + meatballhat; messages: + msg106127
2009-01-19 08:00:38  kawai  set  nosy: + kawai
2009-01-02 02:10:05  vstinner  set  messages: + msg78747
2009-01-02 01:31:57  vstinner  set  nosy: + vstinner; messages: + msg78746
2009-01-01 08:32:14  brotchie  set  nosy: + brotchie
2008-12-30 18:30:32  beazley  set  messages: + msg78554
2008-12-29 17:39:55  beazley  set  messages: + msg78468
2008-12-29 17:35:52  beazley  create