b64decode should accept strings or bytes #49019
The whole point of base64 encoding is to safely encode binary data into text characters.
>>> x = 'SGVsbG8='
>>> base64.b64decode(x)
b'Hello'
In Python 3, however, you get this exception:
>>> base64.b64decode(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/lib/python3.0/base64.py", line 80, in b64decode
    raise TypeError("expected bytes, not %s" % s.__class__.__name__)
TypeError: expected bytes, not str
I realize that there are encoding issues with Unicode strings, but I can't think of any real negative impact to making this change as long
See bpo-4768. |
Note: This problem applies to all of the other decoders/encoders in the |
One more followup. The quopri module (which is highly related to
>>> quopri.decodestring('Hello World')
b'Hello World'
>>> quopri.decodestring(b'Hello World')
b'Hello World'
However, the quopri module, like base64, uses byte strings almost
>>> quopri.encodestring(b'Hello World')
b'Hello World'
>>> |
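As a rough illustration of the bytes-oriented behaviour described above, here is a minimal sketch of a quopri round trip. The sample input "café" is my own choice, not from the report, and only bytes are passed in so that the result does not depend on how a given Python version treats str input.

```python
import quopri

# Hypothetical sample input: a non-ASCII string turned into bytes explicitly.
raw = "café".encode("utf-8")            # b'caf\xc3\xa9'

# Bytes in, bytes out: printable ASCII is left alone, other bytes become =XX.
encoded = quopri.encodestring(raw)
decoded = quopri.decodestring(encoded)

print(encoded)
print(decoded == raw)
```

Passing bytes in both directions avoids the question of which charset should be used to convert a str argument.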
About quoted printable, there are two implementations:
But quopri.decodestring() reuses binascii.a2b_qp() if the binascii
binascii.a2b_qp() encodes a unicode string to a UTF-8 byte string. |
On a unicode encode error, should we raise a UnicodeEncodeError or a
And there is also the problem of base64.b64decode()
For the example, the result depends on the chosen charset:
The only valid choice is ASCII because ISO-8859-1 or UTF-8 will |
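The charset point can be sketched as follows. This is my reading of the truncated comment above, not the commenter's own example, and the sample character "é" is an assumption chosen for illustration: the same str becomes different bytes under ISO-8859-1 and UTF-8, while ASCII refuses it outright.

```python
import base64

text = "é"  # hypothetical non-ASCII input, not taken from the issue

# The same str maps to different byte sequences depending on the charset.
latin1 = text.encode("iso-8859-1")      # b'\xe9'
utf8 = text.encode("utf-8")             # b'\xc3\xa9'
print(latin1 != utf8)

# ASCII has no ambiguity: non-ASCII input is rejected instead of guessed at.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)

# A valid base64 payload is pure ASCII, so strict ASCII encoding loses nothing.
payload = "SGVsbG8=".encode("ascii")
result = base64.b64decode(payload)
print(result)
```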
This appears to still be an issue in py3k. I've attached the command and output when running |
Attached base64_main.patch fixes errors described in b64-decode-str-bytes-typeerror.txt. |
@Haypo - what patch? :) |
This one! |
I committed base64_main.patch (+ tests): 3.2 (r81533) and 3.1 (r81534). |
Accepting unicode strings is not "pure", but I agree that it's convenient. Here is a patch:
|
Hum, the test fails on Windows: fixed by r81535 (3.2) and r81536 (3.1). |
The patch appears to be fixing the wrong functions. It is decode that needs to accept unicode. Encode should still be restricted to bytes. |
After thinking about it, I'm inclined to reject this and say that quopri should be fixed to reject string input to decode. On python-dev Guido opined that a kind of polymorphism in the stdlib was good (bytes in --> bytes out, string in --> string out). String in --> bytes out and bytes in --> string out were considered bad, to my understanding (except for unicode encode/decode, of course). As you say, all one has to do is encode the string as ascii to get the bytes to pass in. It is better, I think, to maintain the clear distinction between bytes and strings in the programmer's mind. That's what Python 3 is all about, really. As for "the whole point of base64 is to safely encode binary data into text characters", that is not true. The point is to encode binary data into a subset of *ascii*, which is *not* text, it is bytes. The fact that this is also useful for transferring binary data through unicode is pretty much an unintended consequence of the way unicode is designed. |
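The workaround mentioned above (encode the string as ASCII before decoding) might look like the sketch below. The variable names are illustrative only; the payload is the one from the original report.

```python
import base64

encoded_text = "SGVsbG8="                      # base64 payload held as a str
encoded_bytes = encoded_text.encode("ascii")   # explicit str -> bytes step
data = base64.b64decode(encoded_bytes)         # bytes in, bytes out
print(data)

# Going the other way, the caller decides when bytes become text.
token = base64.b64encode(b"Hello").decode("ascii")
print(token)
```

This keeps the str/bytes boundary explicit at the call site. As far as I know, later Python 3 releases relaxed the decoding functions to accept ASCII-only str input directly, but the explicit form works either way.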
python -m base64
with various options and inputs