
classification
Title: binascii b2a functions accept strings (unicode) as data
Type:
Stage:
Components: Extension Modules
Versions: Python 3.0
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, georg.brandl, loewis, pitrou, terry.reedy
Priority: release blocker
Keywords: needs review, patch

Created on 2008-11-22 00:41 by terry.reedy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
reqbytes.diff loewis, 2008-11-30 13:48
Messages (7)
msg76226 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-11-22 00:41
The binascii b2a_xxx functions accept 'binary data' and return ASCII-encoded
bytes.  The corresponding a2b_xxx functions turn the ASCII-encoded bytes
back into 'binary data' (bytes).  If the binary data is bytes, these
should be inverses of each other.

Somewhat surprisingly to me (because the corresponding base64 module
functions raise "TypeError: expected bytes, not str"), 3.0 strings
(unicode) are also accepted as 'binary data', though they will not 'round-trip'.

ASCII characters almost do:
>>> import binascii as b
>>> a='aaaa'
>>> c=b.b2a_base64(a)
>>> c
b'YWFhYQ==\n'
>>> d=b.a2b_base64(c)
>>> d
b'aaaa'

But general Unicode characters generate nonsense:
>>> a='\u1000'
>>> c=b.b2a_base64(a)
>>> c
b'4YCA\n'
>>> d=b.a2b_base64(c)
>>> d
b'\xe1\x80\x80'

I also tried b2a_uu.

Is this a bug?
msg76233 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-11-22 08:16
I vote yes.
msg76628 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-11-29 22:48
It's not /exactly/ nonsense; it seems to assume that a UTF-8 encoding pass is
necessary:

>>> b'\xe1\x80\x80'.decode('utf8') == '\u1000'
True
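
A minimal sketch of that observation, assuming the pre-fix behaviour amounted to an implicit UTF-8 encode: doing the encode explicitly reproduces the output reported above and lets the text round-trip. The variable names are illustrative only.

```python
import binascii

text = '\u1000'

# Encode to bytes explicitly instead of relying on an implicit conversion.
raw = text.encode('utf-8')            # b'\xe1\x80\x80'
encoded = binascii.b2a_base64(raw)    # b'4YCA\n', same as the report above
decoded = binascii.a2b_base64(encoded)

# The bytes round-trip exactly; decoding them recovers the original text.
assert decoded == raw
assert decoded.decode('utf-8') == text
```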

IMO, while accepting unicode strings instead of bytes for the a2b_xx
functions is understandable (because in practice only ASCII characters
are allowed), it is not acceptable for b2a_xx functions to accept
unicode strings instead of bytes.

In other words, it might/should be ok for
`binascii.a2b_base64('YWFh\n')` to return the same as
`binascii.a2b_base64('YWFh\n')` (that is, b'aaa'), but
`binascii.b2a_base64('aaa')` should raise a TypeError rather than
applying a UTF-8 encoding pass before doing the actual b2a encoding.

I think this must be fixed before 3.0 final, and is therefore a release
blocker.
msg76629 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-11-29 22:49
Hmm, I obviously meant:

[...] In other words, it might/should be ok for
`binascii.a2b_base64('YWFh\n')` to return the same as
`binascii.a2b_base64(b'YWFh\n')` (that is, b'aaa') [...]
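
The asymmetry described here can be sketched as follows. This assumes a recent CPython 3 release, where the a2b_* functions accept ASCII-only str input and the b2a_* functions require bytes-like input; the exact TypeError message varies between versions, so the sketch does not check it.

```python
import binascii

# a2b: ASCII-only text and bytes decode to the same result.
from_str = binascii.a2b_base64('YWFh\n')
from_bytes = binascii.a2b_base64(b'YWFh\n')
assert from_str == from_bytes == b'aaa'

# b2a: text is rejected rather than silently encoded.
try:
    binascii.b2a_base64('aaa')
except TypeError:
    b2a_rejects_str = True
else:
    b2a_rejects_str = False
```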
msg76639 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-11-30 13:48
Here is a patch that fixes the problem.

Notice that this affects the email API; base64mime.body_encode now also
requires bytes (whereas quoprimime remains unchanged).

There are probably more functions that still incorrectly accept strings,
e.g. zlib.crc32.
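
As an illustration of the same rule applied to zlib.crc32, a caller holding text would encode it first. The helper name crc32_of_text and its default encoding are hypothetical, not part of zlib; the sketch assumes a recent CPython 3 release, where passing str raises TypeError.

```python
import zlib

def crc32_of_text(text, encoding='utf-8'):
    """Hypothetical helper: checksum text by encoding it to bytes first."""
    return zlib.crc32(text.encode(encoding))

# The helper agrees with checksumming the equivalent bytes directly.
checksum = crc32_of_text('aaa')
assert checksum == zlib.crc32(b'aaa')

# Passing text directly is rejected rather than silently encoded.
try:
    zlib.crc32('aaa')
except TypeError:
    crc32_rejects_str = True
else:
    crc32_rejects_str = False
```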
msg76662 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-11-30 20:14
Martin, the patch looks okay to me.  I vote for applying it.
msg76724 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-12-02 06:00
Committed as r67472.
History
Date User Action Args
2022-04-11 14:56:41 admin set github: 48637
2008-12-02 06:00:34 loewis set status: open -> closed; messages: + msg76724
2008-11-30 20:14:38 barry set nosy: + barry; resolution: accepted; messages: + msg76662
2008-11-30 13:58:56 loewis set keywords: + needs review
2008-11-30 13:48:30 loewis set files: + reqbytes.diff; nosy: + loewis; messages: + msg76639; keywords: + patch
2008-11-29 22:49:42 pitrou set messages: + msg76629
2008-11-29 22:48:03 pitrou set priority: release blocker; nosy: + pitrou; messages: + msg76628
2008-11-22 08:16:40 georg.brandl set nosy: + georg.brandl; messages: + msg76233
2008-11-22 00:41:18 terry.reedy create