Issue 4387: binascii b2a functions accept strings (unicode) as data


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/48637

classification

Title:	binascii b2a functions accept strings (unicode) as data
Type:		Stage:
Components:	Extension Modules	Versions:	Python 3.0

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:		Nosy List:	barry, georg.brandl, loewis, pitrou, terry.reedy
Priority:	release blocker	Keywords:	needs review, patch

Created on 2008-11-22 00:41 by terry.reedy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
reqbytes.diff	loewis, 2008-11-30 13:48

Messages (7)
msg76226 - Author: Terry J. Reedy (terry.reedy) *	Date: 2008-11-22 00:41
Binascii b2a_xxx functions accept 'binary data' and return ascii-encoded bytes. The corresponding a2b_xxx functions turn the ascii-encoded bytes back to 'binary data' (bytes). If the binary data is bytes, these should be inverses of each other.

Somewhat surprisingly to me (because the corresponding base64 module functions raise "TypeError: expected bytes, not str") 3.0 strings (unicode) are accepted as 'binary data', though they will not 'round-trip'. Ascii chars almost do:

>>> a='aaaa'
>>> c=b.b2a_base64(a)
>>> c
b'YWFhYQ==\n'
>>> d=b.a2b_base64(c)
>>> d
b'aaaa'

But general unicode chars generate nonsense.

>>> a='\u1000'
>>> c=b.b2a_base64(a)
>>> c
b'4YCA\n'
>>> d=b.a2b_base64(c)
>>> d
b'\xe1\x80\x80'

I also tried b2a_uu. Is this a bug?
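The following is a minimal sketch of the round-trip property described in msg76226. It assumes a current Python 3 interpreter, which has the post-fix behavior, rather than the 3.0 release candidate used in this report. For bytes input, a2b_base64(b2a_base64(data)) == data. The helpers `roundtrips` and `rejects_str` are hypothetical names chosen for illustration.

```python
import binascii

def roundtrips(data: bytes) -> bool:
    """True if base64-encoding and then decoding gives back exactly `data`."""
    encoded = binascii.b2a_base64(data)  # ASCII bytes with a trailing b'\n'
    return binascii.a2b_base64(encoded) == data

def rejects_str(func, arg) -> bool:
    """True if `func` refuses the argument with TypeError (post-fix behavior for str)."""
    try:
        func(arg)
    except TypeError:
        return True
    return False

# bytes input round-trips, including non-ASCII byte values
for sample in (b'aaaa', b'\xe1\x80\x80', bytes(range(256))):
    assert roundtrips(sample)

print(binascii.b2a_base64(b'aaaa'))              # b'YWFhYQ==\n'
print(rejects_str(binascii.b2a_base64, 'aaaa'))  # True on current Python 3
print(rejects_str(binascii.b2a_uu, 'aaaa'))      # True on current Python 3
```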
msg76233 - Author: Georg Brandl (georg.brandl) *	Date: 2008-11-22 08:16
I vote yes.
msg76628 - Author: Antoine Pitrou (pitrou) *	Date: 2008-11-29 22:48
It's not /exactly/ nonsense, it seems to assume an utf8 encoding pass is necessary:

>>> b'\xe1\x80\x80'.decode('utf8') == '\u1000'
True

IMO, while accepting unicode strings instead of bytes for the a2b_xx functions is understandable (because in practice only ASCII characters are allowed), it is not acceptable for b2a_xx functions to accept unicode strings instead of bytes. In other words, it might/should be ok for `binascii.a2b_base64('YWFh\n')` to return the same as `binascii.a2b_base64('YWFh\n')` (that is, b'aaa'), but `binascii.b2a_base64('aaa')` should raise a TypeError rather than applying an utf8 encoding pass before doing the actual b2a encoding.

I think this must be fixed before 3.0 final, and is therefore a release blocker.
msg76629 - Author: Antoine Pitrou (pitrou) *	Date: 2008-11-29 22:49
Hmm, I obviously meant: [...] In other words, it might/should be ok for `binascii.a2b_base64('YWFh\n')` to return the same as `binascii.a2b_base64(b'YWFh\n')` (that is, b'aaa') [...]
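The following is a rough sketch of Pitrou's observation in msg76628/msg76629. The function `emulate_old_b2a_base64` is a hypothetical stand-in for the pre-fix 3.0 behavior, modeled as "UTF-8 encode a str first, then base64". It is not the actual C code. The a2b lines assume Python 3.3 or later, where the a2b_* functions accept ASCII-only str. That is the asymmetry argued for in these two messages.

```python
import binascii

def emulate_old_b2a_base64(data):
    """Hypothetical stand-in for the pre-fix 3.0 behavior reported here:
    a str argument was UTF-8 encoded before the base64 step."""
    if isinstance(data, str):
        data = data.encode('utf-8')
    return binascii.b2a_base64(data)

old = emulate_old_b2a_base64('\u1000')
print(old)  # b'4YCA\n', the output shown in msg76226
# Decoding the round-tripped bytes as UTF-8 recovers the original text.
print(binascii.a2b_base64(old).decode('utf8') == '\u1000')  # True

# a2b side: current Python 3 (3.3 and later) accepts ASCII-only str input.
print(binascii.a2b_base64('YWFh\n'))                                     # b'aaa'
print(binascii.a2b_base64('YWFh\n') == binascii.a2b_base64(b'YWFh\n'))  # True
```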
msg76639 - Author: Martin v. Löwis (loewis) *	Date: 2008-11-30 13:48
Here is a patch that fixes the problem. Notice that this affects the email API; base64mime.body_encode now also requires bytes (whereas quoprimime remains unchanged). There are probably more functions that still incorrectly accept strings, e.g. zlib.crc32.
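Once the functions require bytes, callers that start from text must choose an encoding themselves. Callers such as email.base64mime.body_encode, mentioned in msg76639, are affected. The following is a minimal caller-side sketch. `to_bytes` is a hypothetical helper, not part of the patch or of any library. The zlib.crc32 lines assume current Python 3, where that function also rejects str.

```python
import binascii
import zlib

def to_bytes(data, encoding='utf-8'):
    """Hypothetical caller-side helper: encode str explicitly with a chosen
    codec, and pass bytes-like objects through as bytes."""
    if isinstance(data, str):
        return data.encode(encoding)
    return bytes(memoryview(data))  # raises TypeError for non-buffer objects

payload = to_bytes('\u1000')
print(payload)                       # b'\xe1\x80\x80'
print(binascii.b2a_base64(payload))  # b'4YCA\n'

# zlib.crc32 also rejects str in current Python 3.
try:
    zlib.crc32('\u1000')
except TypeError as exc:
    print('TypeError:', exc)
print(zlib.crc32(payload) == zlib.crc32(bytearray(b'\xe1\x80\x80')))  # True
```

Making the encoding an explicit argument keeps the text-to-bytes step visible at the call site. This is the implicit UTF-8 pass that the fix removed from binascii.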
msg76662 - Author: Barry A. Warsaw (barry) *	Date: 2008-11-30 20:14
Martin, the patch looks okay to me. I vote for applying it.
msg76724 - Author: Martin v. Löwis (loewis) *	Date: 2008-12-02 06:00
Committed as r67472.

History
Date	User	Action	Args
2022-04-11 14:56:41	admin	set	github: 48637
2008-12-02 06:00:34	loewis	set	status: open -> closed messages: + msg76724
2008-11-30 20:14:38	barry	set	nosy: + barry resolution: accepted messages: + msg76662
2008-11-30 13:58:56	loewis	set	keywords: + needs review
2008-11-30 13:48:30	loewis	set	files: + reqbytes.diff nosy: + loewis messages: + msg76639 keywords: + patch
2008-11-29 22:49:42	pitrou	set	messages: + msg76629
2008-11-29 22:48:03	pitrou	set	priority: release blocker nosy: + pitrou messages: + msg76628
2008-11-22 08:16:40	georg.brandl	set	nosy: + georg.brandl messages: + msg76233
2008-11-22 00:41:18	terry.reedy	create