classification
Title: binascii should convert unicode strings
Type: behavior
Stage:
Components: Interpreter Core
Versions: Python 3.2
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: flox, loewis, pitrou, r.david.murray, wiggin15
Priority: normal
Keywords: patch

Created on 2010-09-30 12:03 by wiggin15, last changed 2010-10-01 19:02 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
binascii.diff wiggin15, 2010-09-30 12:03
Messages (4)
msg117727 - Author: Arnon Yaari (wiggin15) * Date: 2010-09-30 12:03
binascii is currently bytes-only, although the "a" in a2b/b2a should refer to unicode strings.
This patch changes all a2b functions to take str and all b2a functions to return str.
This was discussed several times, for example:
http://www.gossamer-threads.com/lists/python/dev/863892
http://bugs.python.org/issue4770#msg78475
As discussed, the required behavior in Python 3k is that a2b should convert str to bytes and vice versa, and this was implemented in this patch.

This patch breaks backward compatibility in several functions. It also required fixing several files under Lib (although most cases were simplified) and changing the interface of the base64 and quopri modules (this was also requested, e.g. http://bugs.python.org/issue4769).
Note that there already was a small change in this behavior from 3.1 to 3.2 (http://bugs.python.org/issue4770) - b2a_hex and b2a_qp COULD receive strings as input in 3.1 and this was changed in 3.2, so technically, compatibility was already broken...

The patch includes updates to the standard library, the tests and documentation.
rlecode and rledecode were left untouched because they are not a2b/b2a functions per se: they work on the input and output of the hqx functions (it may be a good idea to add new a2b/b2a functions for rle_hqx).

Any thoughts are welcome.
msg117742 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-09-30 15:12
Regardless of the various arguments, I think it is too late to break compatibility again, by disallowing bytes input, or changing the output type.
What we could do is allow str arguments to a2b_ functions, with the restriction that the argument must not contain non-ASCII characters.

(Please note that the "a" is for ASCII; otherwise it would be "u" for Unicode :-))
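
For illustration only, the restriction suggested above (str input accepted, but ASCII-only) could look roughly like the following sketch. The helper name `a2b_hex_ascii` is hypothetical and is not part of binascii or of the attached patch; it only wraps the existing `binascii.a2b_hex`.

```python
import binascii


def a2b_hex_ascii(data):
    """Hypothetical wrapper: accept bytes, or a str containing only ASCII."""
    if isinstance(data, str):
        # A strict ASCII encode raises UnicodeEncodeError (a ValueError
        # subclass) if the string contains any non-ASCII character.
        data = data.encode("ascii")
    # The output type is unchanged: bytes in, bytes out.
    return binascii.a2b_hex(data)
```

With this sketch, bytes input keeps working exactly as before, so no existing caller would break; only the set of accepted input types grows.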
msg117745 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-09-30 15:39
-1 on this patch; I think any change must get consensus on python-dev first, and there is no point in resolving this in the bug tracker. If no agreement can be found (which is actually likely), a PEP needs to be written.

My personal favorite would be to a) leave the binascii module alone, and b) add more functions to the base64 module, taking and returning Unicode strings as encoded objects.

However, I really fail to see what's so bad about

  binascii.b2a_hex(B).decode('ascii')

Under the rule "not every two-line function belongs in the standard library", I would then be -0 on adding it to base64.
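
As a sketch of the kind of "two line function" meant here, user code can wrap the one-liner above itself. The name `hexlify_str` is hypothetical; only `binascii.b2a_hex` and `bytes.decode` are real library calls.

```python
import binascii


def hexlify_str(data):
    """Hypothetical helper: hex-encode bytes and return the result as str."""
    # b2a_hex returns bytes containing only ASCII hex digits,
    # so decoding them as ASCII cannot fail.
    return binascii.b2a_hex(data).decode("ascii")
```

For example, `hexlify_str(b"hi")` returns the str `"6869"`.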
msg117829 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-01 19:02
Since the issue/patch is about binascii, and I agree with Martin that binascii should continue to do bytes-to-bytes transforms (especially since that is its most common use case, IMO), I'm going to close this issue.  Please open the discussion on python-dev or python-ideas if you wish to proceed with some sort of feature request.
History
Date User Action Args
2010-10-01 19:02:34 r.david.murray set status: open -> closed
resolution: rejected
messages: + msg117829
2010-09-30 16:37:09 r.david.murray set nosy: + r.david.murray
2010-09-30 15:39:14 loewis set messages: + msg117745
2010-09-30 15:12:42 pitrou set nosy: + pitrou
messages: + msg117742
2010-09-30 15:00:31 pitrou set nosy: + loewis, flox
2010-09-30 12:03:19 wiggin15 create