Message 107363 - Python tracker



Author	vstinner
Recipients	doerwalter, eric.araujo, lemburg, loewis, pitrou, vstinner
Date	2010-06-08.23:13:32
Message-id	<1276038814.42.0.999450131558.issue8838@psf.upfronthosting.co.za>

Content

r81854 removes codecs.charbuffer_encode() (and t# parsing format) from Python 3.2 (blocked in 3.1: r81855).

--

My problem with codecs.readbuffer_encode() is that it accepts both byte *and* character strings. If you want a byte string, just use bytes(input). If you want to convert a character string to a byte string, use input.encode("utf-8"). But accepting both types may lead to mojibake, as we had in Python 2.
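
To make the concern concrete, here is a minimal sketch of the two explicit conversions next to the mixed-type behaviour. The sample string and variable names are illustrative assumptions. codecs.readbuffer_encode() is undocumented, so the sketch looks it up defensively and assumes it returns a (bytes, length) tuple, as it does in CPython.

```python
import codecs

text = "caf\u00e9"             # a character string (str)
data = text.encode("utf-8")    # explicit str -> bytes conversion

# bytes() on a bytes-like object simply copies the bytes.
copied = bytes(bytearray(data))

# Mojibake: UTF-8 bytes decoded with the wrong charset.
garbled = data.decode("latin-1")
print(garbled)                 # cafÃ©

# Undocumented helper: guard the lookup in case it is absent.
readbuffer_encode = getattr(codecs, "readbuffer_encode", None)
if readbuffer_encode is not None:
    # Assumption: element 0 of the returned tuple holds the bytes.
    from_str = readbuffer_encode(text)[0]
    from_bytes = readbuffer_encode(data)[0]
    # Both input types are accepted, so a type mix-up goes unnoticed.
    print(from_str == from_bytes)
```

With the explicit spellings, passing the wrong type fails immediately: bytes(text) raises TypeError because no encoding is given, and bytes objects have no .encode() method.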

MAL> That's a common misunderstanding. The codec system does not
MAL> mandate a specific type combination. Only the helper methods
MAL> .encode() and .decode() on bytes and str objects in Python3 do.

This is related to #7475: we have to decide whether we drop this (currently unused) feature completely (e.g. remove codecs.readbuffer_encode()), or "reenable" it (reintroduce the hex, bz2, rot13, ... codecs). This discussion should occur on the mailing list.
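
For readers on a later release, the distinction MAL draws can be sketched as follows. This assumes Python 3.4 or newer, where the "hex" and "rot13" codec names are available through the module-level codecs.encode() function; it does not describe the state of the code at the time of this message.

```python
import codecs

# Module-level functions do not mandate a type combination.
hex_bytes = codecs.encode(b"hi", "hex")    # bytes -> bytes
rot = codecs.encode("hello", "rot13")      # str -> str

# The str.encode() helper only accepts str -> bytes text encodings.
try:
    "hello".encode("rot13")
    method_rejected = False
except LookupError:
    method_rejected = True

print(hex_bytes, rot, method_rejected)     # b'6869' uryyb True
```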

History
Date	User	Action	Args
2010-06-08 23:13:34	vstinner	set	recipients: + vstinner, lemburg, loewis, doerwalter, pitrou, eric.araujo
2010-06-08 23:13:34	vstinner	set	messageid: <1276038814.42.0.999450131558.issue8838@psf.upfronthosting.co.za>
2010-06-08 23:13:33	vstinner	link	issue8838 messages
2010-06-08 23:13:32	vstinner	create