Message 209398 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	ncoghlan
Recipients	benjamin.peterson, ezio.melotti, hynek, lemburg, ncoghlan, pitrou, serhiy.storchaka, stutzbach, vstinner
Date	2014-01-27.05:24:39
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1390800279.91.0.834285021962.issue20405@psf.upfronthosting.co.za>
In-reply-to

Content

Issue 20404 points out that io.TextIOWrapper can't be used with binary transform codecs like bz2 because the types are wrong.

By contrast, codecs.open() still defaults to working in binary mode, and just switches to returning a different type based on the specified encoding (exactly the kind of value-driven output type changes we're trying to eliminate from the core text model):

>>> import codecs
>>> print(codecs.open('hex.txt').read())
b'aabbccddeeff'
>>> print(codecs.open('hex.txt', encoding='hex').read())
b'\xaa\xbb\xcc\xdd\xee\xff'
>>> print(codecs.open('hex.txt', encoding='utf-8').read())
aabbccddeeff
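The same value-driven switch in output type can be reproduced in memory, without a file, using the module-level `codecs.decode()` function. A minimal sketch, assuming a Python 3 release where the `hex` alias for `hex_codec` is available:

```python
import codecs

# The same input bytes, decoded through the same function with two codecs.
data = b"aabbccddeeff"

as_hex = codecs.decode(data, "hex")     # binary transform: bytes -> bytes
as_text = codecs.decode(data, "utf-8")  # text encoding: bytes -> str

# Only the codec name determines whether the result is bytes or str.
print(type(as_hex).__name__, as_hex)
print(type(as_text).__name__, as_text)
```

Nothing in the call signature tells the reader which result type to expect; that information lives entirely in the codec name.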

For 3.4, I plan to just extend the issue 19619 blacklist to also cover TextIOWrapper (and hence open()). Beyond that, though, it seems to me that there is a valid use case for bytes-to-bytes transform support directly in the IO stack.
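In Python 3.4 and later releases, this is the behaviour you observe: bytes-to-bytes codecs such as `bz2_codec` still work through `codecs.encode()` and `codecs.decode()`, but `io.TextIOWrapper` rejects them with a `LookupError` as soon as the wrapper is created. A small sketch illustrating both sides (the exact error message wording may vary between versions):

```python
import codecs
import io

# Bytes-to-bytes transforms still work through the codecs module functions.
payload = b"hello " * 10
compressed = codecs.encode(payload, "bz2_codec")
round_trip = codecs.decode(compressed, "bz2_codec")

# The io stack only accepts text encodings, so asking TextIOWrapper
# (and hence open()) to use a binary transform fails immediately.
rejected = False
try:
    io.TextIOWrapper(io.BytesIO(compressed), encoding="bz2_codec")
except LookupError as exc:
    rejected = True
    print("rejected:", exc)
```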

A PEP for 3.5 could propose:

- providing a public API that allows codecs to be classified into at least the following groups ("binary" = memoryview-compatible data exporters, including both bytes and bytearray):
  - text encodings (decodes binary to str, encodes str to bytes)
  - binary transforms (decodes *and* encodes binary to bytes)
  - text transforms (decodes and encodes str to str)
  - hybrid transforms (acts as both a binary transform *and* as a text transform)
  - hybrid encodings (decodes binary and potentially str to str, encodes binary and str to bytes)
  - arbitrary encodings (decodes and encodes object to object, without fitting any of the above categories)

- adding io.BinaryTransformWrapper that applies binary transforms when reading and writing data (similar to the way TextIOWrapper applies text encodings)

- adding a "transform" parameter to open() that inserts BinaryTransformWrapper into the stack at the appropriate place (the PEP process would need to decide between supporting just a single transform per stream and supporting multiple transforms). In text mode, TextIOWrapper would be added to the stack after any binary transforms.
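As a rough illustration of the classification API proposed in the first bullet, here is a sketch that uses an explicit declaration table. `CodecKind`, `codec_kind()` and `_DECLARED_KINDS` are hypothetical names, and the table covers only a few standard-library transform codecs. For everything else it falls back on CPython's private `CodecInfo._is_text_encoding` flag, which is an implementation detail rather than a public API and can only separate text encodings from everything else:

```python
import codecs
import enum


class CodecKind(enum.Enum):
    """Hypothetical categories mirroring the list above."""
    TEXT_ENCODING = "text encoding"        # binary -> str, str -> bytes
    BINARY_TRANSFORM = "binary transform"  # binary -> bytes in both directions
    TEXT_TRANSFORM = "text transform"      # str -> str in both directions
    HYBRID_TRANSFORM = "hybrid transform"  # both binary and text transform
    HYBRID_ENCODING = "hybrid encoding"    # binary/str -> str, binary/str -> bytes
    ARBITRARY = "arbitrary"                # object -> object


# Hypothetical explicit declarations, keyed by CodecInfo.name. A real API
# would let each codec declare its own kind when it is registered.
_DECLARED_KINDS = {
    "base64": CodecKind.BINARY_TRANSFORM,
    "bz2": CodecKind.BINARY_TRANSFORM,
    "hex": CodecKind.BINARY_TRANSFORM,
    "quopri": CodecKind.BINARY_TRANSFORM,
    "uu": CodecKind.BINARY_TRANSFORM,
    "zlib": CodecKind.BINARY_TRANSFORM,
    "rot-13": CodecKind.TEXT_TRANSFORM,
}


def codec_kind(encoding):
    """Return the (hypothetical) CodecKind for a registered codec name."""
    info = codecs.lookup(encoding)
    if info.name in _DECLARED_KINDS:
        return _DECLARED_KINDS[info.name]
    # Private CPython flag behind the non-text-encoding blacklist.
    if getattr(info, "_is_text_encoding", True):
        return CodecKind.TEXT_ENCODING
    return CodecKind.ARBITRARY
```

Declaring the kind explicitly, rather than probing codecs with sample inputs, avoids misclassifying codecs such as `hex`, whose decoder also happens to accept ASCII `str` input.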

Optionally, the idea could also be extended to adding io.TextTransformWrapper and a "text_transform" parameter, but those seem somewhat less useful.
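The `io.BinaryTransformWrapper` and "transform" parameter ideas above can be prototyped on top of the existing `io` classes. The following read-only sketch assumes the transform codec provides an incremental decoder (as `bz2_codec` does). `BinaryTransformWrapper` here is a hypothetical `io.RawIOBase` subclass, not the proposed API, and `open_transformed()` is a hypothetical stand-in for `open(..., transform=...)` that takes an already-open binary stream:

```python
import codecs
import io


class BinaryTransformWrapper(io.RawIOBase):
    """Hypothetical read-only sketch of a binary transform layer.

    Applies a bytes-to-bytes codec's incremental decoder to data read from an
    underlying binary stream. Writing, seeking and closing are omitted.
    """

    def __init__(self, raw, transform, chunk_size=8192):
        super().__init__()
        self._raw = raw
        self._decoder = codecs.getincrementaldecoder(transform)()
        self._chunk_size = chunk_size
        self._pending = b""
        self._eof = False

    def readable(self):
        return True

    def readinto(self, buffer):
        # Read and transform raw chunks until some output is available or
        # the underlying stream is exhausted.
        while not self._pending and not self._eof:
            chunk = self._raw.read(self._chunk_size)
            self._eof = not chunk
            # `or b""` normalises codecs that return an empty str at the end.
            self._pending = self._decoder.decode(chunk, final=self._eof) or b""
        n = min(len(buffer), len(self._pending))
        if n:
            buffer[:n] = self._pending[:n]
            self._pending = self._pending[n:]
        return n


def open_transformed(binary_stream, transform, encoding=None):
    """Hypothetical stand-in for open(..., transform=...): stack the layers."""
    layer = io.BufferedReader(BinaryTransformWrapper(binary_stream, transform))
    if encoding is None:
        return layer  # binary mode: transformed bytes
    # Text mode: TextIOWrapper sits above the binary transform.
    return io.TextIOWrapper(layer, encoding=encoding)


# Usage: decompress bz2 data, then decode it as UTF-8 text.
compressed = codecs.encode("hello, transformed world\n".encode("utf-8"), "bz2_codec")
text = open_transformed(io.BytesIO(compressed), "bz2_codec", encoding="utf-8").read()
raw = open_transformed(io.BytesIO(compressed), "bz2_codec").read()
```

Placing the wrapper below `io.BufferedReader` means buffering applies to the transformed bytes, and in text mode the `TextIOWrapper` ends up above the binary transform, matching the stacking order suggested above. A full design would also need writing (via an incremental encoder), close propagation, and a decision on chaining multiple transforms.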

History
Date	User	Action	Args
2014-01-27 05:24:39	ncoghlan	set	recipients: + ncoghlan, lemburg, pitrou, vstinner, benjamin.peterson, stutzbach, ezio.melotti, hynek, serhiy.storchaka
2014-01-27 05:24:39	ncoghlan	set	messageid: <1390800279.91.0.834285021962.issue20405@psf.upfronthosting.co.za>
2014-01-27 05:24:39	ncoghlan	link	issue20405 messages
2014-01-27 05:24:39	ncoghlan	create