Message 136212 - Python tracker


Author	lemburg
Recipients	brett.cannon, eric.araujo, lemburg, loewis, meatballhat, pitrou, rhettinger, vstinner
Date	2011-05-18.08:25:54
Message-id	<4DD3828F.2060106@egenix.com>
In-reply-to	<1305676605.78.0.639804793971.issue8796@psf.upfronthosting.co.za>

Content

STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> Python 3.2 has been published. Can we start deprecating StreamWriter and StreamReader in Python 3.3 (to remove them from Python 3.4)? The doc should explain how to convert code using codecs into code using the io module (it should be simple), and using a StreamReader/StreamWriter should emit a warning.

This ticket is about deprecating codecs.open(), not about
StreamWriter and StreamReader.

The arguments mentioned here against doing that anytime soon
still stand.

I'm -1 on deprecating StreamWriter and StreamReader, as they provide
different mechanisms from the io layer, which has a specific focus
on files and buffers.

> --
> 
> codecs.StreamWriter writes twice the BOM of UTF-8-SIG, UTF-16, UTF-32 encodings if the file is opened in append mode or after a seek(0). Bug fixed in io.TextIOWrapper (issue #5006). io.TextIOWrapper calls also encoder.setstate(0) on a seek different than seek(0), whereas codecs.StreamWriter doesn't (it is not an incremental encoder, it doesn't have the setstate method).
> 
> codecs.StreamReader doesn't ignore the BOM of UTF-8-SIG, UTF-16 or UTF-32 encodings after seek(0). Bug fixed in io.TextIOWrapper (issue #4862).
> 
> These bugs should maybe be mentioned in the codecs doc, with a pointer to the io module saying that the io module handles these encodings correctly.

Those are not bugs in the generic codecs.StreamWriter/StreamReader
implementations or in their concept. They are bugs in those specific
codecs.
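To make the behaviour under discussion concrete, here is a minimal
sketch. It uses an in-memory io.BytesIO instead of a real file, and
two separate writer objects to stand in for reopening a file in append
mode; both are assumptions for illustration and not taken from the
original reports. The helper names are made up.

```python
import codecs
import io


def write_with_stream_writers(encoding="utf-8-sig"):
    """Write through two separate StreamWriter instances on one byte stream."""
    raw = io.BytesIO()
    # Each getwriter(...)(raw) call creates a writer with fresh state,
    # roughly what happens when a file is reopened for appending.
    codecs.getwriter(encoding)(raw).write("a")
    codecs.getwriter(encoding)(raw).write("b")
    return raw.getvalue()


def write_with_text_wrappers(encoding="utf-8-sig"):
    """Same experiment, but through two io.TextIOWrapper instances."""
    raw = io.BytesIO()
    first = io.TextIOWrapper(raw, encoding=encoding)
    first.write("a")
    first.detach()  # flushes, then releases raw without closing it
    # raw is now positioned past the first write, like an append-mode file.
    second = io.TextIOWrapper(raw, encoding=encoding)
    second.write("b")
    second.detach()
    return raw.getvalue()
```

With the utf-8-sig codec, the first helper returns a byte string that
contains the BOM twice, while the second contains it only once.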

The codecs StreamWriter and StreamReader concept was explicitly
designed to be able to have state. However, the generic implementation
does not make use of such state for the purpose of writing special
beginning-of-file markers - that's just way too specific for
general-purpose implementations. They do use state to implement
buffered reads.
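
As a small illustration of the buffered-read state, the sketch below
reads a UTF-8 stream with a size hint of one byte per call. The helper
name and the sample text are hypothetical; the point is only that the
reader holds on to undecoded bytes between reads of the underlying
stream until a whole character is available.

```python
import codecs
import io


def read_chars_with_one_byte_hint(text="€x"):
    """Read two characters while fetching one byte at a time from the stream."""
    raw = io.BytesIO(text.encode("utf-8"))
    reader = codecs.getreader("utf-8")(raw)
    # read(1) fetches one byte per underlying read; bytes that do not yet
    # form a complete character stay in the reader's internal buffer.
    return [reader.read(1), reader.read(1)]
```

Here the euro sign takes three bytes in UTF-8, so the first read(1)
needs three reads of the underlying stream before it can return a
character.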

It would certainly be possible to make the implementations of
the codecs you mentioned smarter to handle writing BOMs correctly,
e.g. by making use of the incremental encoder/decoders, if there's
interest.
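
A rough sketch of what the incremental encoders already offer, not a
proposed patch: the encoder object remembers whether the BOM has been
emitted, and setstate(0) (the call mentioned in the quoted comment)
tells it that output does not start at the beginning of the stream.
The helper name and its at_start parameter are hypothetical.

```python
import codecs


def encode_chunks(chunks, encoding="utf-8-sig", at_start=True):
    """Encode text chunks with one incremental encoder and return the pieces."""
    encoder = codecs.getincrementalencoder(encoding)()
    if not at_start:
        # For the BOM-writing codecs, state 0 means "BOM already written",
        # e.g. when appending to existing data.
        encoder.setstate(0)
    return [encoder.encode(chunk) for chunk in chunks]
```

Called with the defaults, only the first chunk carries the BOM; with
at_start=False, no chunk does.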

History
Date	User	Action	Args
2011-05-18 08:25:55	lemburg	set	recipients: + lemburg, loewis, brett.cannon, rhettinger, pitrou, vstinner, eric.araujo, meatballhat
2011-05-18 08:25:55	lemburg	link	issue8796 messages
2011-05-18 08:25:54	lemburg	create