Author vstinner
Recipients ezio.melotti, lemburg, serhiy.storchaka, vstinner
Date 2017-03-10.15:41:50
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <CAMpsgwasZKg=C1uL-uha+v0RKHfdatF8foXWDdK=51cHh0VBFw@mail.gmail.com>
In-reply-to <69886c48-0d2f-3f91-7457-41043ddcb1ac@egenix.com>
Content
> The reason for the problem is the UTF-8 decoder (and other
> decoders) expecting an extension to the codec decoder API,
> which are not implemented in its StreamReader class (it simply
> uses the base class). It's not a problem of the base class, but
> that of the codec.
>
> And no: it doesn't have anything to do with codec.open()
> or the StreamReaderWriter class.

open("document.txt", encoding="utf-8") uses IncrementalDecoder of
encodings.utf_8. This object doesn't seem to have the discussed issue.

IMHO the issue is that StreamReader doesn't use an incremental
decoder. I don't see how it could support multibyte encodings and
error handlers without an incremental decoder.

I like TextIOWrapper design between it only handles codecs and text
buffering. Bytes buffering is done at lower-level in a different
object.

I'm not confortable to modify StreamReader because it combines
TextIOWrapper with BufferedReader and so is more complex.

>> I propose to modify codecs.open() to reuse the io module: call io.open() with newline=''. The io module is now battle-tested and handles well many corner cases of incremental codecs with multibyte encodings.
>
> -1. People who want to use the io module should use it directly.

When porting code to Python 3, many people chose to use codecs.open()
to get text files using a single code base for Python 2 and Python 3.
Once the code is ported, I don't expect that anyone will replace
codecs.open() with io.open(). You know, nobody cares of the technical
debt...

>> The next step would be to deprecate the codecs.StreamReaderWriter class and the codecs.open(). But my latest attempt to deprecate them was the PEP 400 and it wasn't a full success, so I now prefer to move step by step :-)
>
> I'm still -1 on the deprecations in PEP 400. You are essentially
> suggesting to replace the complete codecs subsystem with the
> io module, but forgetting that all codecs use StreamWriter and
> StreamReader as base classes.

You can elaborate on "all codecs use StreamWriter and StreamReader as
base classes". Only codecs.open() uses StreamReader and StreamWriter,
no?

All codecs implement a StreamReader and StreamWriter class, but my
question is how use these classes?

> The codecs sub system has a clean design. If used correctly
> and maintained with more care, it works really well.

It seems like we lack such maintainer, since I wrote the PEP, many
issues are still open:

http://bugs.python.org/issue7262
http://bugs.python.org/issue8630
http://bugs.python.org/issue10344
http://bugs.python.org/issue12508
http://bugs.python.org/issue12512

See also issue #5445 (wontfix, whereas TextIOWrapper.writeslines()
uses "for line in lines") and issue #12513 (this one is not fair, io
also has the same bug: issue #12215 :-)).

> I'm tired of having to fight these fights every few years.
> Can't we just stop having them, please ?

The status quo is to do nothing, but as a consequence, bugs are still
not fixed yet, and users are still affected by these bugs :-( I'm
trying to find a solution.
History
Date User Action Args
2017-03-10 15:41:50vstinnersetrecipients: + vstinner, lemburg, ezio.melotti, serhiy.storchaka
2017-03-10 15:41:50vstinnerlinkissue29783 messages
2017-03-10 15:41:50vstinnercreate