Message289378
> The reason for the problem is the UTF-8 decoder (and other
> decoders) expecting an extension to the codec decoder API,
> which are not implemented in its StreamReader class (it simply
> uses the base class). It's not a problem of the base class, but
> that of the codec.
>
> And no: it doesn't have anything to do with codec.open()
> or the StreamReaderWriter class.
open("document.txt", encoding="utf-8") uses IncrementalDecoder of
encodings.utf_8. This object doesn't seem to have the discussed issue.
IMHO the issue is that StreamReader doesn't use an incremental
decoder. I don't see how it could support multibyte encodings and
error handlers without an incremental decoder.
I like TextIOWrapper design between it only handles codecs and text
buffering. Bytes buffering is done at lower-level in a different
object.
I'm not confortable to modify StreamReader because it combines
TextIOWrapper with BufferedReader and so is more complex.
>> I propose to modify codecs.open() to reuse the io module: call io.open() with newline=''. The io module is now battle-tested and handles well many corner cases of incremental codecs with multibyte encodings.
>
> -1. People who want to use the io module should use it directly.
When porting code to Python 3, many people chose to use codecs.open()
to get text files using a single code base for Python 2 and Python 3.
Once the code is ported, I don't expect that anyone will replace
codecs.open() with io.open(). You know, nobody cares of the technical
debt...
>> The next step would be to deprecate the codecs.StreamReaderWriter class and the codecs.open(). But my latest attempt to deprecate them was the PEP 400 and it wasn't a full success, so I now prefer to move step by step :-)
>
> I'm still -1 on the deprecations in PEP 400. You are essentially
> suggesting to replace the complete codecs subsystem with the
> io module, but forgetting that all codecs use StreamWriter and
> StreamReader as base classes.
You can elaborate on "all codecs use StreamWriter and StreamReader as
base classes". Only codecs.open() uses StreamReader and StreamWriter,
no?
All codecs implement a StreamReader and StreamWriter class, but my
question is how use these classes?
> The codecs sub system has a clean design. If used correctly
> and maintained with more care, it works really well.
It seems like we lack such maintainer, since I wrote the PEP, many
issues are still open:
http://bugs.python.org/issue7262
http://bugs.python.org/issue8630
http://bugs.python.org/issue10344
http://bugs.python.org/issue12508
http://bugs.python.org/issue12512
See also issue #5445 (wontfix, whereas TextIOWrapper.writeslines()
uses "for line in lines") and issue #12513 (this one is not fair, io
also has the same bug: issue #12215 :-)).
> I'm tired of having to fight these fights every few years.
> Can't we just stop having them, please ?
The status quo is to do nothing, but as a consequence, bugs are still
not fixed yet, and users are still affected by these bugs :-( I'm
trying to find a solution. |
|
Date |
User |
Action |
Args |
2017-03-10 15:41:50 | vstinner | set | recipients:
+ vstinner, lemburg, ezio.melotti, serhiy.storchaka |
2017-03-10 15:41:50 | vstinner | link | issue29783 messages |
2017-03-10 15:41:50 | vstinner | create | |
|