Message 289374 - Python tracker

Author	lemburg
Recipients	ezio.melotti, lemburg, serhiy.storchaka, vstinner
Date	2017-03-10.15:09:55
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<69886c48-0d2f-3f91-7457-41043ddcb1ac@egenix.com>
In-reply-to	<1489155449.02.0.609817047168.issue29783@psf.upfronthosting.co.za>

Content

On 10.03.2017 15:17, STINNER Victor wrote:
> 
> The codecs.StreamReaderWriter() class still has old unfixed issues like the issue #12508 (open since 2011). This issue is even seen as a security vulnerability by the owasp-pysec project:
> https://github.com/ebranca/owasp-pysec/wiki/Unicode-string-silently-truncated

The issue should be fixed. Patches welcome :-)

The reason for the problem is that the UTF-8 decoder (and other
decoders) expects an extension to the codec decoder API which is
not implemented in its StreamReader class (it simply uses the
base class). It's not a problem of the base class, but that of
the codec.
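
A minimal sketch of the behaviour described above, assuming the "extension"
meant here is the optional `final` argument of `codecs.utf_8_decode()` (the
message does not name it). The base `StreamReader.read()` calls `decode()`
without that argument, so an incomplete trailing sequence is held back
instead of being reported:

```python
import codecs
import io

# "abc" followed by the first two bytes of a three-byte UTF-8 sequence.
truncated = b"abc\xe2\x82"

# Without final=True the decoder keeps the incomplete tail for later.
lenient = codecs.utf_8_decode(truncated, "strict", False)

# With final=True the same input is rejected as truncated.
try:
    codecs.utf_8_decode(truncated, "strict", True)
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# The UTF-8 StreamReader uses the base class read(), which never passes
# final=True, so the dangling bytes are not reported at end of stream.
reader = codecs.getreader("utf-8")(io.BytesIO(truncated))
text = reader.read()

print(lenient, strict_raised, repr(text))
```

Under that assumption, a fix would belong in the codec's own StreamReader
rather than in codecs.open() or StreamReaderWriter.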

And no: it doesn't have anything to do with codecs.open()
or the StreamReaderWriter class.

> I propose to modify codecs.open() to reuse the io module: call io.open() with newline=''. The io module is now battle-tested and handles well many corner cases of incremental codecs with multibyte encodings.

-1. People who want to use the io module should use it directly.

> With this change, codecs.open() cannot be used with non-text encodings... but I'm not sure that this feature ever worked in Python 3:
> 
> $ ./python -bb
> Python 3.7.0a0
> >>> import codecs
> >>> f = codecs.open('test', 'w', encoding='rot13')
> >>> f.write('hello')
> TypeError: a bytes-like object is required, not 'str'
> >>> f.write(b'hello')
> TypeError: a bytes-like object is required, not 'dict'

That's a bug in the rot13 codec, not a feature. codecs.open()
works just fine with 'hex' and 'base64'.
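
As a rough illustration of that claim, the sketch below round-trips bytes
through codecs.open() with the 'hex' and 'base64' codecs. The helper name
`roundtrip` and the temporary file are made up for this example; newer
Python releases may emit a DeprecationWarning for codecs.open(), which is
silenced here.

```python
import codecs
import os
import tempfile
import warnings


def roundtrip(encoding, payload):
    """Write payload via codecs.open(), then return (bytes on disk, decoded)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test")
        with warnings.catch_warnings():
            # Newer Python releases may warn that codecs.open() is deprecated.
            warnings.simplefilter("ignore", DeprecationWarning)
            # Bytes-to-bytes codecs take bytes on write ...
            with codecs.open(path, "wb", encoding=encoding) as f:
                f.write(payload)
            with open(path, "rb") as raw:
                on_disk = raw.read()
            # ... and return bytes on read.
            with codecs.open(path, "rb", encoding=encoding) as f:
                decoded = f.read()
    return on_disk, decoded


hex_disk, hex_back = roundtrip("hex", b"hello")
b64_disk, b64_back = roundtrip("base64", b"hello")
print(hex_disk, hex_back)
print(b64_disk, b64_back)
```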

> The next step would be to deprecate the codecs.StreamReaderWriter class and the codecs.open(). But my latest attempt to deprecate them was the PEP 400 and it wasn't a full success, so I now prefer to move step by step :-)

I'm still -1 on the deprecations in PEP 400. You are essentially
suggesting to replace the complete codecs subsystem with the
io module, but forgetting that all codecs use StreamWriter and
StreamReader as base classes.

StreamReaderWriter is just an amalgamation of the two
classes StreamReader and StreamWriter, nothing more. It's
a completely harmless class in codecs.py.
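
To make the "amalgamation" concrete, here is a small sketch that builds a
StreamReaderWriter by hand from the UTF-8 reader and writer classes. The
in-memory `io.BytesIO` stream and the sample text are chosen for this
example only:

```python
import codecs
import io

raw = io.BytesIO()

# StreamReaderWriter takes a byte stream plus a reader and a writer factory
# and delegates read() to the former and write() to the latter.
srw = codecs.StreamReaderWriter(
    raw,
    codecs.getreader("utf-8"),
    codecs.getwriter("utf-8"),
)

srw.write("héllo\n")      # encoded by the StreamWriter half
stored = raw.getvalue()   # the bytes that reached the underlying stream

srw.seek(0)               # rewinds the stream and resets the reader state
text = srw.read()         # decoded by the StreamReader half

print(stored, repr(text))
```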

The codecs subsystem has a clean design. If used correctly
and maintained with more care, it works really well. Trying
to rip things out won't make it better. Fixing implementations,
where the appropriate care was not applied, is a much better
strategy.

I'm tired of having to fight these fights every few years.
Can't we just stop having them, please?

History
Date	User	Action	Args
2017-03-10 15:09:55	lemburg	set	recipients: + lemburg, vstinner, ezio.melotti, serhiy.storchaka
2017-03-10 15:09:55	lemburg	link	issue29783 messages
2017-03-10 15:09:55	lemburg	create