Author vstinner
Recipients ezio.melotti, lemburg, serhiy.storchaka, vstinner
Date 2017-03-10.14:17:28
Message-id <1489155449.02.0.609817047168.issue29783@psf.upfronthosting.co.za>
Content
The codecs.StreamReaderWriter class still has old, unfixed issues such as issue #12508 (open since 2011). The owasp-pysec project even treats that issue as a security vulnerability:
https://github.com/ebranca/owasp-pysec/wiki/Unicode-string-silently-truncated

I propose modifying codecs.open() to reuse the io module by calling io.open() with newline=''. The io module is now battle-tested and handles many corner cases of incremental codecs with multibyte encodings well.
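
To make the idea concrete, here is a minimal sketch of a codecs.open()-like function built on io.open(). The names open_text_via_io and roundtrip are hypothetical, the buffering default of -1 is an assumption, and this is not the code from the attached PR. Passing newline='' disables newline translation in both directions, which is assumed here to be the reason for that choice.

```python
import io
import os
import tempfile


def open_text_via_io(filename, mode="r", encoding=None,
                     errors="strict", buffering=-1):
    """Hypothetical stand-in for a codecs.open() built on io.open()."""
    if encoding is None:
        # No codec requested: hand back a plain file object.
        return io.open(filename, mode, buffering)
    # io.open() does the decoding itself, so the mode must be a text mode.
    text_mode = mode.replace("b", "")
    # newline="" turns off newline translation on both read and write.
    return io.open(filename, text_mode, buffering,
                   encoding=encoding, errors=errors, newline="")


def roundtrip(text, encoding):
    """Write text to a temporary file and read it back with the same codec."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open_text_via_io(path, "w", encoding=encoding) as f:
            f.write(text)
        with open_text_via_io(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)
```

For example, roundtrip("h\u00e9llo\r\nw\u00f6rld\n", "utf-16") should return the string unchanged, including the "\r\n" line ending.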

With this change, codecs.open() can no longer be used with non-text encodings, but I'm not sure that this feature ever worked in Python 3:

$ ./python -bb
Python 3.7.0a0
>>> import codecs
>>> f = codecs.open('test', 'w', encoding='rot13')
>>> f.write('hello')
TypeError: a bytes-like object is required, not 'str'
>>> f.write(b'hello')
TypeError: a bytes-like object is required, not 'dict'
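
As a rough illustration of the trade-off, the sketch below checks that io.open() refuses a non-text codec such as rot13 with a LookupError, while the same codec stays reachable through codecs.encode() and codecs.decode(). The helper name io_open_rejects is hypothetical, and the exact error message is deliberately not checked because it may vary between Python versions.

```python
import codecs
import io
import os
import tempfile


def io_open_rejects(encoding):
    """Return True if io.open() refuses `encoding` with a LookupError."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        try:
            with io.open(path, "w", encoding=encoding):
                pass
        except LookupError:
            return True
        return False
    finally:
        os.remove(path)


# rot13 is a str-to-str codec, so it works through the codec functions
# even though it cannot be used as a file encoding.
rot13_text = codecs.encode("hello", "rot13")
plain_text = codecs.decode(rot13_text, "rot13")
```

Here io_open_rejects("rot13") is expected to be true and io_open_rejects("utf-8") false, so callers who need rot13 would transform the string explicitly before or after file I/O.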

The next step would be to deprecate the codecs.StreamReaderWriter class and codecs.open(). But my latest attempt to deprecate them was PEP 400, and it wasn't a full success, so I now prefer to move step by step :-)

Attached PR:

* Modify codecs.open() to use io.open()
* Remove "; use codecs.open() to handle arbitrary codecs" from io.open() and _pyio.open() error messages
* Replace codecs.open() with open() at various places
History
Date User Action Args
2017-03-10 14:17:29 vstinner set recipients: + vstinner, lemburg, ezio.melotti, serhiy.storchaka
2017-03-10 14:17:29 vstinner set messageid: <1489155449.02.0.609817047168.issue29783@psf.upfronthosting.co.za>
2017-03-10 14:17:28 vstinner link issue29783 messages
2017-03-10 14:17:28 vstinner create