Message 289362 - Python tracker

Author	vstinner
Recipients	ezio.melotti, lemburg, serhiy.storchaka, vstinner
Date	2017-03-10.14:17:28
Message-id	<1489155449.02.0.609817047168.issue29783@psf.upfronthosting.co.za>
In-reply-to

Content

The codecs.StreamReaderWriter class still has old, unfixed issues such as issue #12508 (open since 2011). That issue is even considered a security vulnerability by the owasp-pysec project:
https://github.com/ebranca/owasp-pysec/wiki/Unicode-string-silently-truncated

I propose modifying codecs.open() to reuse the io module: call io.open() with newline=''. The io module is now battle-tested and correctly handles many corner cases of incremental codecs with multibyte encodings.
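A minimal sketch of the idea, not the code from the attached PR: the helper names open_via_io() and roundtrip() are hypothetical, and only the text-encoding case is covered. It delegates to io.open() in text mode with newline='' so that line endings pass through untranslated, as they do with codecs.open().

```python
import io
import os
import tempfile


def open_via_io(filename, mode="r", encoding="utf-8", errors="strict",
                buffering=-1):
    """Hypothetical stand-in for codecs.open() built on io.open().

    Assumption: only text encodings are supported, so any 'b' flag in
    the mode is dropped and the file is always opened in text mode.
    """
    text_mode = mode.replace("b", "")
    # newline='' disables newline translation on both read and write.
    return io.open(filename, text_mode, buffering=buffering,
                   encoding=encoding, errors=errors, newline="")


def roundtrip(text, encoding="utf-8"):
    """Write text to a temporary file and read it back unchanged."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open_via_io(path, "w", encoding=encoding) as f:
            f.write(text)
        with open_via_io(path, "r", encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)
```

With newline='', a '\r\n' written to the file is read back as '\r\n' rather than being collapsed to '\n'.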

With this change, codecs.open() can no longer be used with non-text encodings... but I'm not sure that this feature ever worked in Python 3:

$ ./python -bb
Python 3.7.0a0
>>> import codecs
>>> f = codecs.open('test', 'w', encoding='rot13')
>>> f.write('hello')
TypeError: a bytes-like object is required, not 'str'
>>> f.write(b'hello')
TypeError: a bytes-like object is required, not 'dict'
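For comparison, here is a small sketch of how the io module treats such codecs and how they remain reachable without a file wrapper. The helper is_text_encoding() is a hypothetical name introduced for illustration; it relies on io.TextIOWrapper raising LookupError for a codec that is not a text encoding (it also returns False for an unknown codec name, which raises the same exception).

```python
import codecs
import io


def is_text_encoding(name):
    """Return True if the io module accepts `name` as a text encoding."""
    try:
        io.TextIOWrapper(io.BytesIO(), encoding=name)
    except LookupError:
        # Raised for non-text codecs such as rot13, and for unknown names.
        return False
    return True


# Non-text codecs can still be applied directly to in-memory data.
rot13_text = codecs.encode("hello", "rot13")   # str-to-str codec
hex_bytes = codecs.encode(b"hello", "hex")     # bytes-to-bytes codec
```

So data that needs rot13 or hex can be transformed with codecs.encode() and codecs.decode() and then written through a regular file object.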

The next step would be to deprecate the codecs.StreamReaderWriter class and codecs.open(). But my last attempt to deprecate them was PEP 400, and it wasn't a full success, so I now prefer to move step by step :-)

Attached PR:

* Modify codecs.open() to use io.open()
* Remove "; use codecs.open() to handle arbitrary codecs" from io.open() and _pyio.open() error messages
* Replace codecs.open() with open() at various places

History
Date	User	Action	Args
2017-03-10 14:17:29	vstinner	set	recipients: + vstinner, lemburg, ezio.melotti, serhiy.storchaka
2017-03-10 14:17:29	vstinner	set	messageid: <1489155449.02.0.609817047168.issue29783@psf.upfronthosting.co.za>
2017-03-10 14:17:28	vstinner	link	issue29783 messages
2017-03-10 14:17:28	vstinner	create