classification
Title: Delayed exception using non-text encodings with TextIOWrapper
Type: behavior Stage: resolved
Components: IO Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ncoghlan Nosy List: georg.brandl, haypo, larry, ncoghlan, pitrou, python-dev, serhiy.storchaka
Priority: release blocker Keywords: patch

Created on 2014-01-27 05:01 by ncoghlan, last changed 2014-03-02 08:22 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
issue20404_check_valid_textio_codec.diff ncoghlan, 2014-01-27 12:29 Check codec validity in TextIOWrapper.__init__ review
issue20404_check_valid_textio_codec_v2.diff ncoghlan, 2014-02-02 05:08 Rationalise codec lookups during TextIOWrapper creation review
issue20404_check_valid_textio_codec_v3.diff ncoghlan, 2014-02-04 10:53 Updated for Serhiy's comments review
issue20404_check_valid_textio_codec-3.3.patch serhiy.storchaka, 2014-02-25 18:38 review
Messages (11)
msg209396 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-27 05:01
TextIOWrapper doesn't check the codec type early the way the convenience methods now do, so the open() method currently still suffers from the problems Victor described in issue 19619 for str.encode() and bytes.decode():

>>> with open("hex.txt", 'w') as f:
...     f.write("aabbccddeeff")
... 
12
>>> print(open('hex.txt').read())
aabbccddeeff
>>> print(open('hex.txt', encoding='hex').read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoder should return a string result, not 'bytes'

These codecs are never going to work correctly with TextIOWrapper, so we should add the appropriate compatibility check in the constructor.
msg209399 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-27 05:25
I also created issue 20405 as an RFE for 3.5, since it seems there is a possible gap in capability relative to Python 2 here.
msg209438 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-01-27 12:29
Attached patch checks the requested encoding is a valid text encoding in TextIOWrapper.__init__.

Two additional test changes were needed:

- the one that checks handling of non-text-encodings at runtime now tweaks quopri to lie about being a text encoding when creating the stream

- the one that checks the interpreter doesn't crash at shutdown needed to be adjusted to handle the fact that _pyio now also fails in that situation, but with a different error (it can't find the ascii codec because the codec machinery is mostly gone)

Currently, this adds a third lookup of the encoding name to the process of creating a TextIOWrapper instance. This could be reduced to just one by changing the retrieval of the encoder and decoder to look in the retrieved codec info tuple, rather than doing the lookup by name again.
msg209948 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-02-02 05:08
Revised patch that avoids doing multiple lookups of the same codec name while creating the stream.

Absent any comments, I'll commit this version with appropriate NEWS and What's New updates tomorrow.
msg209949 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-02-02 05:26
Ah, just noticed the test case is still using the overly specific check for the exception wording. I'll fix that, too.
msg209970 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-02 11:17
The _PyCodec_GetIncrementalDecoder name looks too similar to PyCodec_IncrementalDecoder. It would be better to use more different name.

And please note my minor comments on Rietveld.
msg210197 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-02-04 10:53
v3:

- prefix for internal helper APIs is now _PyCodecInfo_ to better distinguish them from the ones that take an encoding name
- error check is now just for "is not a text encoding"
- tweaked the name and comment in the error test to be clear that it is codecs that are not marked as text encodings that are rejected, not just binary transforms
- fixed indentation of multi-line expression in _pyio.py
msg210198 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-04 11:00
LGTM.
msg210217 - (view) Author: Roundup Robot (python-dev) Date: 2014-02-04 12:14
New changeset f3ec00d2b75e by Nick Coghlan in branch 'default':
Close #20404: blacklist non-text encodings in io.TextIOWrapper
http://hg.python.org/cpython/rev/f3ec00d2b75e
msg212195 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-25 18:38
Here is backported to 3.3 patch.
msg212539 - (view) Author: Roundup Robot (python-dev) Date: 2014-03-02 08:22
New changeset 140a69d950eb by Georg Brandl in branch '3.3':
Issue #20404: reject non-text encodings early in TextIOWrapper.
http://hg.python.org/cpython/rev/140a69d950eb
History
Date User Action Args
2014-03-02 08:22:51python-devsetmessages: + msg212539
2014-02-25 18:38:23serhiy.storchakasetfiles: + issue20404_check_valid_textio_codec-3.3.patch
nosy: + georg.brandl
messages: + msg212195

2014-02-04 12:14:10python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg210217

resolution: fixed
stage: commit review -> resolved
2014-02-04 11:00:20serhiy.storchakasetmessages: + msg210198
stage: test needed -> commit review
2014-02-04 10:53:39ncoghlansetfiles: + issue20404_check_valid_textio_codec_v3.diff

messages: + msg210197
2014-02-02 11:17:18serhiy.storchakasetmessages: + msg209970
2014-02-02 05:26:57ncoghlansetmessages: + msg209949
2014-02-02 05:08:59ncoghlansetfiles: + issue20404_check_valid_textio_codec_v2.diff

messages: + msg209948
2014-01-27 12:29:17ncoghlansetfiles: + issue20404_check_valid_textio_codec.diff

nosy: + pitrou
messages: + msg209438

components: + IO
keywords: + patch
2014-01-27 05:25:43ncoghlansetmessages: + msg209399
2014-01-27 05:01:16ncoghlancreate