Delayed exception using non-text encodings with TextIOWrapper #64603

ncoghlan · 2014-01-27T05:01:17Z

BPO	20404
Nosy	@birkenfeld, @ncoghlan, @pitrou, @vstinner, @larryhastings, @serhiy-storchaka
Files	issue20404_check_valid_textio_codec.diff: Check codec validity in TextIOWrapper.init issue20404_check_valid_textio_codec_v2.diff: Rationalise codec lookups during TextIOWrapper creation issue20404_check_valid_textio_codec_v3.diff: Updated for Serhiy's comments issue20404_check_valid_textio_codec-3.3.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/ncoghlan'
closed_at = <Date 2014-02-04.12:14:10.985>
created_at = <Date 2014-01-27.05:01:16.793>
labels = ['type-bug', 'expert-IO', 'release-blocker']
title = 'Delayed exception using non-text encodings with TextIOWrapper'
updated_at = <Date 2015-05-31.17:21:42.361>
user = 'https://github.com/ncoghlan'

bugs.python.org fields:

activity = <Date 2015-05-31.17:21:42.361>
actor = 'python-dev'
assignee = 'ncoghlan'
closed = True
closed_date = <Date 2014-02-04.12:14:10.985>
closer = 'python-dev'
components = ['IO']
creation = <Date 2014-01-27.05:01:16.793>
creator = 'ncoghlan'
dependencies = []
files = ['33747', '33856', '33899', '34221']
hgrepos = []
issue_num = 20404
keywords = ['patch']
message_count = 12.0
messages = ['209396', '209399', '209438', '209948', '209949', '209970', '210197', '210198', '210217', '212195', '212539', '244549']
nosy_count = 7.0
nosy_names = ['georg.brandl', 'ncoghlan', 'pitrou', 'vstinner', 'larry', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'release blocker'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue20404'
versions = ['Python 3.4']

ncoghlan · 2014-01-27T05:01:16Z

TextIOWrapper doesn't check the codec type early the way the convenience methods now do, so the open() method currently still suffers from the problems Victor described in bpo-19619 for str.encode() and bytes.decode():

>>> with open("hex.txt", 'w') as f:
...     f.write("aabbccddeeff")
... 
12
>>> print(open('hex.txt').read())
aabbccddeeff
>>> print(open('hex.txt', encoding='hex').read())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoder should return a string result, not 'bytes'

These codecs are never going to work correctly with TextIOWrapper, so we should add the appropriate compatibility check in the constructor.

ncoghlan · 2014-01-27T05:25:44Z

I also created bpo-20405 as an RFE for 3.5, since it seems there is a possible gap in capability relative to Python 2 here.

ncoghlan · 2014-01-27T12:29:17Z

Attached patch checks the requested encoding is a valid text encoding in TextIOWrapper.__init__.

Two additional test changes were needed:

the one that checks handling of non-text-encodings at runtime now tweaks quopri to lie about being a text encoding when creating the stream
the one that checks the interpreter doesn't crash at shutdown needed to be adjusted to handle the fact that _pyio now also fails in that situation, but with a different error (it can't find the ascii codec because the codec machinery is mostly gone)

Currently, this adds a third lookup of the encoding name to the process of creating a TextIOWrapper instance. This could be reduced to just one by changing the retrieval of the encoder and decoder to look in the retrieved codec info tuple, rather than doing the lookup by name again.

ncoghlan · 2014-02-02T05:08:58Z

Revised patch that avoids doing multiple lookups of the same codec name while creating the stream.

Absent any comments, I'll commit this version with appropriate NEWS and What's New updates tomorrow.

ncoghlan · 2014-02-02T05:26:57Z

Ah, just noticed the test case is still using the overly specific check for the exception wording. I'll fix that, too.

serhiy-storchaka · 2014-02-02T11:17:19Z

The _PyCodec_GetIncrementalDecoder name looks too similar to PyCodec_IncrementalDecoder. It would be better to use more different name.

And please note my minor comments on Rietveld.

ncoghlan · 2014-02-04T10:53:38Z

v3:

prefix for internal helper APIs is now _PyCodecInfo_ to better distinguish them from the ones that take an encoding name
error check is now just for "is not a text encoding"
tweaked the name and comment in the error test to be clear that it is codecs that are not marked as text encodings that are rejected, not just binary transforms
fixed indentation of multi-line expression in _pyio.py

serhiy-storchaka · 2014-02-04T11:00:21Z

LGTM.

python-dev · 2014-02-04T12:14:11Z

New changeset f3ec00d2b75e by Nick Coghlan in branch 'default':
Close bpo-20404: blacklist non-text encodings in io.TextIOWrapper
http://hg.python.org/cpython/rev/f3ec00d2b75e

serhiy-storchaka · 2014-02-25T18:38:20Z

Here is backported to 3.3 patch.

python-dev · 2014-03-02T08:22:52Z

New changeset 140a69d950eb by Georg Brandl in branch '3.3':
Issue bpo-20404: reject non-text encodings early in TextIOWrapper.
http://hg.python.org/cpython/rev/140a69d950eb

python-dev · 2015-05-31T17:21:42Z

New changeset cf6e782a7f94 by Serhiy Storchaka in branch '2.7':
Issue bpo-19543: Emit deprecation warning for known non-text encodings.
https://hg.python.org/cpython/rev/cf6e782a7f94

ncoghlan added the release-blocker label Jan 27, 2014

ncoghlan self-assigned this Jan 27, 2014

ncoghlan added the type-bug An unexpected behavior, bug, or error label Jan 27, 2014

ncoghlan added the topic-IO label Jan 27, 2014

python-dev mannequin closed this as completed Feb 4, 2014

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Delayed exception using non-text encodings with TextIOWrapper #64603

Delayed exception using non-text encodings with TextIOWrapper #64603

ncoghlan commented Jan 27, 2014

ncoghlan commented Jan 27, 2014

ncoghlan commented Jan 27, 2014

ncoghlan commented Jan 27, 2014

ncoghlan commented Feb 2, 2014

ncoghlan commented Feb 2, 2014

serhiy-storchaka commented Feb 2, 2014

ncoghlan commented Feb 4, 2014

serhiy-storchaka commented Feb 4, 2014

python-dev mannequin commented Feb 4, 2014

serhiy-storchaka commented Feb 25, 2014

python-dev mannequin commented Mar 2, 2014

python-dev mannequin commented May 31, 2015

Delayed exception using non-text encodings with TextIOWrapper #64603

Delayed exception using non-text encodings with TextIOWrapper #64603

Comments

ncoghlan commented Jan 27, 2014

ncoghlan commented Jan 27, 2014

ncoghlan commented Jan 27, 2014

ncoghlan commented Jan 27, 2014

ncoghlan commented Feb 2, 2014

ncoghlan commented Feb 2, 2014

serhiy-storchaka commented Feb 2, 2014

ncoghlan commented Feb 4, 2014

serhiy-storchaka commented Feb 4, 2014

python-dev mannequin commented Feb 4, 2014

serhiy-storchaka commented Feb 25, 2014

python-dev mannequin commented Mar 2, 2014

python-dev mannequin commented May 31, 2015