Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delayed exception using non-text encodings with TextIOWrapper #64603

Closed
ncoghlan opened this issue Jan 27, 2014 · 12 comments
Closed

Delayed exception using non-text encodings with TextIOWrapper #64603

ncoghlan opened this issue Jan 27, 2014 · 12 comments
Assignees
Labels
release-blocker topic-IO type-bug An unexpected behavior, bug, or error

Comments

@ncoghlan
Copy link
Contributor

BPO 20404
Nosy @birkenfeld, @ncoghlan, @pitrou, @vstinner, @larryhastings, @serhiy-storchaka
Files
  • issue20404_check_valid_textio_codec.diff: Check codec validity in TextIOWrapper.init
  • issue20404_check_valid_textio_codec_v2.diff: Rationalise codec lookups during TextIOWrapper creation
  • issue20404_check_valid_textio_codec_v3.diff: Updated for Serhiy's comments
  • issue20404_check_valid_textio_codec-3.3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = <Date 2014-02-04.12:14:10.985>
    created_at = <Date 2014-01-27.05:01:16.793>
    labels = ['type-bug', 'expert-IO', 'release-blocker']
    title = 'Delayed exception using non-text encodings with TextIOWrapper'
    updated_at = <Date 2015-05-31.17:21:42.361>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2015-05-31.17:21:42.361>
    actor = 'python-dev'
    assignee = 'ncoghlan'
    closed = True
    closed_date = <Date 2014-02-04.12:14:10.985>
    closer = 'python-dev'
    components = ['IO']
    creation = <Date 2014-01-27.05:01:16.793>
    creator = 'ncoghlan'
    dependencies = []
    files = ['33747', '33856', '33899', '34221']
    hgrepos = []
    issue_num = 20404
    keywords = ['patch']
    message_count = 12.0
    messages = ['209396', '209399', '209438', '209948', '209949', '209970', '210197', '210198', '210217', '212195', '212539', '244549']
    nosy_count = 7.0
    nosy_names = ['georg.brandl', 'ncoghlan', 'pitrou', 'vstinner', 'larry', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue20404'
    versions = ['Python 3.4']

    @ncoghlan
    Copy link
    Contributor Author

    TextIOWrapper doesn't check the codec type early the way the convenience methods now do, so the open() method currently still suffers from the problems Victor described in bpo-19619 for str.encode() and bytes.decode():

    >>> with open("hex.txt", 'w') as f:
    ...     f.write("aabbccddeeff")
    ... 
    12
    >>> print(open('hex.txt').read())
    aabbccddeeff
    >>> print(open('hex.txt', encoding='hex').read())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: decoder should return a string result, not 'bytes'

    These codecs are never going to work correctly with TextIOWrapper, so we should add the appropriate compatibility check in the constructor.

    @ncoghlan ncoghlan self-assigned this Jan 27, 2014
    @ncoghlan ncoghlan added the type-bug An unexpected behavior, bug, or error label Jan 27, 2014
    @ncoghlan
    Copy link
    Contributor Author

    I also created bpo-20405 as an RFE for 3.5, since it seems there is a possible gap in capability relative to Python 2 here.

    @ncoghlan
    Copy link
    Contributor Author

    Attached patch checks the requested encoding is a valid text encoding in TextIOWrapper.__init__.

    Two additional test changes were needed:

    • the one that checks handling of non-text-encodings at runtime now tweaks quopri to lie about being a text encoding when creating the stream

    • the one that checks the interpreter doesn't crash at shutdown needed to be adjusted to handle the fact that _pyio now also fails in that situation, but with a different error (it can't find the ascii codec because the codec machinery is mostly gone)

    Currently, this adds a third lookup of the encoding name to the process of creating a TextIOWrapper instance. This could be reduced to just one by changing the retrieval of the encoder and decoder to look in the retrieved codec info tuple, rather than doing the lookup by name again.

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Feb 2, 2014

    Revised patch that avoids doing multiple lookups of the same codec name while creating the stream.

    Absent any comments, I'll commit this version with appropriate NEWS and What's New updates tomorrow.

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Feb 2, 2014

    Ah, just noticed the test case is still using the overly specific check for the exception wording. I'll fix that, too.

    @serhiy-storchaka
    Copy link
    Member

    The _PyCodec_GetIncrementalDecoder name looks too similar to PyCodec_IncrementalDecoder. It would be better to use more different name.

    And please note my minor comments on Rietveld.

    @ncoghlan
    Copy link
    Contributor Author

    ncoghlan commented Feb 4, 2014

    v3:

    • prefix for internal helper APIs is now _PyCodecInfo_ to better distinguish them from the ones that take an encoding name
    • error check is now just for "is not a text encoding"
    • tweaked the name and comment in the error test to be clear that it is codecs that are not marked as text encodings that are rejected, not just binary transforms
    • fixed indentation of multi-line expression in _pyio.py

    @serhiy-storchaka
    Copy link
    Member

    LGTM.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Feb 4, 2014

    New changeset f3ec00d2b75e by Nick Coghlan in branch 'default':
    Close bpo-20404: blacklist non-text encodings in io.TextIOWrapper
    http://hg.python.org/cpython/rev/f3ec00d2b75e

    @python-dev python-dev mannequin closed this as completed Feb 4, 2014
    @serhiy-storchaka
    Copy link
    Member

    Here is backported to 3.3 patch.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 2, 2014

    New changeset 140a69d950eb by Georg Brandl in branch '3.3':
    Issue bpo-20404: reject non-text encodings early in TextIOWrapper.
    http://hg.python.org/cpython/rev/140a69d950eb

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented May 31, 2015

    New changeset cf6e782a7f94 by Serhiy Storchaka in branch '2.7':
    Issue bpo-19543: Emit deprecation warning for known non-text encodings.
    https://hg.python.org/cpython/rev/cf6e782a7f94

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    release-blocker topic-IO type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants