Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better start and end position for unicodeerror in unicode_encode_ucs1 #72960

Closed
zhangyangyu opened this issue Nov 22, 2016 · 10 comments
Closed
Assignees
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@zhangyangyu
Copy link
Member

BPO 28774
Nosy @vstinner, @serhiy-storchaka, @zhangyangyu
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • unicode_encode_ucs1_error_pos.patch
  • unicode_encode_ucs1_error_pos2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-11-23.13:19:32.151>
    created_at = <Date 2016-11-22.14:29:22.789>
    labels = ['interpreter-core', 'type-feature', '3.7']
    title = 'Better start and end position for unicodeerror in unicode_encode_ucs1'
    updated_at = <Date 2017-03-31.16:36:29.400>
    user = 'https://github.com/zhangyangyu'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:29.400>
    actor = 'dstufft'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-11-23.13:19:32.151>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2016-11-22.14:29:22.789>
    creator = 'xiang.zhang'
    dependencies = []
    files = ['45602', '45603']
    hgrepos = []
    issue_num = 28774
    keywords = ['patch']
    message_count = 10.0
    messages = ['281482', '281489', '281490', '281517', '281519', '281520', '281557', '281558', '281560', '281562']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'python-dev', 'serhiy.storchaka', 'xiang.zhang']
    pr_nums = ['552']
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue28774'
    versions = ['Python 3.7']

    @zhangyangyu
    Copy link
    Member Author

    unicode_encode_ucs1 now recognizes as many characters as it can one time instead of one character a time. But the unicodeerror positions still only count 1(the second time). A similar problem reported in bpo-28561.

    @zhangyangyu zhangyangyu added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Nov 22, 2016
    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 22, 2016
    @vstinner
    Copy link
    Member

    If I understood correctly, the patch fix the ASCII encoder to handle correctly error handlers which return non-ASCII text replacement strings. Right?

    I am not aware of such error handler, so I guess that it's a more a theorical fix?

    I really hate the code (in each encoder) which handles non-ASCII replacement strings. The code in the charmap encoder is just a mess: it uses a reentrant call to the encoder... I never understood this crazy behaviour. I guess that nobody relies on the behaviour. I hesitate to simply raise an error instead of using different rules depending on the code. Ah yes, by the way, each codec behaves differently on non-ASCII replacement strings...

    @serhiy-storchaka
    Copy link
    Member

    LGTM. But it is too late for beta 4. I'll commit the patch either after releasing 3.6.0 or in the 3.7 branch only.

    And while we are here I noticed that handling non-ASCII replacement string could be simpler.

    @vstinner
    Copy link
    Member

    LGTM. But it is too late for beta 4. I'll commit the patch either after releasing 3.6.0 or in the 3.7 branch only.

    Right now, I suggest to only commit into 3.7. Such minor bug can wait for Python 3.6.1.

    And while we are here I noticed that handling non-ASCII replacement string could be simpler

    I also suggest to first commit unicode_encode_ucs1_error_pos.patch and then commit the other part of unicode_encode_ucs1_error_pos2.patch in a separated commit. I will be easy to backport the fix to the 3.6 branch later.

    Serhiy: Xiang became a core developer, are you ok if he push himself unicode_encode_ucs1_error_pos.patch to default tomorrow, and later you rebase your patch on top of that?

    I'm not super confident because the fix doesn't come with an unit test, but it's ok if Serhiy reviewed it :-)

    @serhiy-storchaka
    Copy link
    Member

    This is not a bug itself. It seems to me that at worst case the current code is less efficient with non-standard error handler than it can be. I would commit the path to the 3.6 branch before beta 4 as it is nice and simple additional to already added optimization. But it is too late now, at last beta.

    Xiang can commit his patch to 3.7. No need to backport it to 3.6 (if I didn't miss something).

    @vstinner
    Copy link
    Member

    No need to backport it to 3.6 (if I didn't miss something).

    Sorry, I misunderstood the issue. It's an enhancement, so for 3.7 only.

    Right, 3.6 is now almost frozen, only major bug fixes blocking the release are accepted now (in short). Regular bugfixes should wait for 3.6.1.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 23, 2016

    New changeset 3d660ed2a60e by Xiang Zhang in branch 'default':
    Issue bpo-28774: Fix start/end pos in unicode_encode_ucs1().
    https://hg.python.org/cpython/rev/3d660ed2a60e

    @zhangyangyu
    Copy link
    Member Author

    Thanks Serhiy and Victor. Finished my first commit. :-)

    Now assign back to Serhiy and pos2 LGTM.

    @serhiy-storchaka
    Copy link
    Member

    Congratulations, Xiang!

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 23, 2016

    New changeset 3addf93f4111 by Serhiy Storchaka in branch 'default':
    Issue bpo-28774: Simplified encoding a str result of an error handler in ASCII
    https://hg.python.org/cpython/rev/3addf93f4111

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants