
Report surrogate characters range in utf8_encoder #72747

Closed
zhangyangyu opened this issue Oct 30, 2016 · 3 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@zhangyangyu
Member

BPO 28561
Nosy @vstinner, @serhiy-storchaka, @zhangyangyu
Files
  • utf8_encoder.patch
  • utf8_encoder_v2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-10-30.16:28:28.674>
    created_at = <Date 2016-10-30.07:45:51.280>
    labels = ['interpreter-core', 'type-bug', '3.7']
    title = 'Report surrogate characters range in utf8_encoder'
    updated_at = <Date 2016-10-30.16:28:28.673>
    user = 'https://github.com/zhangyangyu'

    bugs.python.org fields:

    activity = <Date 2016-10-30.16:28:28.673>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-10-30.16:28:28.674>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2016-10-30.07:45:51.280>
    creator = 'xiang.zhang'
    dependencies = []
    files = ['45271', '45273']
    hgrepos = []
    issue_num = 28561
    keywords = ['patch']
    message_count = 3.0
    messages = ['279712', '279728', '279729']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'python-dev', 'serhiy.storchaka', 'xiang.zhang']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28561'
    versions = ['Python 3.6', 'Python 3.7']

    @zhangyangyu
    Member Author

    In utf8_encoder, when an error handler returns a replacement string containing non-ASCII characters, the encoder raises UnicodeEncodeError, but the start and end positions it reports are not accurate. This looks like an oversight left over from the code's evolution: originally, utf8_encoder recognized only one surrogate character at a time, but since 2b5357b38366 it tries to recognize as long a run of surrogates as possible at once. The patch also includes some cleanup.
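A minimal sketch of the behavior being discussed, assuming CPython 3.6 or later (the branches the fix was committed to): encoding a run of lone surrogates to UTF-8 raises UnicodeEncodeError, and its start/end should span the whole run. This holds both with the default strict handler and when an error handler returns a non-ASCII str, which is the case the issue is about. The handler name `demo.non_ascii` and the helpers `non_ascii_replacement` and `error_span` are hypothetical, chosen only for illustration.

```python
import codecs


def non_ascii_replacement(exc):
    # Hypothetical handler for illustration: it offers a non-ASCII str
    # (EURO SIGN) as the replacement, which the UTF-8 encoder rejects.
    return ("\u20ac", exc.end)


codecs.register_error("demo.non_ascii", non_ascii_replacement)


def error_span(text, errors="strict"):
    """Return (start, end, reason) of the UnicodeEncodeError raised when
    encoding `text` to UTF-8, or None if encoding succeeds."""
    try:
        text.encode("utf-8", errors)
    except UnicodeEncodeError as exc:
        return exc.start, exc.end, exc.reason
    return None


# Two adjacent lone surrogates at indices 1 and 2.
s = "a\ud800\udc00b"
print(error_span(s))                    # default strict handler
print(error_span(s, "demo.non_ascii"))  # handler returning a non-ASCII str
```

Here the reported span should cover both surrogates (start 1, end 3, with the end index exclusive) in both calls, rather than pointing at a single character.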

    @zhangyangyu zhangyangyu added the type-bug An unexpected behavior, bug, or error label Oct 30, 2016
    @serhiy-storchaka serhiy-storchaka added interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.7 (EOL) end of life labels Oct 30, 2016
    @serhiy-storchaka serhiy-storchaka self-assigned this Oct 30, 2016
    @python-dev
    Mannequin

    python-dev mannequin commented Oct 30, 2016

    New changeset 542065b03c10 by Serhiy Storchaka in branch '3.6':
    Issue bpo-28561: Clean up UTF-8 encoder: remove dead code, update comments, etc.
    https://hg.python.org/cpython/rev/542065b03c10

    New changeset ee3670d9bda6 by Serhiy Storchaka in branch 'default':
    Issue bpo-28561: Clean up UTF-8 encoder: remove dead code, update comments, etc.
    https://hg.python.org/cpython/rev/ee3670d9bda6

    @serhiy-storchaka
    Member

    Thanks Xiang. Yes, all of this is a follow-up to bpo-25267.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022