str.isidentifier() does not work with non-BMP non-canonicalized strings on Windows #84776

Closed
serhiy-storchaka opened this issue May 11, 2020 · 6 comments
Labels
3.9 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (An unexpected behavior, bug, or error)

Comments

@serhiy-storchaka
Member

BPO 40596
Nosy @vstinner, @serhiy-storchaka
PRs
  • bpo-40596: Fix str.isidentifier() for non-canonicalized strings containing non-BMP characters on Windows. #20035
  • bpo-40596: Fix str.isidentifier() for non-canonicalized strings containing non-BMP characters on Windows. #20053
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-05-12.13:19:12.227>
    created_at = <Date 2020-05-11.17:50:16.125>
    labels = ['interpreter-core', 'type-bug', '3.9']
    title = 'str.isidentifier() does not work with non-BMP non-canonicalized strings on Windows'
    updated_at = <Date 2020-05-12.17:27:57.214>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2020-05-12.17:27:57.214>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-05-12.13:19:12.227>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2020-05-11.17:50:16.125>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40596
    keywords = ['patch']
    message_count = 6.0
    messages = ['368637', '368651', '368652', '368705', '368729', '368739']
    nosy_count = 2.0
    nosy_names = ['vstinner', 'serhiy.storchaka']
    pr_nums = ['20035', '20053']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue40596'
    versions = ['Python 3.9']

    @serhiy-storchaka
    Member Author

    >>> import _testcapi
    >>> u = '\U0001d580\U0001d593\U0001d58e\U0001d588\U0001d594\U0001d589\U0001d58a'
    >>> u.isidentifier()
    True
    >>> _testcapi.unicode_legacy_string(u).isidentifier()
    False

    @serhiy-storchaka added the 3.9 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), and type-bug (An unexpected behavior, bug, or error) labels on May 11, 2020
    @vstinner
    Member

    Maybe it's time to speed up the deprecation of the legacy C API using Py_UNICODE...

    @vstinner
    Member

    My previous change to this function:

    commit f3e7ea5
    Author: Victor Stinner <vstinner@python.org>
    Date: Tue Feb 11 14:29:33 2020 +0100

    bpo-39500: Document PyUnicode_IsIdentifier() function (GH-18397)
    
    PyUnicode_IsIdentifier() does not call Py_FatalError() anymore if the
    string is not ready.
    

    @serhiy-storchaka
    Member Author

    I am not sure that the changes in bpo-39500 were correct. It is easier to catch a bug if the function crashes consistently when you pass a non-canonicalized string than if it silently returns a wrong result for specific input on a particular platform.

    Alternatively, you could reimplement correct handling of surrogate pairs in PyUnicode_IsIdentifier().
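    What "correct handling of surrogate pairs" means can be sketched in Python. This is only an illustration of the idea, not the code from the actual fix: it models the legacy buffer as a list of 16-bit UTF-16 code units, and all function names below are hypothetical.

```python
import struct


def utf16_units(s):
    """Return s as a list of 16-bit UTF-16 code units."""
    data = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def join_surrogates(units):
    """Decode UTF-16 code units to code points, combining surrogate pairs."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character or unpaired surrogate: keep the unit as is.
            code_points.append(unit)
            i += 1
    return code_points


def is_identifier_per_unit(units):
    """Naive check: treat every 16-bit unit as a separate character."""
    return "".join(map(chr, units)).isidentifier()


def is_identifier_joined(units):
    """Surrogate-aware check: combine pairs before testing."""
    return "".join(map(chr, join_surrogates(units))).isidentifier()


u = '\U0001d580\U0001d593\U0001d58e\U0001d588\U0001d594\U0001d589\U0001d58a'
units = utf16_units(u)
print(is_identifier_per_unit(units))  # False: lone surrogates are not identifier characters
print(is_identifier_joined(units))    # True: matches u.isidentifier()
```

    The per-unit check reproduces the wrong result from the report, while combining each high/low pair into a single code point before testing restores the expected answer.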

    @serhiy-storchaka
    Member Author

    New changeset 5650e76 by Serhiy Storchaka in branch 'master':
    bpo-40596: Fix str.isidentifier() for non-canonicalized strings containing non-BMP characters on Windows. (GH-20053)
    5650e76

    @vstinner
    Member

    Thanks for the fix, Serhiy!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022