Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document PyUnicode_IsIdentifier() function #83681

Closed
vstinner opened this issue Jan 30, 2020 · 8 comments
Closed

Document PyUnicode_IsIdentifier() function #83681

vstinner opened this issue Jan 30, 2020 · 8 comments
Labels

Comments

@vstinner
Copy link
Member

BPO 39500
Nosy @vstinner, @ezio-melotti, @serhiy-storchaka
PRs
  • bpo-39500: Document PyUnicode_IsIdentifier() function #18280
  • bpo-39500: Document PyUnicode_IsIdentifier() function #18397
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-02-11.13:29:53.653>
    created_at = <Date 2020-01-30.09:22:35.097>
    labels = ['expert-C-API', '3.9', 'expert-unicode']
    title = 'Document PyUnicode_IsIdentifier() function'
    updated_at = <Date 2020-02-17.13:41:22.258>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2020-02-17.13:41:22.258>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-02-11.13:29:53.653>
    closer = 'vstinner'
    components = ['Unicode', 'C API']
    creation = <Date 2020-01-30.09:22:35.097>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 39500
    keywords = ['patch']
    message_count = 8.0
    messages = ['361027', '361031', '361043', '361048', '361049', '361079', '361816', '362141']
    nosy_count = 3.0
    nosy_names = ['vstinner', 'ezio.melotti', 'serhiy.storchaka']
    pr_nums = ['18280', '18397']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue39500'
    versions = ['Python 3.9']

    @vstinner
    Copy link
    Member Author

    The PyUnicode_IsIdentifier() function should be documented.

    Attachd PR documents it.

    @serhiy-storchaka
    Copy link
    Member

    Since the behavior was changed, I think we need a versionchanged directive.

    This function was added in 3.0. Prior to 3.3 it was always successful (if you pass an unicode object, that is required for most of PyUnicode API). Py_FatalError was added in 3.3, because not all unicode object are now valid. But I am not sure that it could be triggered in real code without using some broken extensions.

    It may be safer to return 0 instead of returning -1 or crashing if PyUnicode_READY() fails. If return -1 and set an exception, it can lead to an unexpected behavior if the calling code expects only 0/1, and maybe even to crash in debug build due to set an exception and returning value.

    @vstinner
    Copy link
    Member Author

    Since the behavior was changed, I think we need a versionchanged directive.

    Done.

    But I am not sure that it could be triggered in real code without using some broken extensions.

    _PyUnicode_Ready() can fail in two cases: the Py_UNICODE* string contains a character outside [U+0000; U+10ffff] character range, or a memory allocation failed. Both cases look very unlikely, knowning that _PyUnicode_Ready() is only called if PyUnicode_IsIdentifier() is called on a string using the Python legacy C API (Py_UNICODE*).

    It may be safer to return 0 instead of returning -1 or crashing if PyUnicode_READY() fails.

    Functions which don't use PyObject* as return type have the tradition of returning -1. A few examples:

    • PyUnicode_GetLength()
    • PyUnicode_CopyCharacters()
    • PyUnicode_ReadChar() (return "(Py_UCS4)-1" on error)
    • PyLong_AsUnsignedLong() (return "(unsigned long)-1" on error)
    • etc.

    I don't see why PyUnicode_IsIdentifier() deserves to behave differently.

    If return -1 and set an exception, it can lead to an unexpected behavior if the calling code expects only 0/1, and maybe even to crash in debug build due to set an exception and returning value.

    I updated my PR to document the behavior change in the "Changes in the C API" section of What's New in Python 3.9 documentation.

    Previously, the code crashed Python (Py_FatalError). So I'm confident that this case never occurred ever in the wild. We got zero bug report about that.

    @serhiy-storchaka
    Copy link
    Member

    Other examples are:

    • PyObject_HasAttr
    • PyObject_HasAttrString
    • PyMapping_HasKey
    • PyMapping_HasKeyString

    They are bad examples, but can't be changed for backward compatibility. I wonder whether PyUnicode_IsIdentifier should also kept unchanged for backward compatibility.

    There is also a number of *_Check functions which always succeeds.

    Other example is _PyUnicode_EqualToASCIIString where the behavior is intentional.

    PyUnicode_IsIdentifier() was not documented before. It makes easier to change its behavior.

    @vstinner
    Copy link
    Member Author

    They are bad examples, but can't be changed for backward compatibility.

    I don't think that we should follow these bad examples :-) IMO ignoring silently bugs is a bad programming practice.

    I don't expect PyUnicode_IsIdentifier() to be used outside Python. If it's used, I don't see why it would be a "non-ready string" in practice. The risk of regression is very close to zero. If it happens, it's no longer our fault, since I documented the behavior change ;-) Again, right now, Python does crash if this corner case occurs...

    @serhiy-storchaka
    Copy link
    Member

    It is not convenient to check the result for error. After we remove support of old PyUnicode API, PyUnicode_IsIdentifier() will be always succeeded.

    Note that PyUnicode_IsIdentifier() still can crash if you pass a non-PyUnicode object or NULL. It is a prerequisite of this function that the argument must be a PyUnicode object. Add just yet one condition: it must be a ready PyUnicode object.

    @vstinner
    Copy link
    Member Author

    New changeset f3e7ea5 by Victor Stinner in branch 'master':
    bpo-39500: Document PyUnicode_IsIdentifier() function (GH-18397)
    f3e7ea5

    @vstinner
    Copy link
    Member Author

    New changeset 3d235f5 by Hai Shi in branch 'master':
    bpo-39500: Fix compile warnings in unicodeobject.c (GH-18519)
    3d235f5

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants