classification
Title: Document PyUnicode_IsIdentifier() function
Type: Stage: resolved
Components: C API, Unicode Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2020-01-30 09:22 by vstinner, last changed 2020-02-17 13:41 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18280 closed vstinner, 2020-01-30 09:26
PR 18397 merged vstinner, 2020-02-07 08:38
Messages (8)
msg361027 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-30 09:22
The PyUnicode_IsIdentifier() function should be documented.

Attachd PR documents it.
msg361031 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-30 10:13
Since the behavior was changed, I think we need a versionchanged directive.

This function was added in 3.0. Prior to 3.3 it was always successful (if you pass an unicode object, that is required for most of PyUnicode API). Py_FatalError was added in 3.3, because not all unicode object are now valid. But I am not sure that it could be triggered in real code without using some broken extensions.

It may be safer to return 0 instead of returning -1 or crashing if PyUnicode_READY() fails. If return -1 and set an exception, it can lead to an unexpected behavior if the calling code expects only 0/1, and maybe even to crash in debug build due to set an exception and returning value.
msg361043 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-30 11:51
> Since the behavior was changed, I think we need a versionchanged directive.

Done.


> But I am not sure that it could be triggered in real code without using some broken extensions.

_PyUnicode_Ready() can fail in two cases: the Py_UNICODE* string contains a character outside [U+0000; U+10ffff] character range, or a memory allocation failed. Both cases look very unlikely, knowning that _PyUnicode_Ready() is only called if PyUnicode_IsIdentifier() is called on a string using the Python legacy C API (Py_UNICODE*).


> It may be safer to return 0 instead of returning -1 or crashing if PyUnicode_READY() fails.

Functions which don't use PyObject* as return type have the tradition of returning -1. A few examples:

* PyUnicode_GetLength()
* PyUnicode_CopyCharacters()
* PyUnicode_ReadChar() (return "(Py_UCS4)-1" on error)
* PyLong_AsUnsignedLong() (return "(unsigned long)-1" on error)
* etc.

I don't see why PyUnicode_IsIdentifier() deserves to behave differently.


> If return -1 and set an exception, it can lead to an unexpected behavior if the calling code expects only 0/1, and maybe even to crash in debug build due to set an exception and returning value.

I updated my PR to document the behavior change in the "Changes in the C API" section of What's New in Python 3.9 documentation.

Previously, the code crashed Python (Py_FatalError). So I'm confident that this case never occurred ever in the wild. We got zero bug report about that.
msg361048 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-30 12:04
Other examples are:

* PyObject_HasAttr
* PyObject_HasAttrString
* PyMapping_HasKey
* PyMapping_HasKeyString

They are bad examples, but can't be changed for backward compatibility. I wonder whether PyUnicode_IsIdentifier should also kept unchanged for backward compatibility.

There is also a number of *_Check functions which always succeeds.

Other example is _PyUnicode_EqualToASCIIString where the behavior is intentional.

PyUnicode_IsIdentifier() was not documented before. It makes easier to change its behavior.
msg361049 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-30 12:07
> They are bad examples, but can't be changed for backward compatibility.

I don't think that we should follow these bad examples :-) IMO ignoring silently bugs is a bad programming practice.

I don't expect PyUnicode_IsIdentifier() to be used outside Python. If it's used, I don't see why it would be a "non-ready string" in practice. The risk of regression is very close to zero. If it happens, it's no longer our fault, since I documented the behavior change ;-) Again, right now, Python does crash if this corner case occurs...
msg361079 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-01-30 21:54
It is not convenient to check the result for error. After we remove support of old PyUnicode API, PyUnicode_IsIdentifier() will be always succeeded.

Note that PyUnicode_IsIdentifier() still can crash if you pass a non-PyUnicode object or NULL. It is a prerequisite of this function that the argument must be a PyUnicode object. Add just yet one condition: it must be a ready PyUnicode object.
msg361816 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-11 13:29
New changeset f3e7ea5b8c220cd63101e419d529c8563f9c6115 by Victor Stinner in branch 'master':
bpo-39500: Document PyUnicode_IsIdentifier() function (GH-18397)
https://github.com/python/cpython/commit/f3e7ea5b8c220cd63101e419d529c8563f9c6115
msg362141 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-02-17 13:41
New changeset 3d235f5c5c5bce6e0caec44d2ce17f670c2ca2d7 by Hai Shi in branch 'master':
bpo-39500: Fix compile warnings in unicodeobject.c (GH-18519)
https://github.com/python/cpython/commit/3d235f5c5c5bce6e0caec44d2ce17f670c2ca2d7
History
Date User Action Args
2020-02-17 13:42:15vstinnerlinkissue39646 superseder
2020-02-17 13:41:22vstinnersetmessages: + msg362141
2020-02-11 13:29:53vstinnersetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-02-11 13:29:37vstinnersetmessages: + msg361816
2020-02-07 08:38:40vstinnersetpull_requests: + pull_request17773
2020-01-30 21:54:31serhiy.storchakasetmessages: + msg361079
2020-01-30 12:07:45vstinnersetmessages: + msg361049
2020-01-30 12:04:52serhiy.storchakasetmessages: + msg361048
2020-01-30 11:51:47vstinnersetmessages: + msg361043
2020-01-30 10:13:55serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg361031
2020-01-30 09:26:06vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request17655
2020-01-30 09:22:35vstinnercreate