Message 361043 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	ezio.melotti, serhiy.storchaka, vstinner
Date	2020-01-30.11:51:47
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1580385107.46.0.40778776597.issue39500@roundup.psfhosted.org>
In-reply-to

Content
> Since the behavior was changed, I think we need a versionchanged directive. Done. > But I am not sure that it could be triggered in real code without using some broken extensions. _PyUnicode_Ready() can fail in two cases: the Py_UNICODE* string contains a character outside [U+0000; U+10ffff] character range, or a memory allocation failed. Both cases look very unlikely, knowning that _PyUnicode_Ready() is only called if PyUnicode_IsIdentifier() is called on a string using the Python legacy C API (Py_UNICODE). > It may be safer to return 0 instead of returning -1 or crashing if PyUnicode_READY() fails. Functions which don't use PyObject as return type have the tradition of returning -1. A few examples: * PyUnicode_GetLength() * PyUnicode_CopyCharacters() * PyUnicode_ReadChar() (return "(Py_UCS4)-1" on error) * PyLong_AsUnsignedLong() (return "(unsigned long)-1" on error) * etc. I don't see why PyUnicode_IsIdentifier() deserves to behave differently. > If return -1 and set an exception, it can lead to an unexpected behavior if the calling code expects only 0/1, and maybe even to crash in debug build due to set an exception and returning value. I updated my PR to document the behavior change in the "Changes in the C API" section of What's New in Python 3.9 documentation. Previously, the code crashed Python (Py_FatalError). So I'm confident that this case never occurred ever in the wild. We got zero bug report about that.

> Since the behavior was changed, I think we need a versionchanged directive.

Done.


> But I am not sure that it could be triggered in real code without using some broken extensions.

_PyUnicode_Ready() can fail in two cases: the Py_UNICODE* string contains a character outside [U+0000; U+10ffff] character range, or a memory allocation failed. Both cases look very unlikely, knowning that _PyUnicode_Ready() is only called if PyUnicode_IsIdentifier() is called on a string using the Python legacy C API (Py_UNICODE*).


> It may be safer to return 0 instead of returning -1 or crashing if PyUnicode_READY() fails.

Functions which don't use PyObject* as return type have the tradition of returning -1. A few examples:

* PyUnicode_GetLength()
* PyUnicode_CopyCharacters()
* PyUnicode_ReadChar() (return "(Py_UCS4)-1" on error)
* PyLong_AsUnsignedLong() (return "(unsigned long)-1" on error)
* etc.

I don't see why PyUnicode_IsIdentifier() deserves to behave differently.


> If return -1 and set an exception, it can lead to an unexpected behavior if the calling code expects only 0/1, and maybe even to crash in debug build due to set an exception and returning value.

I updated my PR to document the behavior change in the "Changes in the C API" section of What's New in Python 3.9 documentation.

Previously, the code crashed Python (Py_FatalError). So I'm confident that this case never occurred ever in the wild. We got zero bug report about that.

History
Date	User	Action	Args
2020-01-30 11:51:47	vstinner	set	recipients: + vstinner, ezio.melotti, serhiy.storchaka
2020-01-30 11:51:47	vstinner	set	messageid: <1580385107.46.0.40778776597.issue39500@roundup.psfhosted.org>
2020-01-30 11:51:47	vstinner	link	issue39500 messages
2020-01-30 11:51:47	vstinner	create