This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString
Type: behavior Stage: resolved
Components: Interpreter Core, Unicode Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: ezio.melotti, josh.r, methane, python-dev, serhiy.storchaka, skrah, vstinner, xiang.zhang
Priority: normal Keywords: patch

Created on 2016-11-15 19:50 by serhiy.storchaka, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
_PyUnicode_EqualToASCIIString.patch serhiy.storchaka, 2016-11-15 19:50 review
PyUnicode_CompareWithASCIIString.cocci serhiy.storchaka, 2016-11-15 19:51 Coccinella patch for replacing the function
_PyUnicode_EqualToASCII-runtime-check.diff serhiy.storchaka, 2016-11-16 16:10 review
Messages (16)
msg280882 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-15 19:50
Proposed patch replaces calls of public function PyUnicode_CompareWithASCIIString() with new private function _PyUnicode_EqualToASCIIString(). The problem with  PyUnicode_CompareWithASCIIString() is that it returns -1 for the result "less than" and error, but the error case is never checked. The patch is purposed for following purposes:

1. Readability. ``_PyUnicode_EqualToASCIIString(...)`` looks more readable than ``PyUnicode_CompareWithASCIIString(...) == 0`` or ``!PyUnicode_CompareWithASCIIString(...)``, especially in large expression. I always have to make an effort to understand correctly the meaning of the latter expression.

2. Efficiency. If the strings are not equal, _PyUnicode_EqualToASCIIString() can quickly return false, but PyUnicode_CompareWithASCIIString() needs to check whether what string is larger.

3. Correctness. Since no caller checks the error of PyUnicode_CompareWithASCIIString(), it is incorrectly interpreted as "less then". Exception set by PyUnicode_CompareWithASCIIString() can be leaked in following code causing mystical error or crash in debug build. There are too many callers to add error checking for them all. These would be non-trivial error-prone changes that add new lines of the code, new variables and new returns or gotos. On other hand replacing PyUnicode_CompareWithASCIIString() with _PyUnicode_EqualToASCIIString() is done by simple script.

_PyUnicode_EqualToASCIIString() returns true value (1) if strings are equal, false value (0) if they are different, and doesn't raise exceptions. Unlike to PyUnicode_CompareWithASCIIString() it works only with ASCII characters and returns false if any string contains non-ASCII characters.

The patch also documents the return value of PyUnicode_CompareWithASCIIString() in case of error.

See issue21449 for similar issue with _PyUnicode_CompareWithId().
msg280919 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-16 07:05
LGTM.
msg280925 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2016-11-16 07:56
Patch LGTM.
But I don't know it's OK to commit it on 2.7, 3.5 and 3.6.
msg280926 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-11-16 08:20
New changeset 386c682dcd75 by Serhiy Storchaka in branch '3.5':
Issue #28701: Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString.
https://hg.python.org/cpython/rev/386c682dcd75

New changeset 72d07d13869a by Serhiy Storchaka in branch '3.6':
Issue #28701: Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString.
https://hg.python.org/cpython/rev/72d07d13869a

New changeset 6f0f77333da5 by Serhiy Storchaka in branch 'default':
Issue #28701: Replace PyUnicode_CompareWithASCIIString with _PyUnicode_EqualToASCIIString.
https://hg.python.org/cpython/rev/6f0f77333da5
msg280928 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-16 08:25
Thanks Xiang and Inada for your reviews.

The patch fixes a bug: error is not checked after PyUnicode_CompareWithASCIIString(). The patch is not applicable to 2.7 since PyUnicode_CompareWithASCIIString() is Python 3 only.
msg280934 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-16 10:55
(I reopen the issue to ask my question :-))

+/* Test whether a unicode is equal to ASCII string.  Return 1 if true,
+   0 otherwise.  Return 0 if any argument contains non-ASCII characters.
+   Any error occurs inside will be cleared before return. */

Can you please also document the behaviour if you pass two non-ASCII strings which are equal? I understand that it returns also 0, right?

Maybe the API should be more strict and require right to be ASCII: "right string must be encoded to ASCII". I expect an assertion error or a fatal error if right is non-ASCII when Python is compiled in debug mode.
msg280935 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-16 11:26
> Can you please also document the behaviour if you pass two non-ASCII strings which are equal?

What mean "equal"? The left argument is a Unicode string, but the right argument is a byte string. For comparing them we should decode right argument or encode left argument. The result depends on using encoding. _PyUnicode_EqualToASCIIString() uses ASCII (as shown from its name). Non-ASCII strings can't be equal. This is documented.

If the documentation is not clear, could you provide better wording?

> Maybe the API should be more strict and require right to be ASCII: "right string must be encoded to ASCII". I expect an assertion error or a fatal error if right is non-ASCII when Python is compiled in debug mode.

I hesitated about adding an assertion error or a fatal error in a bug fix. But this can be added in develop version.

I don't know what is better -- return 0 in all builds or return 0 in release build and crash in debug build?
msg280945 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-16 13:39
I suggest "return 0 in release build and crash in debug build".
msg280946 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-11-16 13:41
New changeset faf04a995031 by Serhiy Storchaka in branch '3.5':
Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
https://hg.python.org/cpython/rev/faf04a995031

New changeset ff3dacc98b3a by Serhiy Storchaka in branch '3.6':
Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
https://hg.python.org/cpython/rev/ff3dacc98b3a

New changeset 765013f71bc4 by Serhiy Storchaka in branch 'default':
Issue #28701: Replace _PyUnicode_CompareWithId with _PyUnicode_EqualToASCIIId.
https://hg.python.org/cpython/rev/765013f71bc4
msg280947 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-16 13:43
The correct issue for above commits is issue21449.
msg280955 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-11-16 14:13
New changeset b607f835f170 by Serhiy Storchaka in branch '3.5':
Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701).
https://hg.python.org/cpython/rev/b607f835f170

New changeset 1369e51182b7 by Serhiy Storchaka in branch '3.6':
Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701).
https://hg.python.org/cpython/rev/1369e51182b7

New changeset ba14f8b61bd8 by Serhiy Storchaka in branch 'default':
Fixed an off-by-one error in _PyUnicode_EqualToASCIIString (issue #28701).
https://hg.python.org/cpython/rev/ba14f8b61bd8
msg280964 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-16 16:10
Following patch adds checks in debug mode that the right argument of _PyUnicode_EqualToASCIIString and _PyUnicode_EqualToASCIIId is ASCII-only string.
msg280975 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-16 17:56
_PyUnicode_EqualToASCII-runtime-check.diff LGTM.
msg280976 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-11-16 18:03
New changeset 6dd22ed7140e by Serhiy Storchaka in branch '3.6':
Issue #28701: _PyUnicode_EqualToASCIIId and _PyUnicode_EqualToASCIIString now
https://hg.python.org/cpython/rev/6dd22ed7140e

New changeset 44874b20e612 by Serhiy Storchaka in branch 'default':
Issue #28701: _PyUnicode_EqualToASCIIId and _PyUnicode_EqualToASCIIString now
https://hg.python.org/cpython/rev/44874b20e612
msg284944 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2017-01-07 23:30
For the record: This is all that happened in decimal if a) you
pass a legacy string and b) force _PyUnicode_Ready() to throw
a MemoryError:

>>> from decimal import *
>>> import _testcapi
>>> context = Context()
>>> traps = _testcapi.unicode_legacy_string('traps')
>>> getattr(context, traps)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError



Both a) and b) are not trivial to accomplish at all and the result
is completely benign.
msg285037 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2017-01-09 12:16
New changeset 35334a4d41aa by Stefan Krah in branch '3.5':
Issue #28701: Revert part of 5bdc8e1a50c8 for the following reasons:
https://hg.python.org/cpython/rev/35334a4d41aa
History
Date User Action Args
2022-04-11 14:58:39adminsetgithub: 72887
2017-01-09 12:16:31python-devsetmessages: + msg285037
2017-01-07 23:30:54skrahsetnosy: + skrah
messages: + msg284944
2016-11-16 18:04:01serhiy.storchakasetstatus: open -> closed
resolution: fixed
2016-11-16 18:03:24python-devsetmessages: + msg280976
2016-11-16 17:56:03vstinnersetmessages: + msg280975
2016-11-16 16:10:17serhiy.storchakasetfiles: + _PyUnicode_EqualToASCII-runtime-check.diff

messages: + msg280964
2016-11-16 14:13:31python-devsetmessages: + msg280955
2016-11-16 13:59:39serhiy.storchakaunlinkissue21449 dependencies
2016-11-16 13:43:10serhiy.storchakasetmessages: + msg280947
2016-11-16 13:41:49python-devsetmessages: + msg280946
2016-11-16 13:39:37vstinnersetmessages: + msg280945
2016-11-16 11:26:13serhiy.storchakasetmessages: + msg280935
2016-11-16 10:55:07vstinnersetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg280934
2016-11-16 08:25:43serhiy.storchakasetstatus: open -> closed
versions: - Python 2.7
messages: + msg280928

assignee: serhiy.storchaka
resolution: fixed
stage: patch review -> resolved
2016-11-16 08:20:20python-devsetnosy: + python-dev
messages: + msg280926
2016-11-16 07:56:23methanesetnosy: + methane
messages: + msg280925
2016-11-16 07:05:10xiang.zhangsetmessages: + msg280919
2016-11-15 19:53:34serhiy.storchakalinkissue21449 dependencies
2016-11-15 19:51:39serhiy.storchakasetfiles: + PyUnicode_CompareWithASCIIString.cocci
2016-11-15 19:50:05serhiy.storchakacreate