classification
Title: Fix indices handling in PyUnicode_FindChar
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: python-dev, serhiy.storchaka, vstinner, xiang.zhang
Priority: normal Keywords: patch

Created on 2016-11-28 16:29 by xiang.zhang, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description Edit
PyUnicode_FindChar.patch xiang.zhang, 2016-11-28 16:29 review
PyUnicode_FindChar-v2.patch xiang.zhang, 2016-11-29 15:51 review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (13)
msg281883 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-28 16:29
PyUnicode_FindChar declares in the doc it treats its *start* and *end* parameters as str[start:end], same as other APIs like PyUnicode_Find, PyUnicode_Count. But it doesn't allow negative indices like others so violates the doc.
msg281889 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-28 16:56
PyUnicode_FindChar.patch is a new feature, it cannot be applied to stable branches (py < 3.7).

I'm not sure that it's worth it to support negative indexes for end. Why not simply documenting that end must be positive?
msg281938 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-29 03:09
Other APIs like PyUnicode_Find and PyUnicode_Count support it. Their docs are almost the same so I think PyUnicode_FindChar does not need to be the special one. After change, its behaviour and implementation are more consistent with other APIs.
msg281962 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-29 08:02
Serhiy: I don't think that it's worth it to add a new function to _testcapi to test PyUnicode_FindChar. The implementation of the function seems simple.

At least, I would prefer to only see a few unit tests, not 17 test for this simple function!

I mean "character in str" is already tested by a *lot* of unit tests.
msg281967 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-29 08:20
I think it is nice to add tests for C API. Especially if there is no direct mapping between Python and C API ("character in str" don't call PyUnicode_FindChar()). Tests should cover all corner cases, otherwise we can miss bugs. Some C API can be not used in CPython at all, just in third-party extensions, and special tests is the only way to test them. The implementation of PyUnicode_FindChar() is not so simple (for example see issue24821).

I don't have an opinion about supporting negative indices.
msg281972 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-29 08:38
Would be nice to test corner cases:

1. Search UCS2 or UCS4 character with zero lower 8 bits: U+XX00.

2. Search UCS2 or UCS4 character with lower 8 bits that match high bits of string characters. For example search U+0404 in the string that consists of U+04XX (Ukrainian text). I think you can find similar Chinese example.
msg282000 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-29 15:51
Thanks for your reviews. :-)

v2 updated the test codes.
msg282016 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-11-29 17:15
PyUnicode_FindChar-v2.patch LGTM with a minor comment on the review, but I would prefer that Serhiy also reviews it ;-)

Remaining question: what is the behaviour for direction=0, direction=100 or direction=-2? Maybe we can add a few unit tests for strange values of direction? (Not sure if it's worth it.)
msg282021 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-11-29 17:29
> Remaining question: what is the behaviour for direction=0, direction=100 or direction=-2? Maybe we can add a few unit tests for strange values of direction? (Not sure if it's worth it.)

It's not documented so I also doubt it. Expect Serhiy's comment.
msg283682 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-12-20 11:25
Ignore my request about special direction values. It's not worth it to writ tests for that.

PyUnicode_FindChar-v2.patch LGTM.
msg283695 - (view) Author: Roundup Robot (python-dev) Date: 2016-12-20 14:55
New changeset ce6a6cc3765d by Xiang Zhang in branch 'default':
Issue #28822: Adjust indices handling of PyUnicode_FindChar().
https://hg.python.org/cpython/rev/ce6a6cc3765d
msg283705 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-12-20 16:41
Thanks Victor and Serhiy!
msg286459 - (view) Author: Roundup Robot (python-dev) Date: 2017-01-29 23:48
New changeset 2b5e5a3a805e by Martin Panter in branch 'default':
Issue #28822: Add susp-ignored entry for NEWS; fix grammar
https://hg.python.org/cpython/rev/2b5e5a3a805e
History
Date User Action Args
2017-03-31 16:36:12dstufftsetpull_requests: + pull_request875
2017-01-29 23:48:18python-devsetmessages: + msg286459
2016-12-20 16:41:01xiang.zhangsetstatus: open -> closed
resolution: fixed
messages: + msg283705

stage: patch review -> resolved
2016-12-20 14:55:53python-devsetnosy: + python-dev
messages: + msg283695
2016-12-20 11:25:47vstinnersetmessages: + msg283682
2016-11-29 17:29:52xiang.zhangsetmessages: + msg282021
2016-11-29 17:15:27vstinnersetmessages: + msg282016
2016-11-29 15:51:28xiang.zhangsetfiles: + PyUnicode_FindChar-v2.patch

messages: + msg282000
2016-11-29 08:38:26serhiy.storchakasetmessages: + msg281972
2016-11-29 08:20:24serhiy.storchakasetmessages: + msg281967
2016-11-29 08:02:01vstinnersetmessages: + msg281962
2016-11-29 03:09:51xiang.zhangsetmessages: + msg281938
versions: - Python 3.5, Python 3.6
2016-11-28 16:56:35vstinnersetmessages: + msg281889
2016-11-28 16:29:18xiang.zhangcreate