Fix indices handling in PyUnicode_FindChar #73008

zhangyangyu · 2016-11-28T16:29:19Z

BPO	28822
Nosy	@vstinner, @serhiy-storchaka, @zhangyangyu
PRs	[Do Not Merge] Convert `Misc/NEWS` so that it is managed by towncrier #552
Files	PyUnicode_FindChar.patch PyUnicode_FindChar-v2.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2016-12-20.16:41:01.659>
created_at = <Date 2016-11-28.16:29:18.863>
labels = ['interpreter-core', 'type-bug', '3.7']
title = 'Fix indices handling in PyUnicode_FindChar'
updated_at = <Date 2017-03-31.16:36:12.130>
user = 'https://github.com/zhangyangyu'

bugs.python.org fields:

activity = <Date 2017-03-31.16:36:12.130>
actor = 'dstufft'
assignee = 'none'
closed = True
closed_date = <Date 2016-12-20.16:41:01.659>
closer = 'xiang.zhang'
components = ['Interpreter Core']
creation = <Date 2016-11-28.16:29:18.863>
creator = 'xiang.zhang'
dependencies = []
files = ['45675', '45690']
hgrepos = []
issue_num = 28822
keywords = ['patch']
message_count = 13.0
messages = ['281883', '281889', '281938', '281962', '281967', '281972', '282000', '282016', '282021', '283682', '283695', '283705', '286459']
nosy_count = 4.0
nosy_names = ['vstinner', 'python-dev', 'serhiy.storchaka', 'xiang.zhang']
pr_nums = ['552']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue28822'
versions = ['Python 3.7']


zhangyangyu · 2016-11-28T16:29:19Z

The documentation for PyUnicode_FindChar says it treats its *start* and *end* parameters as str[start:end], the same as other APIs such as PyUnicode_Find and PyUnicode_Count. But unlike those APIs it does not accept negative indices, so it violates its documentation.
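A minimal Python sketch of the str[start:end] index handling described here may make the expected behaviour concrete. `adjust_indices` and `find_char` are hypothetical names used for illustration, not CPython functions, and the clamping rules are assumed to follow ordinary Python slicing rather than being taken from the attached patch:

```python
def adjust_indices(start, end, length):
    """Clamp start/end the way a str[start:end] slice would.

    Hypothetical helper: negative values count from the end of the
    string, and out-of-range values are clamped instead of raising.
    """
    if end > length:
        end = length
    elif end < 0:
        end += length
        if end < 0:
            end = 0
    if start < 0:
        start += length
        if start < 0:
            start = 0
    return start, end


def find_char(s, ch, start, end, direction=1):
    """Pure-Python model of a slice-aware single-character search.

    Returns the index of ch within s[start:end], or -1 if it is absent.
    A positive direction searches forward; anything else is treated as
    a backward search here, which is an assumption of this sketch.
    """
    start, end = adjust_indices(start, end, len(s))
    if direction > 0:
        positions = range(start, end)
    else:
        positions = range(end - 1, start - 1, -1)
    for i in positions:
        if s[i] == ch:
            return i
    return -1
```

With this model, `find_char("hello", "l", -3, -1)` searches the slice `"ll"` and reports the position in the original string, which is the behaviour a caller used to str[start:end] would expect.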

vstinner · 2016-11-28T16:56:36Z

PyUnicode_FindChar.patch adds a new feature, so it cannot be applied to stable branches (py < 3.7).

I'm not sure that it's worth it to support negative indexes for end. Why not simply document that end must be positive?

zhangyangyu · 2016-11-29T03:09:52Z

Other APIs like PyUnicode_Find and PyUnicode_Count support it. Their docs are almost the same, so I think PyUnicode_FindChar does not need to be a special case. After the change, its behaviour and implementation are more consistent with the other APIs.
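For reference, the Python-level methods `str.find`, `str.rfind` and `str.count` already accept negative *start* and *end* values with slice semantics. The sketch below uses them as stand-ins for PyUnicode_Find and PyUnicode_Count; that correspondence is an assumption made for illustration, since the snippet does not call the C API directly:

```python
s = "hello world"

# s[-5:-1] is "worl", so the search window covers indices 6 to 9.
window = s[-5:-1]

# Positions are reported relative to the full string, not the window.
first_o = s.find("o", -5, -1)
last_l = s.rfind("l", -5, -1)
count_o = s.count("o", -5, -1)
```

Here `first_o` is 7, `last_l` is 9 and `count_o` is 1: the "o" at index 4 lies outside the window and is ignored.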

vstinner · 2016-11-29T08:02:01Z

Serhiy: I don't think that it's worth it to add a new function to _testcapi to test PyUnicode_FindChar. The implementation of the function seems simple.

At least, I would prefer to only see a few unit tests, not 17 tests for this simple function!

I mean "character in str" is already tested by a *lot* of unit tests.

serhiy-storchaka · 2016-11-29T08:20:24Z

I think it is nice to add tests for the C API, especially where there is no direct mapping between Python and the C API ("character in str" doesn't call PyUnicode_FindChar()). Tests should cover all corner cases, otherwise we can miss bugs. Some C API functions may not be used in CPython at all, only in third-party extensions, and dedicated tests are the only way to test them. The implementation of PyUnicode_FindChar() is not so simple (for example, see bpo-24821).

I don't have an opinion about supporting negative indices.

serhiy-storchaka · 2016-11-29T08:38:26Z

Would be nice to test corner cases:

Search UCS2 or UCS4 character with zero lower 8 bits: U+XX00.
Search UCS2 or UCS4 character with lower 8 bits that match the high bits of the string's characters. For example, search for U+0404 in a string that consists of U+04XX characters (Ukrainian text). I think you can find a similar Chinese example.
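The two corner cases above could be written down as a small table of inputs. This is only a sketch of the test data: it uses `str.find` as a stand-in because it runs without a C wrapper, whereas a real test would go through a `_testcapi` function that calls PyUnicode_FindChar(). `CASES` and `run_cases` are hypothetical names.

```python
# Each case: (description, haystack, needle, expected index).
CASES = [
    # Needle whose lower 8 bits are zero (U+XX00).
    ("UCS2 needle U+0100, present", "ab\u0100cd", "\u0100", 2),
    ("UCS2 needle U+0100, absent", "ab\u0101cd", "\u0100", -1),
    ("UCS4 needle U+10400, present", "a\U00010400", "\U00010400", 1),
    # Needle U+0404: its low byte 0x04 equals the high byte of every
    # U+04XX character in the haystack, so a byte-wise scan could
    # produce a false match.
    ("U+0404 in U+04XX text, absent", "\u0410\u0411\u0412", "\u0404", -1),
    ("U+0404 in U+04XX text, present", "\u0410\u0411\u0412\u0404", "\u0404", 3),
]


def run_cases(cases=CASES):
    """Return a list of (description, got, expected) for failing cases."""
    failures = []
    for description, haystack, needle, expected in cases:
        got = haystack.find(needle)
        if got != expected:
            failures.append((description, got, expected))
    return failures
```

An empty list from `run_cases()` means every case matched its expected index.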

zhangyangyu · 2016-11-29T15:51:29Z

Thanks for your reviews. :-)

v2 updates the test code.

vstinner · 2016-11-29T17:15:27Z

PyUnicode_FindChar-v2.patch LGTM with a minor comment on the review, but I would prefer that Serhiy also reviews it ;-)

Remaining question: what is the behaviour for direction=0, direction=100 or direction=-2? Maybe we can add a few unit tests for strange values of direction? (Not sure if it's worth it.)

zhangyangyu · 2016-11-29T17:29:52Z

> Remaining question: what is the behaviour for direction=0, direction=100 or direction=-2? Maybe we can add a few unit tests for strange values of direction? (Not sure if it's worth it.)

It's not documented, so I doubt it too. Waiting for Serhiy's comment.

vstinner · 2016-12-20T11:25:47Z

Ignore my request about special direction values. It's not worth it to write tests for that.

PyUnicode_FindChar-v2.patch LGTM.

python-dev · 2016-12-20T14:55:54Z

New changeset ce6a6cc3765d by Xiang Zhang in branch 'default':
Issue bpo-28822: Adjust indices handling of PyUnicode_FindChar().
https://hg.python.org/cpython/rev/ce6a6cc3765d

zhangyangyu · 2016-12-20T16:41:02Z

Thanks Victor and Serhiy!

python-dev · 2017-01-29T23:48:18Z

New changeset 2b5e5a3a805e by Martin Panter in branch 'default':
Issue bpo-28822: Add susp-ignored entry for NEWS; fix grammar
https://hg.python.org/cpython/rev/2b5e5a3a805e

zhangyangyu added the 3.7 (EOL), interpreter-core, and type-bug labels Nov 28, 2016

zhangyangyu closed this as completed Dec 20, 2016

ezio-melotti transferred this issue from another repository Apr 10, 2022
