Fix indices handling in PyUnicode_FindChar #73008

Closed
zhangyangyu opened this issue Nov 28, 2016 · 13 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@zhangyangyu
Member

BPO 28822
Nosy @vstinner, @serhiy-storchaka, @zhangyangyu
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
Files
  • PyUnicode_FindChar.patch
  • PyUnicode_FindChar-v2.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2016-12-20.16:41:01.659>
    created_at = <Date 2016-11-28.16:29:18.863>
    labels = ['interpreter-core', 'type-bug', '3.7']
    title = 'Fix indices handling in PyUnicode_FindChar'
    updated_at = <Date 2017-03-31.16:36:12.130>
    user = 'https://github.com/zhangyangyu'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:12.130>
    actor = 'dstufft'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-12-20.16:41:01.659>
    closer = 'xiang.zhang'
    components = ['Interpreter Core']
    creation = <Date 2016-11-28.16:29:18.863>
    creator = 'xiang.zhang'
    dependencies = []
    files = ['45675', '45690']
    hgrepos = []
    issue_num = 28822
    keywords = ['patch']
    message_count = 13.0
    messages = ['281883', '281889', '281938', '281962', '281967', '281972', '282000', '282016', '282021', '283682', '283695', '283705', '286459']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'python-dev', 'serhiy.storchaka', 'xiang.zhang']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue28822'
    versions = ['Python 3.7']

    @zhangyangyu
    Member Author

    The documentation for PyUnicode_FindChar says that it treats its *start* and *end* parameters as str[start:end], the same as other APIs such as PyUnicode_Find and PyUnicode_Count. Unlike those APIs, however, it does not accept negative indices, so it violates the documentation.
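
The slice-style semantics described in this report can be modelled in pure Python. The sketch below only illustrates what treating *start* and *end* as str[start:end] means for negative or out-of-range indices. `adjust_indices` and `find_char` are hypothetical helper names, not the C implementation, and the model relies on `str.find` and `str.rfind` for the search itself. Rejecting any direction other than 1 or -1 is a choice made for this model.

```python
def adjust_indices(start, end, length):
    """Clamp start/end the way slicing does for a sequence of this length."""
    if end > length:
        end = length
    elif end < 0:
        end = max(end + length, 0)
    if start < 0:
        start = max(start + length, 0)
    return start, end


def find_char(s, ch, start, end, direction):
    """Hypothetical model: index of ch within s[start:end], or -1."""
    if direction not in (1, -1):
        # Assumption of this model: only the documented values are accepted.
        raise ValueError("direction must be 1 (forward) or -1 (backward)")
    start, end = adjust_indices(start, end, len(s))
    if end - start < 1:
        return -1  # empty range, nothing to search
    if direction == 1:
        return s.find(ch, start, end)
    return s.rfind(ch, start, end)


# "hello"[-3:-1] is "ll", which covers indices 2 and 3.
print(find_char("hello", "l", -3, -1, 1))    # 2
print(find_char("hello", "l", -3, -1, -1))   # 3
print(find_char("hello", "o", 0, -1, 1))     # -1, "o" lies outside "hell"
```

With this model, a negative *end* simply shortens the searched range from the right, which is the behaviour the report asks PyUnicode_FindChar to share with the other search APIs.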

    @zhangyangyu zhangyangyu added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Nov 28, 2016
    @vstinner
    Member

    PyUnicode_FindChar.patch adds a new feature, so it cannot be applied to stable branches (py < 3.7).

    I'm not sure that it's worth supporting negative indexes for end. Why not simply document that end must be positive?

    @zhangyangyu
    Member Author

    Other APIs like PyUnicode_Find and PyUnicode_Count support it. Their docs are almost the same, so I don't think PyUnicode_FindChar needs to be the exception. After the change, its behaviour and implementation are more consistent with the other APIs.

    @vstinner
    Member

    Serhiy: I don't think that it's worth it to add a new function to _testcapi to test PyUnicode_FindChar. The implementation of the function seems simple.

    At least, I would prefer to only see a few unit tests, not 17 tests for this simple function!

    I mean "character in str" is already tested by a *lot* of unit tests.

    @serhiy-storchaka
    Member

    I think it is nice to add tests for the C API, especially if there is no direct mapping between the Python and C APIs ("character in str" doesn't call PyUnicode_FindChar()). Tests should cover all corner cases, otherwise we can miss bugs. Some C APIs may not be used in CPython at all, only in third-party extensions, and special tests are the only way to test them. The implementation of PyUnicode_FindChar() is not so simple (for example, see bpo-24821).
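
Because no Python-level operation maps onto this function, one way to exercise it without writing an extension module is through `ctypes.pythonapi`. This is a minimal sketch, assuming a CPython build where the `ctypes` module is available; `c_find_char` is a hypothetical wrapper name, and this is not the approach taken by the patch under review.

```python
import ctypes

# Declare the C signature so ctypes converts arguments and the result:
# Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch,
#                               Py_ssize_t start, Py_ssize_t end,
#                               int direction)
_find_char = ctypes.pythonapi.PyUnicode_FindChar
_find_char.argtypes = [
    ctypes.py_object,   # str
    ctypes.c_uint32,    # ch (Py_UCS4)
    ctypes.c_ssize_t,   # start
    ctypes.c_ssize_t,   # end
    ctypes.c_int,       # direction
]
_find_char.restype = ctypes.c_ssize_t


def c_find_char(s, ch, start, end, direction):
    """Call the C function directly; returns the index, or -1 if not found."""
    return _find_char(s, ord(ch), start, end, direction)


print(c_find_char("hello", "l", 0, 5, 1))    # forward search
print(c_find_char("hello", "l", 0, 5, -1))   # backward search
print(c_find_char("hello", "z", 0, 5, 1))    # character not present
```

A wrapper like this makes it possible to feed the C function arbitrary *start*, *end* and *direction* values from an ordinary test case, which is the kind of coverage argued for above.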

    I don't have an opinion about supporting negative indices.

    @serhiy-storchaka
    Member

    It would be nice to test these corner cases:

    1. Search for a UCS2 or UCS4 character whose lower 8 bits are zero: U+XX00.

    2. Search for a UCS2 or UCS4 character whose lower 8 bits match the high bits of the string's characters. For example, search for U+0404 in a string that consists of U+04XX characters (Ukrainian text). I think you can find a similar Chinese example.
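
The two corner cases can be made concrete by looking at the bytes. The sketch below uses UTF-16-LE as a stand-in for a two-byte-per-character layout (an assumption for illustration; it is not CPython's internal buffer) and lists every byte offset where the low byte of the searched character occurs. `low_byte_hits` is a hypothetical helper. It shows why a search that first scans for the low byte would meet false candidates in both cases.

```python
def low_byte_hits(text, ch):
    """Byte offsets in the UTF-16-LE form of text holding ch's low byte."""
    raw = text.encode("utf-16-le")
    low = ord(ch) & 0xFF
    return [i for i, b in enumerate(raw) if b == low]


# Case 1: U+0100 has a zero low byte, and zero is also the high byte of
# every ASCII character in the string.
s1 = "ab\u0100"
print(low_byte_hits(s1, "\u0100"))   # [1, 3, 4]; only offset 4 is a real match
print(s1.find("\u0100"))             # 2

# Case 2: U+0404 is absent, but its low byte 0x04 is the high byte of
# every Cyrillic character in the text.
s2 = "\u0423\u043a\u0440\u0430\u0457\u043d\u0430"  # "Україна"
print(low_byte_hits(s2, "\u0404"))   # [1, 3, 5, 7, 9, 11, 13], all false hits
print(s2.find("\u0404"))             # -1
```

In both strings, hits at odd offsets fall on the high byte of some other character, so a correct search has to reject them and keep going.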

    @zhangyangyu
    Member Author

    Thanks for your reviews. :-)

    v2 updates the test code.

    @vstinner
    Member

    PyUnicode_FindChar-v2.patch LGTM with a minor comment on the review, but I would prefer that Serhiy also reviews it ;-)

    Remaining question: what is the behaviour for direction=0, direction=100 or direction=-2? Maybe we can add a few unit tests for strange values of direction? (Not sure if it's worth it.)

    @zhangyangyu
    Member Author

    > Remaining question: what is the behaviour for direction=0, direction=100 or direction=-2? Maybe we can add a few unit tests for strange values of direction? (Not sure if it's worth it.)

    It's not documented, so I also doubt it's worth it. Waiting for Serhiy's comment.

    @vstinner
    Member

    Ignore my request about special direction values. It's not worth writing tests for that.

    PyUnicode_FindChar-v2.patch LGTM.

    @python-dev
    Mannequin

    python-dev mannequin commented Dec 20, 2016

    New changeset ce6a6cc3765d by Xiang Zhang in branch 'default':
    Issue bpo-28822: Adjust indices handling of PyUnicode_FindChar().
    https://hg.python.org/cpython/rev/ce6a6cc3765d

    @zhangyangyu
    Member Author

    Thanks Victor and Serhiy!

    @python-dev
    Mannequin

    python-dev mannequin commented Jan 29, 2017

    New changeset 2b5e5a3a805e by Martin Panter in branch 'default':
    Issue bpo-28822: Add susp-ignored entry for NEWS; fix grammar
    https://hg.python.org/cpython/rev/2b5e5a3a805e

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022