Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

re fails to identify invalid numeric group references in replacement strings #70141

Closed
bazwal mannequin opened this issue Dec 25, 2015 · 10 comments
Closed

re fails to identify invalid numeric group references in replacement strings #70141

bazwal mannequin opened this issue Dec 25, 2015 · 10 comments
Assignees
Labels
3.7 (EOL) end of life topic-regex type-bug An unexpected behavior, bug, or error

Comments

@bazwal
Copy link
Mannequin

bazwal mannequin commented Dec 25, 2015

BPO 25953
Nosy @ezio-melotti, @serhiy-storchaka
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • issue25953.diff
  • 25953_2.diff
  • 25953_3.diff
  • 25953_4.diff
  • 25953_5.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-10-23.09:52:45.135>
    created_at = <Date 2015-12-25.19:46:48.041>
    labels = ['expert-regex', 'type-bug', '3.7']
    title = 're fails to identify invalid numeric group references in replacement strings'
    updated_at = <Date 2017-03-31.16:36:13.857>
    user = 'https://bugs.python.org/bazwal'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:13.857>
    actor = 'dstufft'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-10-23.09:52:45.135>
    closer = 'serhiy.storchaka'
    components = ['Regular Expressions']
    creation = <Date 2015-12-25.19:46:48.041>
    creator = 'bazwal'
    dependencies = []
    files = ['41422', '45113', '45158', '45187', '45194']
    hgrepos = []
    issue_num = 25953
    keywords = ['patch']
    message_count = 10.0
    messages = ['257008', '257013', '278755', '278776', '279068', '279189', '279241', '279242', '279243', '279248']
    nosy_count = 6.0
    nosy_names = ['bazwal', 'ezio.melotti', 'mrabarnett', 'SilentGhost', 'python-dev', 'serhiy.storchaka']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue25953'
    versions = ['Python 3.6', 'Python 3.7']

    @bazwal
    Copy link
    Mannequin Author

    bazwal mannequin commented Dec 25, 2015

    This code example:

    re.sub(r'(?P<x>[123])', r'\g<a>', '')
    

    will correctly raise a KeyError due to the invalid group reference.

    However, this very similar code example:

    re.sub(r'(?P<x>[123])', r'\g<3>', '')
    

    fails to raise an error. It seems that the only way to check whether a numeric group reference is compatible with a given pattern is to test it against a string which happens to match. But this is obviously infeasible when checking unknown expressions (e.g. those taken from user input). And in any case: errors should be raised at the point where they occur (i.e. during compilation), not at some indeterminate point in the future.

    Regular expression objects have a "groups" attribute which holds the number of capturing groups in the pattern. So there seems no good reason why the replacement string parser can't identify invalid numeric group references in exactly the same way that it does for symbolic ones.

    @bazwal bazwal mannequin added topic-regex type-bug An unexpected behavior, bug, or error labels Dec 25, 2015
    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Dec 25, 2015

    Well, at least on the surface of it, the fix seems pretty straightforward:
    check for the group index. With this patch the behaviour is as expected, but I get two tests erroring out since they're expecting differently worded error. This probably needs adjustments to those tests as well as C code written.

    @serhiy-storchaka serhiy-storchaka self-assigned this Oct 16, 2016
    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Oct 16, 2016

    Here is the updated patch with fixes to the test suite.

    @SilentGhost SilentGhost mannequin added the 3.7 (EOL) end of life label Oct 16, 2016
    @serhiy-storchaka
    Copy link
    Member

    Needed new tests for changed behavior. Test re.sub() with incorrect groups and the string that doesn't match the pattern (e.g. empty string).

    Added other comments on Rietveld.

    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Oct 20, 2016

    Updated patch taking Serhiy's comments into account.

    There was another case on line 725 for when zero is used as a group number, I'm not sure though if it falls within the scope of this issue, so it's not included in the current patch.

    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Oct 22, 2016

    I've modified addgroup to take a pos argument, this seem to introduce minimal disturbance.

    @SilentGhost
    Copy link
    Mannequin

    SilentGhost mannequin commented Oct 23, 2016

    Updated patch fixing the position issue.

    @serhiy-storchaka
    Copy link
    Member

    LGTM. Thank you for your contribution SilentGhost.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Oct 23, 2016

    New changeset cea983246919 by Serhiy Storchaka in branch '3.6':
    Issue bpo-25953: re.sub() now raises an error for invalid numerical group
    https://hg.python.org/cpython/rev/cea983246919

    New changeset 15e3695affa2 by Serhiy Storchaka in branch 'default':
    Issue bpo-25953: re.sub() now raises an error for invalid numerical group
    https://hg.python.org/cpython/rev/15e3695affa2

    @serhiy-storchaka
    Copy link
    Member

    Committed with additional changes. Fixed yet one occurrence of "invalid group reference" without group index, and made small style changes.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life topic-regex type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant