classification
Title: re fails to identify invalid numeric group references in replacement strings
Type: behavior Stage: resolved
Components: Regular Expressions Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: SilentGhost, bazwal, ezio.melotti, mrabarnett, python-dev, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-12-25 19:46 by bazwal, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description Edit
issue25953.diff SilentGhost, 2015-12-25 20:53 review
25953_2.diff SilentGhost, 2016-10-16 09:13 review
25953_3.diff SilentGhost, 2016-10-20 19:32 review
25953_4.diff SilentGhost, 2016-10-22 11:38 review
25953_5.diff SilentGhost, 2016-10-23 07:47 review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (10)
msg257008 - (view) Author: bazwal (bazwal) Date: 2015-12-25 19:46
This code example:

    re.sub(r'(?P<x>[123])', r'\g<a>', '')

will correctly raise a KeyError due to the invalid group reference.

However, this very similar code example:

    re.sub(r'(?P<x>[123])', r'\g<3>', '')

fails to raise an error. It seems that the only way to check whether a numeric group reference is compatible with a given pattern is to test it against a string which happens to match. But this is obviously infeasible when checking unknown expressions (e.g. those taken from user input). And in any case: errors should be raised at the point where they occur (i.e. during compilation), not at some indeterminate point in the future.

Regular expression objects have a "groups" attribute which holds the number of capturing groups in the pattern. So there seems no good reason why the replacement string parser can't identify invalid numeric group references in exactly the same way that it does for symbolic ones.
msg257013 - (view) Author: SilentGhost (SilentGhost) * Date: 2015-12-25 20:53
Well, at least on the surface of it, the fix seems pretty straightforward:
check for the group index. With this patch the behaviour is as expected, but I get two tests erroring out since they're expecting differently worded error. This probably needs adjustments to those tests as well as C code written.
msg278755 - (view) Author: SilentGhost (SilentGhost) * Date: 2016-10-16 09:13
Here is the updated patch with fixes to the test suite.
msg278776 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-16 19:22
Needed new tests for changed behavior. Test re.sub() with incorrect groups and the string that doesn't match the pattern (e.g. empty string).

Added other comments on Rietveld.
msg279068 - (view) Author: SilentGhost (SilentGhost) * Date: 2016-10-20 19:32
Updated patch taking Serhiy's comments into account.

There was another case on line 725 for when zero is used as a group number, I'm not sure though if it falls within the scope of this issue, so it's not included in the current patch.
msg279189 - (view) Author: SilentGhost (SilentGhost) * Date: 2016-10-22 11:38
I've modified addgroup to take a pos argument, this seem to introduce minimal disturbance.
msg279241 - (view) Author: SilentGhost (SilentGhost) * Date: 2016-10-23 07:47
Updated patch fixing the position issue.
msg279242 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-23 08:18
LGTM. Thank you for your contribution SilentGhost.
msg279243 - (view) Author: Roundup Robot (python-dev) Date: 2016-10-23 09:12
New changeset cea983246919 by Serhiy Storchaka in branch '3.6':
Issue #25953: re.sub() now raises an error for invalid numerical group
https://hg.python.org/cpython/rev/cea983246919

New changeset 15e3695affa2 by Serhiy Storchaka in branch 'default':
Issue #25953: re.sub() now raises an error for invalid numerical group
https://hg.python.org/cpython/rev/15e3695affa2
msg279248 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-23 09:52
Committed with additional changes. Fixed yet one occurrence of "invalid group reference" without group index, and made small style changes.
History
Date User Action Args
2017-03-31 16:36:13dstufftsetpull_requests: + pull_request892
2016-10-23 09:52:45serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg279248

stage: commit review -> resolved
2016-10-23 09:12:34python-devsetnosy: + python-dev
messages: + msg279243
2016-10-23 08:18:24serhiy.storchakasetmessages: + msg279242
stage: patch review -> commit review
2016-10-23 07:47:53SilentGhostsetfiles: + 25953_5.diff

messages: + msg279241
2016-10-22 11:38:05SilentGhostsetfiles: + 25953_4.diff

messages: + msg279189
2016-10-20 19:32:27SilentGhostsetfiles: + 25953_3.diff

messages: + msg279068
versions: - Python 3.5
2016-10-16 19:22:15serhiy.storchakasetmessages: + msg278776
2016-10-16 09:13:04SilentGhostsetfiles: + 25953_2.diff

messages: + msg278755
versions: + Python 3.7
2016-10-16 08:35:54serhiy.storchakasetassignee: serhiy.storchaka

nosy: + serhiy.storchaka
stage: patch review
2015-12-25 20:53:11SilentGhostsetfiles: + issue25953.diff
versions: + Python 3.6
nosy: + SilentGhost

messages: + msg257013

keywords: + patch
2015-12-25 19:46:48bazwalcreate