Issue 7940: re.finditer and re.findall should support negative end positions



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/52188

classification

Title:	re.finditer and re.findall should support negative end positions
Type:	enhancement	Stage:	patch review
Components:	Documentation	Versions:	Python 3.9

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:	docs@python	Nosy List:	anilbey, docs@python, ezio.melotti, mrabarnett, rhettinger, serhiy.storchaka, timehorse, tlynn, zach.ware
Priority:	low	Keywords:	patch

Created on 2010-02-16 14:04 by tlynn, last changed 2022-04-11 14:56 by admin.

Files
File name	Uploaded	Description	Edit
issue7940.patch	mrabarnett, 2013-05-26 00:30		review

Pull Requests
URL	Status	Linked	Edit
PR 14744	open	python-dev, 2019-07-13 13:38

Messages (15)
msg99411 - (view)	Author: Tom Lynn (tlynn)	Date: 2010-02-16 14:04
Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> list(re.compile('x').finditer('xxxx', 1, 3))
[<_sre.SRE_Match object at 0x7f2e820f09f0>, <_sre.SRE_Match object at 0x7f2e820f0d98>]
>>> list(re.compile('x').finditer('xxxx', 1, -1))
[]
>>> re.compile('x').findall('xxxx', 1, -1)
[]
>>> re.compile('x').findall('xxxx', 1, 3)
['x', 'x']
msg99456 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2010-02-17 01:47
I discovered that my regex module (see issue #2636) suffers from the same problem. Oops! It was a simple fix (to be released).
msg189882 - (view)	Author: Mark Lawrence (BreamoreBoy) *	Date: 2013-05-23 22:28
Has this been fixed in the regex module?
msg189885 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2013-05-23 22:44
Yes. As msg99456 suggests, I fixed it in my source code before posting.

Compare re in Python 3.3.2:

>>> re.compile('x').findall('xxxx', 1, 3)
['x', 'x']
>>> re.compile('x').findall('xxxx', 1, -1)
[]

with regex:

>>> regex.compile('x').findall('xxxx', 1, 3)
['x', 'x']
>>> regex.compile('x').findall('xxxx', 1, -1)
['x', 'x']
msg190041 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2013-05-26 00:30
I've attached a patch.
msg190101 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2013-05-26 17:13
I'm worried about backward compatibility. See also issue7951.
msg190121 - (view)	Author: Matthew Barnett (mrabarnett) *	Date: 2013-05-26 23:50
Like the OP, I would've expected it to handle negative indexes the way that strings do. In practice, I wouldn't normally provide negative indexes; I'd use some string or regex method to determine the search limits, and then pass them to finditer and findall, so they'd be non-negative anyway.
msg347852 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2019-07-13 18:37
-1 on the proposal. We don't know of any strong use cases, so there isn't a real problem being solved here. Rather than providing a benefit, this feature request makes it more likely that people will write convoluted code or that it will let bugs pass silently that would otherwise be caught.

ISTM the actual issue here is an incorrect user expectation that "all things that have indexing will support negative indexing". While it is common for objects to implement negative index support, it is not universal or required. Even collections.abc.Sequence does not insist on negative index support.

I think this warrants a FAQ entry (which should also mention that slice support is likewise not universal or required; some objects have it, some don't).

Reclassifying this as a documentation issue.
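The point about collections.abc.Sequence can be illustrated with a minimal sketch. The class name ForwardOnly is hypothetical and not part of the standard library; it only assumes that the Sequence mixin methods are built on __len__ and __getitem__ with non-negative integer indices, which is enough for iteration, membership tests and reversed() to work.

```python
from collections.abc import Sequence


class ForwardOnly(Sequence):
    """Hypothetical sequence that rejects negative indices and slices."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is not supported")
        if index < 0:
            raise IndexError("negative indices are not supported")
        # Raises IndexError past the end, which stops Sequence.__iter__.
        return self._items[index]


seq = ForwardOnly("abc")
print(list(seq))            # iteration uses indices 0, 1, 2, ...
print("b" in seq)           # membership comes from the Sequence mixin
print(list(reversed(seq)))  # reversed() counts down from len(seq) - 1 to 0
try:
    seq[-1]
except IndexError as exc:
    print("IndexError:", exc)
```

The class still registers as a Sequence even though seq[-1] fails, which matches the claim that negative indexing is a convention rather than a requirement.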
msg347913 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2019-07-14 13:26
The current behavior is inconsistent because the start position accepts both positive and negative indices, whereas the end position only accepts positive indices. I think the proposal and the PR written by Anil are reasonable and should be merged.
msg347915 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2019-07-14 14:01
Sorry, I was wrong. re.findall accepts negative indices for both start and end but they silently get converted to 0, which is arguably an unexpected behavior.

This is an example of the current behavior:

>>> s, e = 1, 4; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
(['b', 'c', 'd'], 'bcd')
>>> s, e = -4, 4; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
(['a', 'b', 'c', 'd'], 'bcd')
>>> s, e = 1, -1; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
([], 'bcd')
>>> s, e = -4, -1; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
([], 'bcd')

With the patch, all these return ['b', 'c', 'd'].

This change might indeed cause issues because it's a change in behavior, but I'm also not sure there are many cases where one would want a negative index to be treated as 0. Maybe we could raise a FutureWarning in the next release and change the behavior afterwards?
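For readers who want slice-like bounds today, one possible workaround is to normalize the positions before calling findall. This is only a sketch: findall_sliced is a hypothetical helper, not the attached patch and not part of re, and it assumes slice.indices() is an acceptable way to resolve negative bounds against the string length.

```python
import re


def findall_sliced(pattern, text, start=None, end=None):
    """Hypothetical helper: treat start/end the way text[start:end] would."""
    # slice.indices() resolves None and negative bounds against len(text).
    pos, endpos, _step = slice(start, end).indices(len(text))
    return pattern.findall(text, pos, endpos)


dot = re.compile('.')
print(dot.findall('abcde', 1, -1))           # current re behaviour: []
print(findall_sliced(dot, 'abcde', 1, -1))   # ['b', 'c', 'd']
print(findall_sliced(dot, 'abcde', -4, -1))  # ['b', 'c', 'd']
```

Because the bounds are resolved up front, the call into re only ever sees non-negative positions, so the result does not depend on how re treats negative values.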
msg347918 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-07-14 14:21
Are there any real world examples which show the benefit of supporting negative indices?
msg347958 - (view)	Author: M. Anil Tuncel (anilbey) *	Date: 2019-07-15 12:09
I guess the use of negative indices serves the same purpose here as in lists or strings. Though as Ezio pointed out, the current behaviour already accepts negative indices but gives results that are inconsistent with other parts of Python that support negative indices.

In my opinion: if negative indices are to be used here (which is what the current implementation suggests), they should behave in the same way as in the rest of Python. Otherwise, perhaps negative indices should not be allowed here at all.

What do you think?

P.S. this patch is already applied to the regex module by @mrabarnett
msg347997 - (view)	Author: Ezio Melotti (ezio.melotti) *	Date: 2019-07-16 00:36
> Are there any real world examples which show the benefit of supporting
> negative indices?

A common case is ignoring parentheses at the beginning/end, e.g.

>>> re.compile('[^,]+').findall('(foo,123,(),bar)')
['(foo', '123', '()', 'bar)']
>>> # ignore the surrounding ()
>>> re.compile('[^,]+').findall('(foo,123,(),bar)', 1, 15)
['foo', '123', '()', 'bar']
>>>
>>> # extract attributes from a tag (poc, doesn't handle all cases)
>>> re.compile('[^ ]+').findall('<input type="checkbox" id="foo" checked>', 7, 39)
['type="checkbox"', 'id="foo"', 'checked']

In both cases using -1 as endpos is simpler.
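Until negative end positions are supported, the hard-coded 15 and 39 in these examples can be derived from the string length instead. The sketch below shows that approach; findall_inside is a hypothetical helper name introduced only for illustration.

```python
import re


def findall_inside(pattern, text, pos=1):
    """Hypothetical helper: search text from pos up to, but excluding, its last character."""
    return pattern.findall(text, pos, len(text) - 1)


fields = re.compile('[^,]+')
print(findall_inside(fields, '(foo,123,(),bar)'))
# ['foo', '123', '()', 'bar']

words = re.compile('[^ ]+')
tag = '<input type="checkbox" id="foo" checked>'
# Start just after the first space, i.e. after the tag name.
print(findall_inside(words, tag, tag.index(' ') + 1))
# ['type="checkbox"', 'id="foo"', 'checked']
```

Writing len(text) - 1 is what an endpos of -1 would mean under slice semantics, at the cost of needing the string bound to a name.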
msg351530 - (view)	Author: Zachary Ware (zach.ware) *	Date: 2019-09-09 16:45
Ezio requested further opinions, so here's mine. I don't think the current behavior makes sense; I doubt anyone actually expects a negative index to be squashed to 0, especially for endpos. I'm not certain that allowing negative indexes is really necessary, but it seems nicer than raising an exception, which would be the other acceptable option to me.
msg351534 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2019-09-09 16:55
Note that changing the current behavior is a breaking change. For example, someone can use `pattern.findall(text, curpos-50, curpos+50)` to search in the range ±50 characters from the current position. If negative positions change meaning, this will break code for curpos < 50.
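The compatibility concern can be made concrete with a small sketch. The sample text, the curpos value and the window_findall helper are all made up for illustration; the slice.indices() call only models what slice-style semantics would compute and is not the behavior of any released re version.

```python
import re

pattern = re.compile('ab')
text = 'ab' * 100            # 200 characters
curpos = 10                  # closer than 50 characters to the start

# Today the negative start (-40) is clamped to 0, so the window is text[0:60].
today = pattern.findall(text, curpos - 50, curpos + 50)
print(len(today))            # 30

# Under slice-style semantics, -40 would instead mean len(text) - 40 == 160.
start, end, _step = slice(curpos - 50, curpos + 50).indices(len(text))
print(start, end)            # 160 60
print(pattern.findall(text, start, end))   # [] because start is past end


def window_findall(pattern, text, curpos, radius=50):
    """Hypothetical helper: clamp the window explicitly instead of relying on re."""
    start = max(0, curpos - radius)
    end = min(len(text), curpos + radius)
    return pattern.findall(text, start, end)


print(len(window_findall(pattern, text, curpos)))   # 30
```

Code that clamps explicitly, as window_findall does, would return the same result under either interpretation of negative positions.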

History
Date	User	Action	Args
2022-04-11 14:56:57	admin	set	github: 52188
2019-09-09 16:55:47	serhiy.storchaka	set	messages: + msg351534
2019-09-09 16:45:35	zach.ware	set	nosy: + zach.ware messages: + msg351530
2019-07-16 00:36:45	ezio.melotti	set	messages: + msg347997 versions: + Python 3.9, - Python 3.5
2019-07-15 12:09:03	anilbey	set	nosy: + anilbey messages: + msg347958
2019-07-14 14:21:50	serhiy.storchaka	set	messages: + msg347918
2019-07-14 14:01:58	ezio.melotti	set	messages: + msg347915
2019-07-14 13:26:18	ezio.melotti	set	messages: + msg347913
2019-07-13 18:37:32	rhettinger	set	nosy: + rhettinger, docs@python messages: + msg347852 assignee: docs@python components: + Documentation, - Library (Lib), Regular Expressions
2019-07-13 13:38:13	python-dev	set	pull_requests: + pull_request14540
2014-10-23 20:51:13	serhiy.storchaka	set	priority: normal -> low stage: needs patch -> patch review components: + Regular Expressions versions: + Python 3.5, - Python 3.4
2014-02-03 17:10:44	BreamoreBoy	set	nosy: - BreamoreBoy
2013-05-26 23:50:02	mrabarnett	set	messages: + msg190121
2013-05-26 17:13:02	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg190101
2013-05-26 00:30:10	mrabarnett	set	files: + issue7940.patch keywords: + patch messages: + msg190041
2013-05-25 11:43:05	pitrou	set	stage: test needed -> needs patch components: + Library (Lib) versions: + Python 3.4, - Python 2.7, Python 3.2
2013-05-23 22:44:02	mrabarnett	set	messages: + msg189885
2013-05-23 22:28:53	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg189882
2010-02-17 01:47:33	mrabarnett	set	messages: + msg99456
2010-02-17 00:05:34	ezio.melotti	set	priority: normal nosy: + timehorse, ezio.melotti, mrabarnett stage: test needed versions: + Python 2.7, Python 3.2, - Python 2.5
2010-02-16 14:04:33	tlynn	create