classification
Title: re.finditer and re.findall should support negative end positions
Type: enhancement Stage: patch review
Components: Documentation Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: anilbey, docs@python, ezio.melotti, mrabarnett, rhettinger, serhiy.storchaka, timehorse, tlynn, zach.ware
Priority: low Keywords: patch

Created on 2010-02-16 14:04 by tlynn, last changed 2019-09-09 16:55 by serhiy.storchaka.

Files
File name Uploaded Description
issue7940.patch mrabarnett, 2013-05-26 00:30 review
Pull Requests
URL Status Linked
PR 14744 open python-dev, 2019-07-13 13:38
Messages (15)
msg99411 - (view) Author: Tom Lynn (tlynn) Date: 2010-02-16 14:04
Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04) 
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> list(re.compile('x').finditer('xxxx', 1, 3))
[<_sre.SRE_Match object at 0x7f2e820f09f0>, <_sre.SRE_Match object at 0x7f2e820f0d98>]
>>> list(re.compile('x').finditer('xxxx', 1, -1))
[]
>>> re.compile('x').findall('xxxx', 1, -1)
[]
>>> re.compile('x').findall('xxxx', 1, 3)
['x', 'x']
msg99456 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2010-02-17 01:47
I discovered that my regex module (see issue #2636) suffers from the same problem. Oops! It was a simple fix (to be released).
msg189882 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2013-05-23 22:28
Has this been fixed in the regex module?
msg189885 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-05-23 22:44
Yes. As msg99456 suggests, I fixed it in my source code before posting.

Compare re in Python 3.3.2:

>>> re.compile('x').findall('xxxx', 1, 3)
['x', 'x']
>>> re.compile('x').findall('xxxx', 1, -1)
[]

with regex:

>>> regex.compile('x').findall('xxxx', 1, 3)
['x', 'x']
>>> regex.compile('x').findall('xxxx', 1, -1)
['x', 'x']
msg190041 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-05-26 00:30
I've attached a patch.
msg190101 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-05-26 17:13
I'm worried about backward compatibility. See also issue7951.
msg190121 - (view) Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2013-05-26 23:50
Like the OP, I would've expected it to handle negative indexes the way that strings do.

In practice, I wouldn't normally provide negative indexes; I'd use some string or regex method to determine the search limits, and then pass them to finditer and findall, so they'd be non-negative anyway.
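As a rough illustration of computing the limits up front, the sketch below resolves slice-style positions with `slice.indices()` before calling `findall`. The helper name `findall_sliced` is hypothetical and is not part of `re` or of the attached patch.

```python
import re

def findall_sliced(pattern, text, pos=None, endpos=None):
    """Hypothetical helper: apply slice-style index rules before searching."""
    # slice.indices() resolves None and negative values the same way
    # text[pos:endpos] does, and returns non-negative bounds.
    start, stop, _ = slice(pos, endpos).indices(len(text))
    return pattern.findall(text, start, stop)

print(findall_sliced(re.compile('x'), 'xxxx', 1, -1))  # ['x', 'x']
```

Because the bounds passed to `findall` are always non-negative, the result does not depend on how `re` itself treats negative positions.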
msg347852 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-07-13 18:37
-1 on the proposal.  We don't know of any strong use cases, so there isn't a real problem being solved here.  Rather than providing a benefit, this feature request makes it more likely that people will write convoluted code or that it will let bugs pass silently that would otherwise be caught.

ISTM the actual issue here is an incorrect user expectation that "all things that have indexing will support negative indexing".  While it is common for objects to implement negative index support, it is not universal or required.  Even collections.abc.Sequence does not insist on negative index support.

I think this warrants a FAQ entry (which should also mention that slice support is likewise not universal or required; some objects have it, some don't).

Reclassifying this as documentation issue.
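To illustrate the point about collections.abc.Sequence, here is a minimal sketch of a sequence that satisfies the ABC without accepting negative indices or slices. The class `Countdown` is made up for this example.

```python
from collections.abc import Sequence

class Countdown(Sequence):
    """Hypothetical sequence n, n-1, ..., 1 that rejects negative indices."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        # Deliberately no negative-index or slice handling.
        if not isinstance(index, int):
            raise TypeError("indices must be integers")
        if not 0 <= index < self._n:
            raise IndexError(index)
        return self._n - index

c = Countdown(3)
print(list(c))  # [3, 2, 1]; iteration and the other mixin methods still work
```

Here `c[-1]` raises IndexError, yet the class is still a valid Sequence.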
msg347913 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2019-07-14 13:26
The current behavior is inconsistent because the start position accepts both positive and negative indices, whereas the end position only accepts positive indices.
I think the proposal and the PR written by Anil are reasonable and should be merged.
msg347915 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2019-07-14 14:01
Sorry, I was wrong.  re.findall accepts negative indices for both start and end, but they silently get converted to 0, which is arguably unexpected behavior.

This is an example of the current behavior:
>>> s, e = 1, 4; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
(['b', 'c', 'd'], 'bcd')
>>> s, e = -4, 4; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
(['a', 'b', 'c', 'd'], 'bcd')
>>> s, e = 1, -1; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
([], 'bcd')
>>> s, e = -4, -1; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
([], 'bcd')

With the patch, all these return ['b', 'c', 'd'].  This change might indeed cause issues because it's a change in behavior, but I'm also not sure there are many cases where one would want a negative index to be treated as 0.  Maybe we could raise a FutureWarning in the next release and change the behavior afterwards?
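The FutureWarning idea could look roughly like the sketch below, written as a user-level wrapper rather than a change to `re` itself. The function name `findall_warn` and the warning text are assumptions made for illustration.

```python
import re
import warnings

def findall_warn(pattern, text, pos=0, endpos=None):
    """Hypothetical wrapper: keep current results but warn on negative positions."""
    if endpos is None:
        endpos = len(text)
    if pos < 0 or endpos < 0:
        warnings.warn(
            "negative pos/endpos are currently treated as 0; "
            "this may change to slice-style indexing",
            FutureWarning,
            stacklevel=2,
        )
    # The search itself is unchanged, so existing results are preserved.
    return pattern.findall(text, pos, endpos)
```

Callers that rely on the clamping would see the warning for one release cycle before any change in results.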
msg347918 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-07-14 14:21
Are there any real world examples which show the benefit of supporting negative indices?
msg347958 - (view) Author: M. Anil Tuncel (anilbey) * Date: 2019-07-15 12:09
I guess the use of negative indices serves the same purpose here as in lists or strings.
Though as Ezio pointed out, the current behaviour already accepts negative indices but gives results that are inconsistent with other Python modules that support negative indices.

In my opinion:
If negative indices are to be accepted here (which is what the current implementation suggests), they should behave the same way as in the rest of Python.
Otherwise, perhaps negative indices should not be allowed here at all.

What do you think?

P.S. this patch is already applied to the regex module by @mrabarnett
msg347997 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2019-07-16 00:36
> Are there any real world examples which show the benefit of supporting 
> negative indices?

A common case is ignoring parentheses at the beginning/end, e.g.
>>> re.compile('[^,]+').findall('(foo,123,(),bar)')
['(foo', '123', '()', 'bar)']
>>> # ignore the surrounding ()
>>> re.compile('[^,]+').findall('(foo,123,(),bar)', 1, 15)
['foo', '123', '()', 'bar']
>>>
>>> # extract attributes from a tag (poc, doesn't handle all cases)
>>> re.compile('[^ ]+').findall('<input type="checkbox" id="foo" checked>', 7, 39)
['type="checkbox"', 'id="foo"', 'checked']

In both cases using -1 as endpos is simpler.
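For comparison, the same result can be had today without hard-coding the end position, by computing it from the string length. This is only a sketch of the current workaround, not part of the proposed patch.

```python
import re

text = '(foo,123,(),bar)'
# len(text) - 1 is the position that a slice-style endpos of -1 would denote.
fields = re.compile('[^,]+').findall(text, 1, len(text) - 1)
print(fields)  # ['foo', '123', '()', 'bar']
```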
msg351530 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2019-09-09 16:45
Ezio requested further opinions, so here's mine.  I don't think the current behavior makes sense; I doubt anyone actually expects a negative index to be squashed to 0, especially for endpos.  I'm not certain that allowing negative indexes is really necessary, but it seems nicer than raising an exception, which would be the other acceptable option to me.
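The exception-raising alternative might look something like this at user level. The name `findall_strict` and the choice of ValueError are assumptions; nothing in this issue specifies which exception would be raised.

```python
import re

def findall_strict(pattern, text, pos=0, endpos=None):
    """Hypothetical wrapper: reject negative positions instead of clamping them."""
    if endpos is None:
        endpos = len(text)
    if pos < 0 or endpos < 0:
        raise ValueError("pos and endpos must be non-negative")
    return pattern.findall(text, pos, endpos)
```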
msg351534 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-09-09 16:55
Note that changing the current behavior is a breaking change. For example, someone might use `pattern.findall(text, curpos-50, curpos+50)` to search in the range ±50 characters from the current position. If negative positions change meaning, this will break code for curpos < 50.
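A small sketch of that scenario, using made-up data: the slice-style reading is simulated with `slice.indices()` rather than with a patched `re`, and the last call shows explicit clamping, which gives the same window under either interpretation.

```python
import re

text = 'a' * 20 + 'X' + 'a' * 79   # 100 characters, 'X' at index 20
curpos = 10
pattern = re.compile('X')

# Current behaviour: curpos - 50 == -40 is treated as 0, so the window is text[0:60].
current = pattern.findall(text, curpos - 50, curpos + 50)

# Slice-style reading of the same arguments: text[-40:60] is text[60:60], which is empty.
start, stop, _ = slice(curpos - 50, curpos + 50).indices(len(text))
sliced = pattern.findall(text, start, stop)

# Explicit clamping does not depend on how negative positions are interpreted.
clamped = pattern.findall(text, max(0, curpos - 50), curpos + 50)

print(current, sliced, clamped)  # ['X'] [] ['X']
```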
History
Date User Action Args
2019-09-09 16:55:47 serhiy.storchaka set messages: + msg351534
2019-09-09 16:45:35 zach.ware set nosy: + zach.ware
messages: + msg351530
2019-07-16 00:36:45 ezio.melotti set messages: + msg347997
versions: + Python 3.9, - Python 3.5
2019-07-15 12:09:03 anilbey set nosy: + anilbey
messages: + msg347958
2019-07-14 14:21:50 serhiy.storchaka set messages: + msg347918
2019-07-14 14:01:58 ezio.melotti set messages: + msg347915
2019-07-14 13:26:18 ezio.melotti set messages: + msg347913
2019-07-13 18:37:32 rhettinger set nosy: + rhettinger, docs@python
messages: + msg347852
assignee: docs@python
components: + Documentation, - Library (Lib), Regular Expressions
2019-07-13 13:38:13 python-dev set pull_requests: + pull_request14540
2014-10-23 20:51:13 serhiy.storchaka set priority: normal -> low
stage: needs patch -> patch review
components: + Regular Expressions
versions: + Python 3.5, - Python 3.4
2014-02-03 17:10:44 BreamoreBoy set nosy: - BreamoreBoy
2013-05-26 23:50:02 mrabarnett set messages: + msg190121
2013-05-26 17:13:02 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg190101
2013-05-26 00:30:10 mrabarnett set files: + issue7940.patch
keywords: + patch
messages: + msg190041
2013-05-25 11:43:05 pitrou set stage: test needed -> needs patch
components: + Library (Lib)
versions: + Python 3.4, - Python 2.7, Python 3.2
2013-05-23 22:44:02 mrabarnett set messages: + msg189885
2013-05-23 22:28:53 BreamoreBoy set nosy: + BreamoreBoy
messages: + msg189882
2010-02-17 01:47:33 mrabarnett set messages: + msg99456
2010-02-17 00:05:34 ezio.melotti set priority: normal
nosy: + timehorse, ezio.melotti, mrabarnett
stage: test needed
versions: + Python 2.7, Python 3.2, - Python 2.5
2010-02-16 14:04:33 tlynn create