msg99411 - (view) |
Author: Tom Lynn (tlynn) |
Date: 2010-02-16 14:04 |
Python 2.5.2 (r252:60911, Jan 20 2010, 23:14:04)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> list(re.compile('x').finditer('xxxx', 1, 3))
[<_sre.SRE_Match object at 0x7f2e820f09f0>, <_sre.SRE_Match object at 0x7f2e820f0d98>]
>>> list(re.compile('x').finditer('xxxx', 1, -1))
[]
>>> re.compile('x').findall('xxxx', 1, -1)
[]
>>> re.compile('x').findall('xxxx', 1, 3)
['x', 'x']
|
msg99456 - (view) |
Author: Matthew Barnett (mrabarnett) *  |
Date: 2010-02-17 01:47 |
I discovered that my regex module (see issue #2636) suffers from the same problem. Oops! It was a simple fix (to be released).
|
msg189882 - (view) |
Author: Mark Lawrence (BreamoreBoy) * |
Date: 2013-05-23 22:28 |
Has this been fixed in the regex module?
|
msg189885 - (view) |
Author: Matthew Barnett (mrabarnett) *  |
Date: 2013-05-23 22:44 |
Yes. As msg99456 suggests, I fixed it in my source code before posting.
Compare re in Python 3.3.2:
>>> re.compile('x').findall('xxxx', 1, 3)
['x', 'x']
>>> re.compile('x').findall('xxxx', 1, -1)
[]
with regex:
>>> regex.compile('x').findall('xxxx', 1, 3)
['x', 'x']
>>> regex.compile('x').findall('xxxx', 1, -1)
['x', 'x']
|
msg190041 - (view) |
Author: Matthew Barnett (mrabarnett) *  |
Date: 2013-05-26 00:30 |
I've attached a patch.
|
msg190101 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-05-26 17:13 |
I'm worried about backward compatibility. See also issue7951.
|
msg190121 - (view) |
Author: Matthew Barnett (mrabarnett) *  |
Date: 2013-05-26 23:50 |
Like the OP, I would've expected it to handle negative indexes the way that strings do.
In practice, I wouldn't normally provide negative indexes; I'd use some string or regex method to determine the search limits, and then pass them to finditer and findall, so they'd be non-negative anyway.
|
msg347852 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2019-07-13 18:37 |
-1 on the proposal. We don't know of any strong use cases, so there isn't a real problem being solved here. Rather than providing a benefit, this feature request makes it more likely that people will write convoluted code or that it will let bugs pass silently that would otherwise be caught.
ISTM the actual issue here is an incorrect user expectation that "all things that have indexing will support negative indexing". While it is common for objects to implement negative index support, it is not universal or required. Even collections.abc.Sequence does not insist on negative index support.
I think this warrants a FAQ entry (which should also mention that slice support is likewise not universal or required; some objects have it, some don't).
Reclassifying this as documentation issue.
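The point about collections.abc.Sequence can be illustrated with a small sketch. The `Countdown` class below is a hypothetical example, not something from the standard library; it rejects negative indices and slices, yet the Sequence mixin methods still work because they only use non-negative indices:

```python
from collections.abc import Sequence

class Countdown(Sequence):
    """Hypothetical sequence that accepts only non-negative integer indices."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("only integer indices are supported")  # no slicing
        if not 0 <= index < self._n:
            raise IndexError(index)  # negative indices are rejected too
        return self._n - index

c = Countdown(3)
# Iteration, reversed() and "in" come from the Sequence mixins.
print(list(c), list(reversed(c)), 2 in c)
```

Here `c[-1]` raises IndexError rather than returning the last item.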
|
msg347913 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2019-07-14 13:26 |
The current behavior is inconsistent because the start position accepts both positive and negative indices, whereas the end position only accepts positive indices.
I think the proposal and the PR written by Anil are reasonable and should be merged.
|
msg347915 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2019-07-14 14:01 |
Sorry, I was wrong. re.findall accepts negative indices for both start and end, but they silently get converted to 0, which is arguably unexpected behavior.
This is an example of the current behavior:
>>> s, e = 1, 4; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
(['b', 'c', 'd'], 'bcd')
>>> s, e = -4, 4; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
(['a', 'b', 'c', 'd'], 'bcd')
>>> s, e = 1, -1; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
([], 'bcd')
>>> s, e = -4, -1; re.compile('.').findall('abcde', s, e), 'abcde'[s:e]
([], 'bcd')
With the patch, all these return ['b', 'c', 'd']. This change might indeed cause issues because it's a change in behavior, but I'm also not sure there are many cases where one would want a negative index to be treated as 0. Maybe we could raise a FutureWarning in the next release and change the behavior afterwards?
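For comparison, slice-style bounds can be approximated on top of the current re module by normalizing them before the call. This is only a sketch: `findall_slice` is a hypothetical helper name, not part of re and not the code in the patch.

```python
import re

def findall_slice(pattern, string, pos=0, endpos=None):
    """Hypothetical helper: give pos/endpos the meaning of string[pos:endpos]."""
    # slice.indices() resolves None and negative values the way slicing does.
    start, stop, _ = slice(pos, endpos).indices(len(string))
    return pattern.findall(string, start, stop)

pat = re.compile('.')
bounds = [(1, 4), (-4, 4), (1, -1), (-4, -1)]
results = [findall_slice(pat, 'abcde', s, e) for s, e in bounds]
print(results)
```

All four calls return ['b', 'c', 'd'], matching 'abcde'[s:e] in each case.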
|
msg347918 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2019-07-14 14:21 |
Are there any real world examples which show the benefit of supporting negative indices?
|
msg347958 - (view) |
Author: M. Anil Tuncel (anilbey) * |
Date: 2019-07-15 12:09 |
I guess the use of negative indices serves the same purpose here as in lists or strings.
Though, as Ezio pointed out, the current behaviour already accepts negative indices but gives results that are inconsistent with other Python modules that support negative indices.
In my opinion:
If the negative indices are to be used here (which is what the current implementation suggests), they should behave in the same way as the rest of the Python modules.
Otherwise, perhaps negative indices should not be allowed here at all.
What do you think?
P.S. this patch is already applied to the regex module by @mrabarnett
|
msg347997 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2019-07-16 00:36 |
> Are there any real world examples which show the benefit of supporting
> negative indices?
A common case is ignoring parentheses at the beginning/end, e.g.
>>> re.compile('[^,]+').findall('(foo,123,(),bar)')
['(foo', '123', '()', 'bar)']
>>> # ignore the surrounding ()
>>> re.compile('[^,]+').findall('(foo,123,(),bar)', 1, 15)
['foo', '123', '()', 'bar']
>>>
>>> # extract attributes from a tag (poc, doesn't handle all cases)
>>> re.compile('[^ ]+').findall('<input type="checkbox" id="foo" checked>', 7, 39)
['type="checkbox"', 'id="foo"', 'checked']
In both cases using -1 as endpos is simpler.
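With the current re behaviour, the same bounds can be written relative to the end of the string by computing them with len(). A minimal sketch using the two strings above (the variable names are only for illustration):

```python
import re

text = '(foo,123,(),bar)'
# A negative endpos is currently treated as 0, so compute the bound explicitly.
fields = re.compile('[^,]+').findall(text, 1, len(text) - 1)
print(fields)

tag = '<input type="checkbox" id="foo" checked>'
attrs = re.compile('[^ ]+').findall(tag, 7, len(tag) - 1)
print(attrs)
```

Here len(text) - 1 is 15 and len(tag) - 1 is 39, the same values as the hard-coded endpos arguments.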
|
msg351530 - (view) |
Author: Zachary Ware (zach.ware) *  |
Date: 2019-09-09 16:45 |
Ezio requested further opinions, so here's mine. I don't think the current behavior makes sense; I doubt anyone actually expects a negative index to be squashed to 0, especially for endpos. I'm not certain that allowing negative indexes is really necessary, but it seems nicer than raising an exception, which would be the other acceptable option to me.
|
msg351534 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2019-09-09 16:55 |
Note that changing the current behavior is a breaking change. For example, someone can use `pattern.findall(text, curpos-50, curpos+50)` to search in the range ±50 characters from the current position. If negative positions change meaning, this will break code for curpos < 50.
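Code of this kind can be made independent of how negative positions are interpreted by clamping the window explicitly. A minimal sketch; `search_window` and its `radius` parameter are hypothetical names used only for illustration:

```python
import re

def search_window(pattern, text, curpos, radius=50):
    """Hypothetical helper: find matches within `radius` characters of curpos."""
    start = max(0, curpos - radius)        # never pass a negative pos to re
    end = min(len(text), curpos + radius)  # clamp the upper bound as well
    return pattern.findall(text, start, end)

text = 'ab ' * 40  # 120 characters, 'ab' at every multiple of 3
pat = re.compile('ab')
near_start = search_window(pat, text, 10)  # window is [0, 60)
print(len(near_start))
```

With curpos = 10 the window starts at 0 instead of -40, so the result does not depend on whether re clamps negative positions or counts them from the end.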
|
Date | User | Action | Args |
2022-04-11 14:56:57 | admin | set | github: 52188 |
2019-09-09 16:55:47 | serhiy.storchaka | set | messages: + msg351534 |
2019-09-09 16:45:35 | zach.ware | set | nosy: + zach.ware; messages: + msg351530 |
2019-07-16 00:36:45 | ezio.melotti | set | messages: + msg347997; versions: + Python 3.9, - Python 3.5 |
2019-07-15 12:09:03 | anilbey | set | nosy: + anilbey; messages: + msg347958 |
2019-07-14 14:21:50 | serhiy.storchaka | set | messages: + msg347918 |
2019-07-14 14:01:58 | ezio.melotti | set | messages: + msg347915 |
2019-07-14 13:26:18 | ezio.melotti | set | messages: + msg347913 |
2019-07-13 18:37:32 | rhettinger | set | nosy: + rhettinger, docs@python; messages: + msg347852; assignee: docs@python; components: + Documentation, - Library (Lib), Regular Expressions |
2019-07-13 13:38:13 | python-dev | set | pull_requests: + pull_request14540 |
2014-10-23 20:51:13 | serhiy.storchaka | set | priority: normal -> low; stage: needs patch -> patch review; components: + Regular Expressions; versions: + Python 3.5, - Python 3.4 |
2014-02-03 17:10:44 | BreamoreBoy | set | nosy: - BreamoreBoy |
2013-05-26 23:50:02 | mrabarnett | set | messages: + msg190121 |
2013-05-26 17:13:02 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg190101 |
2013-05-26 00:30:10 | mrabarnett | set | files: + issue7940.patch; keywords: + patch; messages: + msg190041 |
2013-05-25 11:43:05 | pitrou | set | stage: test needed -> needs patch; components: + Library (Lib); versions: + Python 3.4, - Python 2.7, Python 3.2 |
2013-05-23 22:44:02 | mrabarnett | set | messages: + msg189885 |
2013-05-23 22:28:53 | BreamoreBoy | set | nosy: + BreamoreBoy; messages: + msg189882 |
2010-02-17 01:47:33 | mrabarnett | set | messages: + msg99456 |
2010-02-17 00:05:34 | ezio.melotti | set | priority: normal; nosy: + timehorse, ezio.melotti, mrabarnett; stage: test needed; versions: + Python 2.7, Python 3.2, - Python 2.5 |
2010-02-16 14:04:33 | tlynn | create | |