classification
Title: MatchObject __getitem__() should support slicing and len
Type: enhancement Stage: resolved
Components: Regular Expressions Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: eric.smith, ezio.melotti, mrabarnett, selik, serhiy.storchaka, xiang.zhang
Priority: normal Keywords:

Created on 2017-04-03 03:44 by selik, last changed 2017-04-03 18:45 by serhiy.storchaka. This issue is now closed.

Messages (8)
msg291050 - Author: Michael Selik (selik) * Date: 2017-04-03 03:44
Currently, slicing a MatchObject causes an IndexError and len() a TypeError. It's natural to expect slicing and len to work on objects of a finite length that index by natural numbers.
msg291051 - Author: Michael Selik (selik) * Date: 2017-04-03 03:47
This would also enable negative indexing, which currently raises "IndexError: no such group".

Edit: I meant whole numbers, not natural numbers.
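
For reference, a minimal sketch of the behavior described above, assuming Python 3.6 or later (where integer indexing of match objects is available). The helper error_name is hypothetical and exists only to record which exception is raised. The report says slicing raises IndexError; the sketch does not rely on the exact exception type for slicing, since that may differ between versions.

```python
import re

mo = re.search(r'(\d+)/(\d+)', 'Is 1/3 the same as 2/6?')

def error_name(action):
    """Hypothetical helper: name of the exception raised by action(), or None."""
    try:
        action()
    except Exception as exc:
        return type(exc).__name__
    return None

int_index = mo[1]                           # plain integer indexing works
len_error = error_name(lambda: len(mo))     # match objects define no __len__
neg_error = error_name(lambda: mo[-1])      # negative index: "no such group"
slice_error = error_name(lambda: mo[1:3])   # slicing is rejected as well
```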
msg291055 - Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2017-04-03 07:39
See also #24454.
msg291057 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-03 09:49
This has already been discussed in other issues. Adding support for indexing opened a can of worms.

len() for match objects is ambiguous because of group 0. Implementing len() would make the match object iterable, but in a way incompatible with issue9529 (again because of group 0).
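
A small sketch of the ambiguity, using only existing attributes: the pattern below defines two capture groups, yet three indices (0, 1, 2) are valid on the match object, so there are two plausible answers for a length. The variable names are illustrative only.

```python
import re

mo = re.search(r'(\d+)/(\d+)', 'Is 1/3 the same as 2/6?')

capture_groups = mo.re.groups                  # groups defined by the pattern
groups_tuple = mo.groups()                     # excludes group 0
valid_indices = list(range(capture_groups + 1))  # includes group 0
indexed = [mo[i] for i in valid_indices]       # whole match first, then groups
```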

As for slicing and negative indexes, what is the use case? Do you know that you can get a tuple of groups by passing several arguments to group()? A regular expression usually has a known set of groups, so you can just enumerate the indices (or, better, the names) of the groups you need (they need not be sequential).
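
For illustration, a short sketch of group() called with several arguments, by number and by name. The pattern and the input string are made up for this example.

```python
import re

# Illustrative pattern with named groups; not taken from the report.
pattern = re.compile(
    r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})T(?P<hour>\d{2})'
)
mo = pattern.search('logged at 2017-04-03T18 UTC')

by_number = mo.group(1, 3)           # non-sequential indices are fine
by_name = mo.group('year', 'hour')   # names work the same way
```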
msg291084 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2017-04-03 17:28
Short of a compelling use case, I suggest we reject this enhancement request. len() was deliberately not added in #24454. It's not like any normal code would be iterating over match groups.
msg291087 - Author: Michael Selik (selik) * Date: 2017-04-03 17:58
Yesterday I wanted to do a destructuring bind on a slice of groups in a finditer. Similar situation to the use case of Issue #24454. It might not be "normal code" but I end up in that situation every month or so when parsing semi-structured documents. I found myself wishing for a mapping-destructuring bind, but that's another story.

I haven't read the full discussion of ``len`` on MatchObject yet, but I tentatively agree with Brandon Rhodes' comment in Issue #19536:

"My retort is that concentric groups can happen anyway:
that Group Zero, holding the entire match, is not really
as special as the newcomer might suspect, because you can
always wind up with groups inside of other groups; it is
simply part of the semantics of regular expressions that
groups might overlap or might contain one another ..."

@Serhiy, I was unaware of the feature of passing several arguments to group(). Unfortunately, the regex pattern I was using had a very large set of groups. A slice would have been particularly elegant. Passing several arguments to mo.group() will be helpful, but still more awkward than a slice.

Perhaps it is a can of worms, but I was pleased to see indexing available and was disappointed not to find the typically supported corresponding features.
msg291088 - Author: Michael Selik (selik) * Date: 2017-04-03 18:09
Sorry, it looks like I got the issue number wrong. My comparison should not have been with #24454, but instead with an issue I can't locate at the moment. Reproducing the example:

    for g0, g1, g2 in re.finditer(r'(\d+)/(\d+)', 'Is 1/3 the same as 2/6?'):
        ratio = Fraction(int(g1), int(g2))

Better:

    for mo in re.finditer(r'(\d+)/(\d+)', 'Is 1/3 the same as 2/6?'):
        ratio = Fraction(*map(int, mo[1:3]))

The map in the last one isn't very pretty, but I hope it illustrates the gist of what I'd like to do for a much larger pattern with many capture groups.
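
Neither snippet above runs today, since both rely on the proposed behavior. As a rough sketch of what the slice could mean, here is a hypothetical helper, group_slice, that emulates it with the existing API. It assumes group 0 counts as index 0, which is one possible choice and not something settled in this discussion.

```python
import re
from fractions import Fraction

def group_slice(mo, start=None, stop=None, step=None):
    """Hypothetical helper: slice over groups 0..N, treating group 0 as index 0."""
    indices = range(*slice(start, stop, step).indices(mo.re.groups + 1))
    return tuple(mo.group(i) for i in indices)

text = 'Is 1/3 the same as 2/6?'
ratios = [Fraction(*map(int, group_slice(mo, 1, 3)))
          for mo in re.finditer(r'(\d+)/(\d+)', text)]
```

Because the helper goes through slice.indices(), negative bounds such as group_slice(mo, -2) resolve the way they would for a sequence of length N + 1.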
msg291090 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-03 18:45
You can use mo.group(1, 2). If you need to slice arbitrary groups, you can slice the result of the groups() method (which is just a tuple).
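
Applied to the example from the previous message, that could look like the sketch below. Note that groups() omits group 0, so its slice indices are shifted down by one relative to group numbers.

```python
import re
from fractions import Fraction

text = 'Is 1/3 the same as 2/6?'
pattern = r'(\d+)/(\d+)'

# Select groups 1 and 2 explicitly.
via_group = [Fraction(*map(int, mo.group(1, 2)))
             for mo in re.finditer(pattern, text)]

# Slice the tuple from groups(); index 0 here is group 1.
via_groups = [Fraction(*map(int, mo.groups()[0:2]))
              for mo in re.finditer(pattern, text)]
```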

The re module is already complex, so let's use the existing API rather than add a new one.
History
Date                 User              Action  Args
2017-04-03 18:45:02  serhiy.storchaka  set     status: open -> closed; resolution: rejected; messages: + msg291090; stage: resolved
2017-04-03 18:09:54  selik             set     messages: + msg291088
2017-04-03 17:58:40  selik             set     messages: + msg291087
2017-04-03 17:28:57  eric.smith        set     nosy: + eric.smith; messages: + msg291084
2017-04-03 09:49:45  serhiy.storchaka  set     messages: + msg291057
2017-04-03 07:39:55  xiang.zhang       set     nosy: + xiang.zhang, serhiy.storchaka; messages: + msg291055
2017-04-03 03:47:02  selik             set     messages: + msg291051; versions: + Python 3.7
2017-04-03 03:44:36  selik             set     nosy: + ezio.melotti, mrabarnett; type: enhancement; components: + Regular Expressions
2017-04-03 03:44:12  selik             create