This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author brandon-rhodes
Recipients brandon-rhodes, ezio.melotti, mrabarnett
Date 2013-11-09.15:15:26
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1384010127.61.0.754728433968.issue19536@psf.upfronthosting.co.za>
In-reply-to
Content
Regular expression re.MatchObject objects are sequences.
They contain at least one “group” string, possibly more,
which are integer-indexed starting at zero.
Today, groups can be accessed in one of two ways.

(1) You can call the method match.group(N).

(2) You can call glist = match.groups()
    and then access each group as glist[N-1].
    Note the obvious off-by-one error:
    .groups() does not include “group zero”,
    which contains the entire match,
    and therefore its indexes are off-by-one
    from the values you would pass to .group().

I propose that MatchObject gain a __getitem__(N) method
whose return value for every N is the same as .group(N)
as I think that match[N] is a quite obvious syntax for
asking for one particular group of an RE match.

The only objection I can see to this proposal
is the obvious asymmetry between Group Zero and all
subsequent groups of a regular expression pattern:
zero means “the whole thing” whereas each of the others
holds the content of a particular explicit set of parens.
Looping over the elements match[0], match[1], ... of a
pattern like this:

    r'(\d\d\d\d)/(\d\d)/(\d\d)'

will give you *first* the *entire* match, and only then
turn its attention to the three parenthesized substrings.

My retort is that concentric groups can happen anyway:
that Group Zero, holding the entire match, is not really
as special as the newcomer might suspect, because you can
always wind up with groups inside of other groups; it is
simply part of the semantics of regular expressions that
groups might overlap or might contain one another, as in:

    r'((\d\d)/(\d\d)) Description: (.*)'

Here, we see that concentricity is not a special property
of Group Zero, but in fact something that can happen quite
naturally with other groups.

The caller simply needs to imagine every regular expression
being surrounded by an “automatic set of parentheses” to
understand where Group Zero comes from, and how it will be
ordered in the resulting sequence of groups relative to
the subordinate groups within the string.

If one or two people voice agreement here in this issue,
I will be very happy to offer a patch.
History
Date User Action Args
2013-11-09 15:15:27brandon-rhodessetrecipients: + brandon-rhodes, ezio.melotti, mrabarnett
2013-11-09 15:15:27brandon-rhodessetmessageid: <1384010127.61.0.754728433968.issue19536@psf.upfronthosting.co.za>
2013-11-09 15:15:27brandon-rhodeslinkissue19536 messages
2013-11-09 15:15:26brandon-rhodescreate