Author ezio.melotti
Date 2009-10-14.21:36:24
I'm skeptical about what you are proposing for the following reasons:
1) it doesn't exist in any other implementation that I know of;
2) if implemented as default behavior:
   * it won't be backward-compatible;
   * it will increase the complexity;
3) it will be a proprietary extension and it will reduce the
compatibility with other implementations;
4) I can't think of any real-world situation where this would be really
useful.

Using a flag like re.R to change the behavior might solve issue 2),
but I'll explain why I don't think this is useful.

Let's take a simpler IPv4 address as an example: you may want to use
'^(\d{1,3})(?:\.(\d{1,3})){3}$' to capture the digits (without checking
if they are in range(256)).
This currently only returns:
>>> re.match('^(\d{1,3})(?:\.(\d{1,3})){3}$', '192.168.0.1').groups()
('192', '1')

If I understood correctly what you are proposing, you would like it to
return (['192'], ['168', '0', '1']) instead. This will also require an
additional step to join the two lists to get the list with the 4 values.
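
To make the comparison concrete, here is a minimal sketch of the current
behavior next to an emulation of the proposed result. The emulation is
only an assumption about what the proposal would return, built with
existing re functions; the names PATTERN, ADDR, current, proposed and
octets are hypothetical and chosen for illustration.

```python
import re

# Pattern quoted in the comment: one captured octet, then three
# repetitions of a dot followed by a second captured octet.
PATTERN = r'^(\d{1,3})(?:\.(\d{1,3})){3}$'
ADDR = '192.168.0.1'  # sample address, assumed for illustration

match = re.match(PATTERN, ADDR)

# Current behavior: group 2 matches '168', '0' and '1' in turn,
# but only the last repetition is kept.
current = match.groups()

# Hypothetical emulation of the proposed result using existing tools:
# every repetition of each group collected into a list.
first = [match.group(1)]
rest = re.findall(r'\.(\d{1,3})', ADDR)
proposed = (first, rest)

# The additional step: joining the two lists to get all four values.
octets = proposed[0] + proposed[1]

print(current)
print(proposed)
print(octets)
```

Even with per-group lists available, the caller would still have to
concatenate them to get the flat list of four values.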

In these situations where some part is repeating, it's usually easier to
use re.findall() or re.split() (or just a plain str.split for simple
cases like this):
>>> addr = '192.168.0.1'
>>> re.findall('(?:^|\.)(\d{1,3})', addr)
['192', '168', '0', '1']
>>> re.split('\.', addr) # no need to use re.split here
['192', '168', '0', '1']

In both examples a single step is enough to get what you want
without changing the existing behavior.

'^(\d{1,3})(?:\.(\d{1,3})){3}$' can still be used to check if the string
has the right "format", before using the other methods to extract the data.
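
A rough sketch of that two-step approach, validating the format first
and extracting afterwards, could look like the following. The helper
name split_octets and the constant FORMAT are hypothetical, and like the
pattern itself the sketch does not check that each value is in
range(256).

```python
import re

# Same pattern as above, used only to check the overall format.
FORMAT = re.compile(r'^(\d{1,3})(?:\.(\d{1,3})){3}$')

def split_octets(addr):
    """Return the four octet strings of addr, or None if the format is wrong."""
    match = FORMAT.match(addr)
    if match is None:
        return None
    # The matched text is known to be four dot-separated digit runs,
    # so a plain str.split is enough to extract them.
    return match.group(0).split('.')

print(split_octets('192.168.0.1'))
print(split_octets('192.168.0'))
```

The regular expression only answers "is this well formed?", while the
extraction itself stays a one-line split.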

So I'm -1 about the whole idea and -0.8 about an additional flag.
Maybe you should discuss this on the python-ideas mailing list.
