classification
Title: Converge re.findall and re.finditer
Type: behavior Stage: needs patch
Components: Regular Expressions Versions: Python 3.2
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: MizardX, ezio.melotti, moreati, mrabarnett, timehorse
Priority: normal Keywords:

Created on 2010-08-06 05:41 by MizardX, last changed 2010-08-07 18:45 by mrabarnett.

Messages (4)
msg113074 - Author: MizardX (MizardX) Date: 2010-08-06 05:41
re.findall and re.finditer have very different return types. re.finditer iterates over match objects, while re.findall returns a list of strings or tuples.
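For illustration, a minimal sketch of the difference being described. The sample string and the names `text`, `pattern`, `pairs` and `matches` are assumptions made for this example, not part of the report.

```python
import re

# Sample input, assumed for this example only.
text = "a:1 b:2"
pattern = r'(\w+):(\w+)'

# findall: an eager list; with two groups, each item is a tuple of strings.
pairs = re.findall(pattern, text)

# finditer: a lazy iterator of match objects (turned into a list here
# only so the items can be inspected afterwards).
matches = list(re.finditer(pattern, text))

print(pairs)                           # [('a', '1'), ('b', '2')]
print([m.groups() for m in matches])   # [('a', '1'), ('b', '2')]
```

The captured text is the same either way; only the container and the item type differ.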

I can think of two ways to make them more similar:

1) Make match objects iterable over their captures. With this, you could write something like the following:

for key,value in re.finditer(r'(\w+):(\w+)', text):
  data[key] = value

2) Make re.findall return an iterator over tuples. This would decrease the memory footprint.
msg113121 - Author: Matthew Barnett (mrabarnett) Date: 2010-08-06 17:46
(1) would break existing code. It would also mean that you wouldn't have access to the start and end positions of the matches.
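As a rough illustration of the position information mentioned here, the sketch below reads the start and end offsets from each match object. The sample string and the name `spans` are assumptions for this example.

```python
import re

# Sample input, assumed for this example only.
text = "a:1 b:2"

# Each match object knows where it matched; a plain tuple of groups would not.
spans = [m.span() for m in re.finditer(r'(\w+):(\w+)', text)]

for m in re.finditer(r'(\w+):(\w+)', text):
    print(m.group(1), m.group(2), m.start(), m.end())

print(spans)   # [(0, 3), (4, 7)]
```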

(2) would also break existing code that expects a list. It's like the change where some methods that returned a list in Python 2 return a generator in Python 3. I think it's too late now because we're already at Python 3.1. If you want to reduce the memory footprint, you can still do:

items = (m.groups() for m in re.finditer(r'(\w+):(\w+)', text))
for key,value in items:
    data[key] = value
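A self-contained version of the same idea, as a sketch only. The sample string and the names `first` and `rest` are assumptions. It suggests that the generator expression produces the same tuples as re.findall, but one at a time.

```python
import re

# Sample input, assumed for this example only.
text = "a:1 b:2 c:3"
pattern = r'(\w+):(\w+)'

# Lazy: nothing is collected into a list up front.
items = (m.groups() for m in re.finditer(pattern, text))

first = next(items)   # pulls only the first match
rest = list(items)    # pulls the remaining matches

# Taken together, the lazily produced tuples equal findall's list.
assert [first] + rest == re.findall(pattern, text)
```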
msg113170 - Author: MizardX (MizardX) Date: 2010-08-07 12:28
I don't think (1) would break any code. finditer() would still generate match-objects.

The only time you would discard the match-object is if you try to do a destructuring bind, e.g. in a loop. This shouldn't be unexpected for the programmer.
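For context, a small sketch of what a destructuring bind on a match object does today. To my knowledge CPython raises TypeError because match objects are not iterable. The sample string and the flag `unpacked` are assumptions for this example.

```python
import re

# Sample input, assumed for this example only.
m = re.search(r'(\w+):(\w+)', "a:1")

try:
    key, value = m        # destructuring the match object itself
    unpacked = True
except TypeError:
    unpacked = False      # match objects are not iterable

# The explicit spelling that works today.
key, value = m.groups()
print(unpacked, key, value)
```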
msg113189 - Author: Matthew Barnett (mrabarnett) Date: 2010-08-07 18:45
Ah, I see what you mean. I still think you're wrong, though! :-)

What the 'for' loop is doing is basically this:

    it = re.finditer(r'(\w+):(\w+)', text)
    try:
        while True:
            match_object = next(it)
            # body of loop
    except StopIteration:
        pass

re.finditer() returns a generator which yields match objects.

What I think you're actually requesting (but not realising) is for the 'for' loop not just to iterate over the generator, but also over what the generator yields.

If you want re.finditer() to yield the groups then it has to return a generator which yields those groups, not match objects.
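One way to picture such a generator is the sketch below. `finditer_groups` is a hypothetical helper written for this example, not an existing function in the re module, and the sample string is likewise assumed.

```python
import re

def finditer_groups(pattern, string, flags=0):
    """Hypothetical helper: like re.finditer(), but yields m.groups()."""
    for m in re.finditer(pattern, string, flags):
        yield m.groups()

# Sample input, assumed for this example only.
text = "a:1 b:2"
data = {}

# The loop target now unpacks a tuple of groups, not a match object.
for key, value in finditer_groups(r'(\w+):(\w+)', text):
    data[key] = value

print(data)   # {'a': '1', 'b': '2'}
```

Because the wrapper yields plain tuples, the match objects and their positions are no longer reachable from the loop body.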
History
Date User Action Args
2010-08-07 18:45:07 mrabarnett set messages: + msg113189
2010-08-07 12:28:55 MizardX set messages: + msg113170
2010-08-06 17:46:38 mrabarnett set messages: + msg113121
2010-08-06 05:52:35 ezio.melotti set nosy: + timehorse, ezio.melotti, mrabarnett, moreati; stage: needs patch; type: behavior; versions: + Python 3.2
2010-08-06 05:41:03 MizardX create