Title: re.findall() documentation lacks information about finding THE LAST iteration of repeated capturing group (greedy)
Components: Documentation Versions: Python 3.4
Status: closed Resolution: not a bug
Messages (4)
msg226534 - Author: Mateusz Dobrowolny (Mateusz.Dobrowolny) Date: 2014-09-07 12:35
Python 3.4.1, Windows.
help(re.findall) shows me:
findall(pattern, string, flags=0)
    Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result.
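
As a rough illustration of the three return shapes that docstring describes, here is a minimal sketch. The sample string and variable names are made up for this example and are not part of the original report.

```python
import re

sample = "alpha=1, beta=2"  # hypothetical sample string

# No capturing group: each item is the whole match.
no_group = re.findall(r"\w+=\d", sample)

# One capturing group: each item is that group's text only.
one_group = re.findall(r"(\w+)=\d", sample)

# Two capturing groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"(\w+)=(\d)", sample)

print(no_group)    # ['alpha=1', 'beta=2']
print(one_group)   # ['alpha', 'beta']
print(two_groups)  # [('alpha', '1'), ('beta', '2')]
```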

It seems that information about repeated (greedy) groups, i.e. (regular_expression)*, is missing.
Please take a look at my example:

import re

text = 'To configure your editing environment, use the Editor settings page and its child pages. There is also a ' \
       'Quick Switch Scheme command that lets you change color schemes, themes, keymaps, etc. with a couple of ' \
       'keystrokes.'
print('Text to be searched: \n' + text)
print('\nSearching method: re.findall()')

regexp_result = re.findall(r'\w+(\s+\w+)', text)
print('\nRegexp rule: r\'\\w+(\\s+\\w+)\' \nFound: ' + str(regexp_result))
print('This works as expected: findall() returns a list of groups (\\s+\\w+), and the groups are from non-overlapping matches.')

regexp_result = re.findall(r'\w+(\s+\w+)*', text)
print('\nHow about making the group greedy? Here we go: \nRegexp rule: r\'\\w+(\\s+\\w+)*\' \nFound: ' + str(regexp_result))
print('This is a little bit unexpected for me: findall() returns THE LAST MATCHING group only, parsing from left to right.')

regexp_result_list = re.findall(r'(\w+(\s+\w+)*)', text)
first_group = [i for i, j in regexp_result_list]
print('\nThe solution is to put an extra group around the whole RE: \nRegexp rule: r\'(\\w+(\\s+\\w+)*)\' \nFound: ' + str(first_group))
print('So finally I can get all strings I am looking for, just like expected from the FINDALL method, by accessing the first element of each tuple.')
----------END OF EXAMPLE-------------

I found the solution when practicing on this page:
TEST STRING: To configure your editing environment, use the Editor settings page and its child pages. There is also a Quick Switch Scheme command that lets you change color schemes, themes, keymaps, etc. with a couple of keystrokes.

It showed me this on the right side, with nice color-coding:
1st Capturing group (\s+\w+)*
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
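
The two options in that note can be compared side by side. This is only a sketch under assumptions: the sample string and variable names are invented for illustration, and the patterns are the ones from my example above with the group changed.

```python
import re

sample = "one two three, four five"  # hypothetical sample string

# Repeated capturing group: findall reports only the last iteration.
last_only = re.findall(r"\w+(\s+\w+)*", sample)

# Non-capturing group (?:...): the pattern has no groups,
# so findall reports the whole matches.
whole = re.findall(r"\w+(?:\s+\w+)*", sample)

# Extra capturing group around the repeated one:
# each tuple holds (whole match, last iteration).
both = re.findall(r"(\w+(\s+\w+)*)", sample)

print(last_only)  # [' three', ' five']
print(whole)      # ['one two three', 'four five']
print(both)       # [('one two three', ' three'), ('four five', ' five')]
```

If the inner data is not needed, the non-capturing form avoids having to unpack tuples afterwards.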

I think some information regarding repeated groups should be included as well in Python documentation.

BTW: I have one extra question.
Searching for 'findall' in this tracker I found this issue:

It looks like the information about ordering is no longer in the 3.4.1 documentation. Shouldn't it be there?

Kind Regards
msg226543 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-09-07 20:30
Do you have a specific sentence or paragraph in mind that could be added?

Be aware help() just shows what's in the docstring, which is typically abbreviated.  The full docs are on  Can you find what you need there?
msg226567 - Author: Mateusz Dobrowolny (Mateusz.Dobrowolny) Date: 2014-09-08 10:15
The official help
in fact contains more information, especially the one mentioned in

Regarding my issue: I am afraid it was my misunderstanding. It looks like regular expressions always return the LAST match of a repeated group, and Python's re.findall returns what it says it does: the list of groups.
And since I repeat a capturing group, I get only the last match.
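
The same behaviour can be seen on a single match object, which may make it clearer that this comes from how groups work rather than from findall itself. A minimal sketch, with a made-up sample string and variable names; re.finditer is used here as one possible way to get the whole matches without changing the pattern.

```python
import re

sample = "one two three, four five"  # hypothetical sample string
pattern = re.compile(r"\w+(\s+\w+)*")

first = pattern.match(sample)
whole_match = first.group(0)     # entire text matched by the pattern
last_iteration = first.group(1)  # text from the final repetition of the group

# finditer yields match objects, so group(0) gives the whole match
# even though the pattern contains a capturing group.
whole_matches = [m.group(0) for m in pattern.finditer(sample)]

print(whole_match)           # one two three
print(repr(last_iteration))  # ' three'
print(whole_matches)         # ['one two three', 'four five']
```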

More on this, for example, here:

I was learning regexp yesterday, and at first I reported this without knowing everything about capturing groups.

If returning the last match for a repeated capturing group is defined within RegEx itself, then there is no need to mention it in the Python documentation...
msg226592 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2014-09-08 17:01
Then let's close this issue.
