This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Clarify the documentation of re.findall()
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, miss-islington, rondevous, serhiy.storchaka, veky
Priority: normal Keywords: patch

Created on 2021-08-17 22:24 by rondevous, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 27849 merged serhiy.storchaka, 2021-08-20 08:22
PR 27879 merged miss-islington, 2021-08-22 07:24
PR 27880 merged miss-islington, 2021-08-22 07:24
Messages (13)
msg399799 - (view) Author: Rondevous (rondevous) Date: 2021-08-17 22:24
Can it please be hinted in the docs of re.findall to use (?:...) for non-capturing groups?

>>> import re
>>> re.findall('(foo)?bar|cool', 'cool')
['']
>>> # I expected the result: ['cool']

After hours of frustration, I learnt that I should use a non-capturing group (?:foo) in the pattern. This was not obvious.


P.S. Making the groups non-capturing in such a pattern is not needed in javascript (as tested on regexr.com); could this be an issue with the | operator in re.findall?
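
The behaviour reported above can be reproduced with a minimal sketch. The variable names are illustrative only; the two patterns are the one from the report and its non-capturing variant.

```python
import re

text = 'cool'

# One capturing group: findall() reports what group 1 captured, not the
# whole match. Group 1 took no part in matching 'cool', so the entry is ''.
with_capturing = re.findall(r'(foo)?bar|cool', text)

# Non-capturing group: there is no group to report, so findall() returns
# the text matched by the whole pattern.
with_non_capturing = re.findall(r'(?:foo)?bar|cool', text)

print(with_capturing)      # ['']
print(with_non_capturing)  # ['cool']
```

The `|` operator works the same way in both calls; only the shape of the returned list differs.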
msg399907 - (view) Author: Vedran Čačić (veky) * Date: 2021-08-19 11:22
It currently says:

...matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups...

I'm not quite sure how it could be clearer. Maybe "Alternatively" at the start of the second sentence?

regexr does the same thing, as far as I can see. Match is 'cool', group 1 is empty. Matches are not the same as groups.
msg399908 - (view) Author: Vedran Čačić (veky) * Date: 2021-08-19 11:28
Ah, now I see. When some_match.group(0) is called, the whole match is returned. So match can be considered kinda group (quasigroup?:). I see how it can be confusing: python usually starts indexing at 0, and someone might think that a .group(0) would be included in "a list of groups" returned.

I'm not sure how best to fix it. Maybe: Alternatively, if grouping parentheses are present in the pattern, return a list of groups captured by them...
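
The distinction between the whole match and the captured groups can be seen on a single match object. This is a small sketch using the pattern from the original report; the variable names are illustrative.

```python
import re

pattern = r'(foo)?bar|cool'

m = re.search(pattern, 'cool')
whole = m.group(0)   # the entire match, sometimes thought of as "group 0"
groups = m.groups()  # only the parenthesised groups, starting at group 1

# findall() reports the parenthesised groups, never group(0), and it
# substitutes '' for a group that did not take part in the match.
found = re.findall(pattern, 'cool')

print(whole)   # cool
print(groups)  # (None,)
print(found)   # ['']
```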
msg399976 - (view) Author: Rondevous (rondevous) Date: 2021-08-20 16:01
To clarify in short: the pattern I mentioned doesn't give the result I expected in re.findall() unlike re.search()

Given pattern:  (foo)?bar|cool

Maybe my approach in testing the regex first using re.search() and then using re.findall() to return all matches was wrong.

Initially, after going through help(re) I had associated re.findall with the 'global' flag used in javascript regex which would return all the matches. Without the global flag (in javascript) only the first match is returned, like re.search() in python.
msg399977 - (view) Author: Rondevous (rondevous) Date: 2021-08-20 16:02
From my understanding, "|" should match either the RegEx on the left or the RegEx on the right of the pipe
>>> help(re)
        "|"      A|B, creates an RE that will match either A or B.

With re.search(), the pattern below matches 'cool' as well as 'foo'
>>> re.search('(foo)|cool?', 'foo bar cool foobar coolbar')
<re.Match object; span=(0, 3), match='foo'>
>>> re.search('(foo)|cool?', 'cool')
<re.Match object; span=(0, 4), match='cool'>

But, the same pattern and strings won't match 'cool' if used with re.findall() or re.finditer() because of how they work when capture-groups are present in the pattern.
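
A short sketch of the difference for re.findall(), using the pattern and strings from this message (variable names are illustrative):

```python
import re

pattern = r'(foo)|cool?'
text = 'foo bar cool foobar coolbar'

# search() returns a match object, so the whole match is available.
single = re.search(pattern, 'cool').group(0)

# findall() with one capturing group returns that group for every match.
# Matches that came from the 'cool?' branch contribute an empty string.
single_found = re.findall(pattern, 'cool')
all_found = re.findall(pattern, text)

print(single)        # cool
print(single_found)  # ['']
print(all_found)     # ['foo', '', 'foo', '']
```

Four matches are still found in the longer string; the two that matched 'cool' simply show up as '' because group 1 was not involved.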
msg399978 - (view) Author: Rondevous (rondevous) Date: 2021-08-20 16:02
To produce the same results that you'd get by using the global flag in javascript regex, and to make re.findall return the whole matches instead of only the captured groups, all the groups in the pattern need to be of the non-capturing (?:) type.

If the distinction about capturing and non-capturing groups is mentioned in the docs of re.findall, it would help those who have learnt regex from another language (like javascript), where the global flag in regex is allowed.

I want the docs of re.findall and re.finditer to somehow suggest the use of (?:group) to return the original matches and not the captured groups.
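
As a sketch of how the group type changes what re.findall() returns, the example below reuses the earlier test string. The two-group pattern is an assumption added for illustration and does not appear elsewhere in this thread.

```python
import re

text = 'foo bar cool foobar coolbar'

# No capturing groups: a list of whole matches.
non_capturing = re.findall(r'(?:foo)|cool?', text)

# Several capturing groups: a list of tuples, one entry per group,
# with '' for any group that did not take part in a given match.
two_groups = re.findall(r'(foo)|(cool?)', text)

print(non_capturing)  # ['foo', 'cool', 'foo', 'cool']
print(two_groups)     # [('foo', ''), ('', 'cool'), ('foo', ''), ('', 'cool')]
```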
msg399979 - (view) Author: Rondevous (rondevous) Date: 2021-08-20 16:03
Maybe the functionality of re.findall and re.finditer is limited because, e.g. I can't do something like this:
https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-in-regular-expressions#3513858

The workaround for doing that might need me to eventually write a parser O_O
msg399980 - (view) Author: Vedran Čačić (veky) * Date: 2021-08-20 16:59
Have you seen the patch? In the patched docs, non-capturing grouping is explicitly mentioned. (Though I myself wouldn't include even that, as it's superfluous with what's said before, obviously it's needed.:)
msg399981 - (view) Author: Vedran Čačić (veky) * Date: 2021-08-20 17:01
Also, maybe you should read the following sentence (also in the docs):

> If one wants more information about all matches of a pattern than the matched text, finditer() is useful as it provides match objects instead of strings.

It seems that's what you wanted in the first place.
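
A minimal sketch of that suggestion, assuming the same pattern and test string used earlier in the thread: re.finditer() yields match objects, so the whole match, its position, and the captured group are all available even when the pattern contains capturing groups.

```python
import re

pattern = r'(foo)|cool?'
text = 'foo bar cool foobar coolbar'

matches = list(re.finditer(pattern, text))

whole_matches = [m.group(0) for m in matches]  # text of each whole match
spans = [m.span() for m in matches]            # (start, end) of each match
first_groups = [m.group(1) for m in matches]   # None where group 1 was unused

print(whole_matches)  # ['foo', 'cool', 'foo', 'cool']
print(spans)          # [(0, 3), (8, 12), (13, 16), (20, 24)]
print(first_groups)   # ['foo', None, 'foo', None]
```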
msg400050 - (view) Author: Rondevous (rondevous) Date: 2021-08-22 04:42
Oops, I was wrong about re.finditer :D
Sorry, I think I didn't check that properly.

Just saw the changes. The patch looks good :)

Thanks a lot!
msg400052 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-08-22 07:24
New changeset 64f9e7b19dc1603fcbd07c17c9860085b9d21465 by Serhiy Storchaka in branch 'main':
bpo-44940: Clarify the documentation of re.findall() (GH-27849)
https://github.com/python/cpython/commit/64f9e7b19dc1603fcbd07c17c9860085b9d21465
msg400054 - (view) Author: miss-islington (miss-islington) Date: 2021-08-22 07:45
New changeset 519bcc698c436e12bd6c1ff6f2517060719c60d5 by Miss Islington (bot) in branch '3.10':
bpo-44940: Clarify the documentation of re.findall() (GH-27849)
https://github.com/python/cpython/commit/519bcc698c436e12bd6c1ff6f2517060719c60d5
msg400085 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-08-22 18:15
New changeset d006392245c904547e5727144235c2f9d7948e96 by Miss Islington (bot) in branch '3.9':
bpo-44940: Clarify the documentation of re.findall() (GH-27849) (GH-27880)
https://github.com/python/cpython/commit/d006392245c904547e5727144235c2f9d7948e96
History
Date User Action Args
2022-04-11 14:59:48 admin set github: 89103
2021-08-22 18:16:13 serhiy.storchaka set status: open -> closed
    stage: patch review -> resolved
    resolution: fixed
    versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.8
2021-08-22 18:15:41 serhiy.storchaka set messages: + msg400085
2021-08-22 07:45:13 miss-islington set messages: + msg400054
2021-08-22 07:24:46 miss-islington set pull_requests: + pull_request26335
2021-08-22 07:24:42 miss-islington set nosy: + miss-islington
    pull_requests: + pull_request26334
2021-08-22 07:24:30 serhiy.storchaka set messages: + msg400052
2021-08-22 04:42:50 rondevous set messages: + msg400050
    title: Suggest the use of non-capturing groups in re.findall() and re.finditer() docs -> Clarify the documentation of re.findall()
2021-08-20 17:01:46 veky set messages: + msg399981
2021-08-20 16:59:21 veky set messages: + msg399980
2021-08-20 16:03:57 rondevous set messages: + msg399979
2021-08-20 16:02:49 rondevous set messages: + msg399978
2021-08-20 16:02:18 rondevous set messages: + msg399977
    title: Hint the use of non-capturing group in re.findall() documentation -> Suggest the use of non-capturing groups in re.findall() and re.finditer() docs
2021-08-20 16:01:16 rondevous set messages: + msg399976
2021-08-20 08:22:46 serhiy.storchaka set keywords: + patch
    nosy: + serhiy.storchaka
    pull_requests: + pull_request26309
    stage: patch review
2021-08-19 11:28:18 veky set messages: + msg399908
2021-08-19 11:22:07 veky set nosy: + veky
    messages: + msg399907
2021-08-17 22:24:06 rondevous create