
classification
Title: re.findall have different match result against re.search or re.sub
Type: behavior Stage: resolved
Components: Regular Expressions Versions: Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: HaujetZhao, ezio.melotti, mrabarnett, rhettinger, rondevous, serhiy.storchaka
Priority: normal Keywords:

Created on 2020-11-23 19:30 by HaujetZhao, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (5)
msg381689 - (view) Author: 赵豪杰 (HaujetZhao) * Date: 2020-11-23 19:30
```
>>> import re
>>> text = '121212 and 121212'
>>> pattern = '(12)+'
>>> print(re.findall(pattern, text))
['12', '12']
>>> 
>>> 
>>> print(re.search(pattern, text))
<re.Match object; span=(0, 6), match='121212'>
>>> 
>>> 
>>> print(re.sub(pattern, '', text))
 and 
# re.findall returns a different result from re.search and re.sub
```

Given the same pattern and string, re.findall is supposed to produce the same match as re.search, but it does not.

This result is from Python 3.8.5.
msg381690 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-11-23 19:42
It looks correct to me. Of course, the result is different, because they are different functions. re.match() and re.search() return a match object (or None), and re.findall returns a list. What result did you expect?
msg381692 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-11-23 19:46
When groups are present in the regex, findall() returns the subgroups rather than the entire match:

    >>> mo = re.search('(12)+', '121212 and 121212')
    >>> mo[0]                  # Entire match
    '121212'
    >>> mo[1]                  # Group match
    '12'

To get the result you were looking for, use a non-capturing group:

    >>> re.findall('(?:12)+', '121212 and 121212')
    ['121212', '121212']

Also consider using finditer(), which gives more fine-grained control:

    >>> for mo in re.finditer('(12)+', '121212 and 121212'):
    ...     print(mo.span())
    ...     print(mo[0])
    ...     print(mo[1])
    ...     print()
    ...
    (0, 6)
    121212
    12

    (11, 17)
    121212
    12
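
If only the whole matches are wanted and the pattern has to keep its capturing group, one possible approach is to collect group 0 from finditer(). This is a minimal sketch; the names `whole_matches` and `spans` are made up for illustration.

```python
import re

pattern = re.compile(r'(12)+')
text = '121212 and 121212'

# group(0) is always the entire match, regardless of capturing groups.
whole_matches = [mo.group(0) for mo in pattern.finditer(text)]

# Each match object also carries its position in the string.
spans = [mo.span() for mo in pattern.finditer(text)]

print(whole_matches)  # ['121212', '121212']
print(spans)          # [(0, 6), (11, 17)]
```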
msg381700 - (view) Author: 赵豪杰 (HaujetZhao) * Date: 2020-11-23 23:59
Ah, got it. I misunderstood the usage: findall returns the groups captured by the expression rather than the whole match. Thanks @serhiy.storchaka @rhettinger
msg399793 - (view) Author: Rondevous (rondevous) Date: 2021-08-17 20:59
I was frustrated for hours when I couldn't figure out why this won't match:

>>> re.findall(r'(foo)?bar|cool', 'cool')

Now I know, I have to make this change: (?:foo)
But this isn't obvious.
Should it be mentioned in the docs of re.findall() to use (?:...) for non-capturing groups?
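
For illustration, here is what the two variants return. The pattern does match 'cool' in both cases, but with a capturing group findall() reports that group, and since the group took no part in the match the item is an empty string. The variable names are only for this sketch.

```python
import re

# The optional group (foo) does not participate when 'cool' matches,
# so findall() reports an empty string for it.
capturing = re.findall(r'(foo)?bar|cool', 'cool')

# With a non-capturing group there are no groups to report,
# so findall() returns the entire match.
non_capturing = re.findall(r'(?:foo)?bar|cool', 'cool')

print(capturing)      # ['']
print(non_capturing)  # ['cool']
```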
History
Date                 User              Action  Args
2022-04-11 14:59:38  admin             set     github: 86614
2021-08-17 20:59:28  rondevous         set     nosy: + rondevous; messages: + msg399793
2020-11-23 23:59:54  HaujetZhao        set     status: open -> closed; resolution: not a bug; messages: + msg381700; stage: resolved
2020-11-23 19:46:36  rhettinger        set     nosy: + rhettinger; messages: + msg381692
2020-11-23 19:42:07  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg381690
2020-11-23 19:30:57  HaujetZhao        create