
classification
Title: re.Scanner doesn't support more than 2 groups on regex
Type: behavior
Stage: resolved
Components: Regular Expressions
Versions: Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: angelonuffer, ezio.melotti, mrabarnett
Priority: normal
Keywords:

Created on 2011-08-20 03:13 by angelonuffer, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (3)
msg142511 - Author: Ângelo Otávio Nuffer Nunes (angelonuffer) Date: 2011-08-20 03:13
When I use the Scanner object in the re module, I can create groups in a regex and associate it with a function:


In [17]: re.Scanner([(r"(\w)(\w)\w", foo)])
Out[17]: <re.Scanner instance at 0x15c4e60>


But when I tried 3 groups, it raised "invalid SRE code", though I don't think my regex is wrong:


In [15]: scan = re.Scanner([(r"(\w)(\w)(\w)", foo)])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)

/home/angelo/<ipython console> in <module>()

/usr/lib/python2.7/re.pyc in __init__(self, lexicon, flags)
    305         s.groups = len(p)+1
    306         p = sre_parse.SubPattern(s, [(BRANCH, (None, p))])
--> 307         self.scanner = sre_compile.compile(p)
    308     def scan(self, string):
    309         result = []

/usr/lib/python2.7/sre_compile.pyc in compile(p, flags)
    520     return _sre.compile(
    521         pattern, flags | p.pattern.flags, code,
    522         p.pattern.groups-1,
--> 523         groupindex, indexgroup
    524         )

RuntimeError: invalid SRE code
msg142543 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2011-08-20 16:57
Even if this bug is fixed, it still won't work as you expect, and this is why.

The Scanner constructor accepts a list of 2-tuples. The first item of each tuple is a regex and the second is a function. For example:

    re.Scanner([(r"\d+", number), (r"\w+", word)])

Scanner then builds a single regex, using the given regexes as alternatives, each wrapped in a capture group:

    r"(\d+)|(\w+)"

When matching, it sees which group captured and uses that to decide which function it should call, so, for example, if group 1 matched, it calls "number", and if group 2 matched, it calls "word".
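
The dispatch described above can be sketched by hand with an ordinary compiled pattern. This is only an illustration of the idea, not the actual re.Scanner source; the `dispatch` helper and the `actions` list are hypothetical names introduced for this example.

```python
import re

# Hypothetical callbacks, one per alternative in the combined regex.
def number(token):
    return ("number", token)

def word(token):
    return ("word", token)

# Order matters: actions[i] belongs to capture group i + 1.
actions = [number, word]

# The hand-built equivalent of the combined regex shown above.
combined = re.compile(r"(\d+)|(\w+)")

def dispatch(text):
    """Match at the start of text and call the function for the group that captured."""
    m = combined.match(text)
    if m is None:
        return None
    # lastindex is the number of the group that matched (1 or 2 here).
    return actions[m.lastindex - 1](m.group())

print(dispatch("42"))   # group 1 captured, so number() is called
print(dispatch("abc"))  # group 2 captured, so word() is called
```

The mapping only works while each alternative contributes exactly one capture group, which is the assumption that extra groups inside a user regex break.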

When you introduce capture groups into the regexes, it gets confused. If your regex matches, it'll see that groups 1 and 2 match, so it'll try to call the second function, but there isn't one...
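
One possible way to get per-character pieces without putting capture groups into the Scanner lexicon is to keep the lexicon regex group-free and split the token inside the callback with a second pattern. This is a sketch under the assumption that re.Scanner (an undocumented class) calls each action as `action(scanner, token)` and skips tokens whose action is `None`; `INNER` and `foo` are names chosen for the example.

```python
import re

# A separate pattern holds the capture groups, so the Scanner's own
# group numbering is left untouched.
INNER = re.compile(r"(\w)(\w)(\w)")

def foo(scanner, token):
    # Re-match the already-scanned token to pull out its three parts.
    return INNER.match(token).groups()

scanner = re.Scanner([
    (r"\w\w\w", foo),   # no capture groups in the lexicon regex
    (r"\s+", None),     # whitespace is matched and discarded
])

tokens, remainder = scanner.scan("abc def")
print(tokens)     # [('a', 'b', 'c'), ('d', 'e', 'f')]
print(remainder)  # empty string: the whole input was consumed
```

If grouping is only needed for alternation or repetition, non-capturing groups of the form `(?:...)` in the lexicon regex avoid the problem as well.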
msg142554 - Author: Ângelo Otávio Nuffer Nunes (angelonuffer) Date: 2011-08-20 18:51
Ah, ok, thanks...
Then I think my idea is impossible.
I will use the Scanner in the normal way. :)
History
Date User Action Args
2022-04-11 14:57:20 admin set github: 56998
2011-08-20 19:00:55 ezio.melotti set resolution: not a bug; stage: resolved
2011-08-20 18:51:59 angelonuffer set status: open -> closed; messages: + msg142554
2011-08-20 17:11:41 Arfrever set title: re.Scanner don't support more then 2 groups on regex -> re.Scanner doesn't support more than 2 groups on regex
2011-08-20 16:57:58 mrabarnett set messages: + msg142543
2011-08-20 08:50:49 ezio.melotti set nosy: + ezio.melotti, mrabarnett
2011-08-20 03:13:58 angelonuffer create