Title: re.Scanner groups
Created on 2020-04-12 07:52 by dchron, last changed 2020-04-12 14:06 by xtreak.

In the re module there is an experimental feature called Scanner.
Some unexpected behavior was found while working with it.
Here is an example:

>>> re.Scanner([(r'\w+=(\d+);', lambda s,g: s.match.group(1))]).scan('x=5;')
(['5;'], '')

The error is that capturing group 1 returns '5;', including the trailing semicolon, instead of '5'.

Adding a dummy rule at the beginning seems to solve that issue:

>>> re.Scanner([('z', None), (r'\w+=(\d+);', lambda s,g: s.match.group(1))]).scan('x=5;')
(['5'], '')

Adding a capturing group around \w+ also gives the correct result for group 1:

>>> re.Scanner([('z', None), (r'(\w+)=(\d+);', lambda s,g: s.match.group(1))]).scan('x=5;')
(['x'], '')

But then, if I ask for the second group, the problem appears again:

>>> re.Scanner([('z', None), (r'(\w+)=(\d+);', lambda s,g: s.match.group(2))]).scan('x=5;')
(['5;'], '')
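
For anyone who wants to reproduce this, here is a small script that runs the four cases above and prints each Scanner result next to what the same pattern gives when matched on its own with re.fullmatch. It is only a sketch: the helper group_action and the cases list are names made up for this illustration, and it assumes the action reads the group through s.match, which Scanner sets before calling the action. Scanner is undocumented, so the Scanner column may differ between Python versions.

```python
import re

TEXT = "x=5;"

def group_action(n):
    # Hypothetical helper: Scanner calls an action as action(scanner, token),
    # and scanner.match holds the current match object, so this returns
    # capturing group n as seen from inside the Scanner.
    return lambda scanner, token: scanner.match.group(n)

# (label, lexicon passed to Scanner, the rule's pattern on its own, group number)
cases = [
    ("one rule, group 1",
     [(r"\w+=(\d+);", group_action(1))], r"\w+=(\d+);", 1),
    ("dummy rule first, group 1",
     [("z", None), (r"\w+=(\d+);", group_action(1))], r"\w+=(\d+);", 1),
    ("two groups, group 1",
     [("z", None), (r"(\w+)=(\d+);", group_action(1))], r"(\w+)=(\d+);", 1),
    ("two groups, group 2",
     [("z", None), (r"(\w+)=(\d+);", group_action(2))], r"(\w+)=(\d+);", 2),
]

results = []
for label, lexicon, pattern, n in cases:
    # Scanner.scan returns (list of action results, unmatched remainder).
    tokens, remainder = re.Scanner(lexicon).scan(TEXT)
    # The same pattern matched standalone gives the value one would expect.
    expected = re.fullmatch(pattern, TEXT).group(n)
    results.append((label, tokens, remainder, expected))
    print(f"{label}: Scanner={tokens!r} standalone={expected!r}")
```

Any row where the two columns disagree is an instance of the problem reported here.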
