Author xtreak
Recipients dchron, ezio.melotti, mrabarnett, xtreak
Date 2020-04-12.14:06:10
Message-id <1586700370.84.0.386015289897.issue40259@roundup.psfhosted.org>
In-reply-to
Content
Copied from the contents of the attached text file:

In the re module there is an experimental feature called Scanner.
Some unexpected behavior was found while working with it.
Here is an example:

>>> import re
>>> re.Scanner([(r'\w+=(\d+);', lambda s,g: s.match.group(1))]).scan('x=5;')
(['5;'], '')

The error is the semicolon included in capturing group 1; the expected result is ['5'].

Adding a dummy rule at the beginning seems to solve that issue:

>>> re.Scanner([('z', None), (r'\w+=(\d+);', lambda s,g: s.match.group(1))]).scan('x=5;')
(['5'], '')

Adding a capturing group around \w+ also returns the correct answer:

>>> re.Scanner([('z', None), (r'(\w+)=(\d+);', lambda s,g: s.match.group(1))]).scan('x=5;')
(['x'], '')

But then, if I ask for the second group, the problem appears again:

>>> re.Scanner([('z', None), (r'(\w+)=(\d+);', lambda s,g: s.match.group(2))]).scan('x=5;')
(['5;'], '')
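
Until the group handling is sorted out, one possible workaround is to keep capturing groups out of the Scanner rules and extract the value inside the action with a separately compiled pattern. This is only a sketch under that assumption; the names ASSIGNMENT and extract_value are hypothetical and not part of the re module.

```python
import re

# Hypothetical helper pattern, compiled on its own so that its group
# numbers do not depend on the pattern re.Scanner builds internally.
ASSIGNMENT = re.compile(r'\w+=(\d+);')

def extract_value(scanner, token):
    # token is the full text matched by the rule, e.g. 'x=5;'
    return ASSIGNMENT.fullmatch(token).group(1)

# The rule itself contains no capturing groups.
scanner = re.Scanner([(r'\w+=\d+;', extract_value)])
result = scanner.scan('x=5;')
print(result)  # (['5'], '')
```

The action re-matches each token, which costs a second regex pass per token but avoids relying on s.match.group(n) inside the Scanner.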
History
Date                 User    Action  Args
2020-04-12 14:06:10  xtreak  set     recipients: + xtreak, ezio.melotti, mrabarnett, dchron
2020-04-12 14:06:10  xtreak  set     messageid: <1586700370.84.0.386015289897.issue40259@roundup.psfhosted.org>
2020-04-12 14:06:10  xtreak  link    issue40259 messages
2020-04-12 14:06:10  xtreak  create