Title: wrongly cache pattern by re.compile
msg381907 - Author: ProFatXuanAll (ProFatXuanAll) Date: 2020-11-26 18:29
When I run the following code, I expect the output `['i am next line with [unk]']`, but instead I get the original list in `data`.

Code snippet

import re

data = [
    '= hello =',
    'i am next line with <unk>',
]

pttn = re.compile(r'=.*=')
samples = filter(lambda sample: not pttn.match(sample), data)

pttn = re.compile(r'<unk>')
samples = map(lambda sample: pttn.sub('[unk]', sample), samples)

print(list(samples))

I suspect that the cache provided by `re.compile` causes the problem.
The `sub` call in `map` is somehow being linked to the first `pttn`.

If I instead rename the second `pttn` to `pttn2`, then it works, but the rename should not be necessary.
msg381911 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2020-11-26 19:45
That behaviour has nothing to do with re.

This line:

    samples = filter(lambda sample: not pttn.match(sample), data)

creates a lazy iterator. The lambda is only called when that iterator is consumed, and it will use the value of 'pttn' _at that time_.

However, you then bind 'pttn' to something else.

Here's a simple example:

>>> x = 1
>>> func = lambda: print(x)
>>> func()
1
>>> x = 2
>>> func()
2
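
The same late lookup can be reproduced with the reported snippet. This is only a sketch: the names `kept` and `eager` are illustrative and do not appear in the report. It shows that the lambda passed to filter() sees whatever 'pttn' is bound to when the iterator is finally consumed, and that consuming the iterator before rebinding avoids the surprise.

```python
import re

data = ['= hello =', 'i am next line with <unk>']

pttn = re.compile(r'=.*=')
# filter() is lazy: the lambda has not been called yet.
samples = filter(lambda sample: not pttn.match(sample), data)

# Rebind the name before the iterator is consumed.
pttn = re.compile(r'<unk>')

# The lambda looks up 'pttn' now and finds the '<unk>' pattern, which
# matches neither string at position 0, so nothing is filtered out.
kept = list(samples)
print(kept)

# Consuming the iterator before rebinding uses the intended pattern.
pttn = re.compile(r'=.*=')
eager = list(filter(lambda sample: not pttn.match(sample), data))
print(eager)
```

Neither result depends on any caching inside re.compile; only the name lookup inside the lambda differs.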

A workaround is to capture the current value with a default argument:

>>> x = 1
>>> func = lambda x=x: print(x)
>>> func()
1
>>> x = 2
>>> func()
1
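
Applied to the reported code, the default-argument workaround might look like the sketch below. The variable `result` is an illustrative name that is not part of the report. Each lambda binds the pattern that is current when the lambda is defined, so reusing the name 'pttn' afterwards no longer affects it.

```python
import re

data = ['= hello =', 'i am next line with <unk>']

pttn = re.compile(r'=.*=')
# pttn=pttn stores the current pattern as the parameter's default value.
samples = filter(lambda sample, pttn=pttn: not pttn.match(sample), data)

pttn = re.compile(r'<unk>')
# This lambda captures the second pattern in the same way.
samples = map(lambda sample, pttn=pttn: pttn.sub('[unk]', sample), samples)

result = list(samples)
print(result)
```

Using distinct names for the two patterns, as the reporter did with `pttn2`, works for the same reason: the first lambda's name is never rebound.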
