Message403851
Calling re.match(p, ...) with a pre-compiled pattern p = re.compile(...) can be much slower than calling p.match(...). The effect is probably most pronounced with "easy" patterns and/or short strings.
The culprit is that re.match -> re._compile can spend a lot of time looking up p in its internal _cache, where it will never find p:
def _compile(pattern, flags):
    ...
    try:
        return _cache[type(pattern), pattern, flags]
    except KeyError:
        pass
    if isinstance(pattern, Pattern):
        ...
        return pattern
    ...
    _cache[type(pattern), pattern, flags] = p
    ...
_compile will always return before the _cache is set if given a Pattern object.
By simply moving the isinstance(..., Pattern) check ahead of the cache lookup, we can save a lot of time.
I've seen speedups in the range of 2x-5x on some of my data. As an example:
Raw speed of re.compile(p, ...).match():
time ./python.exe -c 'import re'\n'pat = re.compile(".").match'\n'for _ in range(1_000_000): pat("asdf")'
Executed in 190.59 millis
Speed with this optimization:
time ./python.exe -c 'import re'\n'pat = re.compile(".")'\n'for _ in range(1_000_000): re.match(pat, "asdf")'
Executed in 291.39 millis
Speed without this optimization:
time ./python.exe -c 'import re'\n'pat = re.compile(".")'\n'for _ in range(1_000_000): re.match(pat, "asdf")'
Executed in 554.42 millis
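For anyone who wants to reproduce the comparison without a shell-specific time command, the same measurement can be sketched with the standard timeit module. The function name bench and the iteration count are arbitrary choices for illustration, and absolute numbers will differ by machine and Python build.

```python
import re
import timeit


def bench(number=200_000):
    """Time pat.match(...) against re.match(pat, ...) for a trivial pattern."""
    pat = re.compile(".")
    bound_match = pat.match  # bound method, as in the "raw speed" case
    t_method = timeit.timeit(lambda: bound_match("asdf"), number=number)
    t_module = timeit.timeit(lambda: re.match(pat, "asdf"), number=number)
    return t_method, t_module


t_method, t_module = bench()
print(f"pat.match(...):      {t_method * 1000:.2f} ms")
print(f"re.match(pat, ...):  {t_module * 1000:.2f} ms")
```

Running this on builds with and without the change should show the gap between the two lines narrowing, though the lambda call adds a constant overhead to both measurements.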
Date | User | Action | Args
2021-10-13 16:38:53 | jonash | set | recipients: + jonash, ezio.melotti, mrabarnett
2021-10-13 16:38:53 | jonash | set | messageid: <1634143133.69.0.265087114017.issue45462@roundup.psfhosted.org>
2021-10-13 16:38:53 | jonash | link | issue45462 messages
2021-10-13 16:38:53 | jonash | create |