Message 403851 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	jonash
Recipients	ezio.melotti, jonash, mrabarnett
Date	2021-10-13.16:38:53
Message-id	<1634143133.69.0.265087114017.issue45462@roundup.psfhosted.org>

Content

re.match(p, ...) with a pre-compiled pattern p = re.compile(...) can be much slower than calling p.match(...) directly. The gap is probably largest for "easy" patterns and/or short strings, where the matching itself is cheap and the call overhead dominates.

The culprit is that re.match -> re._compile can spend a lot of time looking up p in its internal _cache, where it will never find p:

def _compile(pattern, flags):
    ...
    try:
        return _cache[type(pattern), pattern, flags]
    except KeyError:
        pass
    if isinstance(pattern, Pattern):
        ...
        return pattern
    ...
        _cache[type(pattern), pattern, flags] = p
    ...

Given a Pattern object, _compile always returns before _cache is written, so the lookup at the top misses (raising and catching a KeyError) on every call.

By simply moving the isinstance(..., Pattern) check in front of the cache lookup, we can save a lot of time.

I've seen speedups in the range of 2x-5x on some of my data. As an example:

Raw speed of re.compile(p, ...).match():
    time ./python.exe -c 'import re'\n'pat = re.compile(".").match'\n'for _ in range(1_000_000): pat("asdf")'
    Executed in  190.59 millis

Speed with this optimization:
    time ./python.exe -c 'import re'\n'pat = re.compile(".")'\n'for _ in range(1_000_000): re.match(pat, "asdf")'
    Executed in  291.39 millis

Speed without this optimization:
    time ./python.exe -c 'import re'\n'pat = re.compile(".")'\n'for _ in range(1_000_000): re.match(pat, "asdf")'
    Executed in  554.42 millis
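The same comparison can be run in-process with the standard `timeit` module. This is a sketch: `compare` is a hypothetical helper, the lambdas add the same small overhead to both measurements, and absolute timings depend on the machine and Python version, so the ratio is the more meaningful number.

```python
import re
import timeit

def compare(n=1_000_000):
    """Return (seconds for pat.match(...), seconds for re.match(pat, ...))."""
    pat = re.compile(".")
    bound_match = pat.match
    # Baseline: call the bound method of the pre-compiled pattern directly.
    t_bound = timeit.timeit(lambda: bound_match("asdf"), number=n)
    # Module-level function: goes through re._compile on every call.
    t_module = timeit.timeit(lambda: re.match(pat, "asdf"), number=n)
    return t_bound, t_module

if __name__ == "__main__":
    t_bound, t_module = compare()
    print(f"pat.match(...):     {t_bound * 1000:.2f} ms")
    print(f"re.match(pat, ...): {t_module * 1000:.2f} ms")
    print(f"ratio:              {t_module / t_bound:.2f}x")
```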

History
Date	User	Action	Args
2021-10-13 16:38:53	jonash	set	recipients: + jonash, ezio.melotti, mrabarnett
2021-10-13 16:38:53	jonash	set	messageid: <1634143133.69.0.265087114017.issue45462@roundup.psfhosted.org>
2021-10-13 16:38:53	jonash	link	issue45462 messages
2021-10-13 16:38:53	jonash	create