classification
Title: Consider using lru_cache for the re.py caches
Type: performance Stage:
Components: Library (Lib) Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: python-dev, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2016-09-17 20:06 by rhettinger, last changed 2016-09-19 03:18 by rhettinger. This issue is now closed.

Files
File name Uploaded Description
re_repl_cache.diff rhettinger, 2016-09-17 21:58 Swap the repl_cache with an lru_cache
Messages (8)
msg276832 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-17 20:06
The last time we applied the LRU cache to the re.py module, the overhead of the pure Python version resulted in a net performance decrease.  But now we have a high-performance C version and should consider reinstating the code.
msg276838 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-17 20:22
Since then the logic of re._compile() has changed. Now it can't just be wrapped with lru_cache().
msg276869 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-18 04:06
Added comments on Rietveld.
msg276911 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-18 20:55
lru_cache could be used for re._compile() if it gained the ability to bypass the cache and to validate cached values.
msg276913 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-18 21:07
Yes, I saw that.  If a function could raise a NoCache exception,  re._compile() could take advantage of it.  But I don't feel good about going down that path (adding coupling between the caching decorator and the cached function).  It would be better to keep the lru_cache API simple.  I already made the mistake of expanding the API for typed=True just to accommodate a single use case (re.compile).
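For illustration, the NoCache idea mentioned above might look roughly like the sketch below. Everything here is hypothetical: `NoCache`, `cache_unless_nocache`, and `describe` are made-up names, the cache is an unbounded dict rather than a real LRU, and nothing like this exists in functools.

```python
import functools


class NoCache(Exception):
    """Hypothetical signal: hand `value` to the caller but do not cache it."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def cache_unless_nocache(func):
    """Memoize func, except for calls that raise NoCache (illustrative only)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        try:
            return cache[args]
        except KeyError:
            pass
        try:
            result = func(*args)
        except NoCache as signal:
            # Bypass: return the payload and store nothing.
            return signal.value
        cache[args] = result
        return result

    wrapper.cache = cache  # exposed only so the behaviour can be inspected
    return wrapper


@cache_unless_nocache
def describe(n):
    # The cached function has to know about the decorator's exception type.
    if n < 0:
        raise NoCache("negative")
    return "non-negative"
```

The sketch makes the coupling visible: `describe` must import and raise an exception that only has meaning to its decorator.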
msg276918 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-18 21:27
Yes, raising an exception with a result as a payload is one option. Another option is to check the result. Something like:

def _compile_is_valid(value):
    p, loc = value
    return loc is None or loc == _locale.setlocale(_locale.LC_CTYPE)

def _compile_cache_if(value):
    p, loc = value
    return loc is not False

@lru_cache(_MAXCACHE, is_valid=_compile_is_valid, cache_if=_compile_cache_if)
def _compile1(pattern, flags):
    # internal: compile pattern
    if isinstance(pattern, _pattern_type):
        if flags:
            raise ValueError(
                "cannot process flags argument with a compiled pattern")
        return pattern, False
    if not sre_compile.isstring(pattern):
        raise TypeError("first argument must be string or compiled pattern")
    p = sre_compile.compile(pattern, flags)
    if flags & DEBUG:
        return p, False
    if not (p.flags & LOCALE):
        return p, None
    if not _locale:
        return p, False
    return p, _locale.setlocale(_locale.LC_CTYPE)

def _compile(pattern, flags):
    p, loc = _compile1(pattern, flags)
    return p
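
The `is_valid` and `cache_if` parameters used above are a proposal; functools.lru_cache does not accept them. A minimal sketch of a decorator with such hooks, assuming a plain dict with crude eviction instead of a real LRU, could look like this. The names `validating_cache`, `compile_toy`, and `state` are invented for the example, and `state["locale"]` stands in for `_locale.setlocale(_locale.LC_CTYPE)`.

```python
import functools


def validating_cache(maxsize, is_valid=None, cache_if=None):
    """Small cache with two hypothetical hooks (not part of functools).

    is_valid(value): checked on a cache hit; a false result forces a recompute.
    cache_if(value): checked on a fresh result; a false result skips storing it.
    """
    def decorator(func):
        cache = {}

        @functools.wraps(func)
        def wrapper(*args):
            if args in cache:
                value = cache[args]
                if is_valid is None or is_valid(value):
                    return value
                del cache[args]  # stale entry, fall through and recompute
            value = func(*args)
            if cache_if is None or cache_if(value):
                if len(cache) >= maxsize:
                    cache.clear()  # crude eviction, not true LRU ordering
                cache[args] = value
            return value

        wrapper.cache = cache  # exposed only for inspection
        return wrapper
    return decorator


state = {"locale": "C"}  # stand-in for the current LC_CTYPE locale
calls = []               # records every real (uncached) computation


@validating_cache(
    maxsize=512,
    is_valid=lambda value: value[1] is None or value[1] == state["locale"],
    cache_if=lambda value: value[1] is not False,
)
def compile_toy(pattern, locale_dependent):
    calls.append(pattern)
    if pattern.startswith("debug:"):
        return pattern.upper(), False  # never cached
    return pattern.upper(), (state["locale"] if locale_dependent else None)
```

With this sketch, a locale-dependent entry is recomputed after `state["locale"]` changes, and results tagged with `False` are never stored, which mirrors the three cases the second tuple element encodes in the proposal.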
msg276928 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-19 00:40
I think I'll just take the low hanging fruit in _compile_repl and call it a day.
msg276932 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-19 03:17
New changeset 88110cfbf4dc by Raymond Hettinger in branch '3.6':
Issue #28193: Use lru_cache in the re module.
https://hg.python.org/cpython/rev/88110cfbf4dc
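
As a rough illustration of the simpler case, a pure function of (repl, pattern) can be memoized directly with functools.lru_cache and no extra hooks. The sketch below is not the committed code: `parse_template_toy` is a hypothetical stand-in for the replacement-template parsing, and the `_MAXCACHE` value of 512 is an assumption made for the example.

```python
import functools

_MAXCACHE = 512  # assumed cache size for this sketch


@functools.lru_cache(_MAXCACHE)
def parse_template_toy(repl, pattern):
    """Hypothetical stand-in for parsing a replacement template."""
    # The result depends only on the hashable arguments, so plain
    # lru_cache is enough: no bypass or validation step is needed.
    return tuple(repl.split("\\"))
```

Calling `parse_template_toy.cache_info()` after repeated calls with the same arguments shows the hit and miss counters that lru_cache maintains.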
History
Date User Action Args
2016-09-19 03:18:38 rhettinger set status: open -> closed; resolution: fixed
2016-09-19 03:17:59 python-dev set nosy: + python-dev; messages: + msg276932
2016-09-19 00:40:02 rhettinger set messages: + msg276928
2016-09-18 21:27:24 serhiy.storchaka set messages: + msg276918
2016-09-18 21:07:48 rhettinger set messages: + msg276913
2016-09-18 20:55:58 serhiy.storchaka set messages: + msg276911
2016-09-18 04:06:58 serhiy.storchaka set messages: + msg276869
2016-09-17 21:58:33 rhettinger set files: + re_repl_cache.diff; keywords: + patch
2016-09-17 20:22:37 serhiy.storchaka set messages: + msg276838
2016-09-17 20:06:04 rhettinger create