
Consider using lru_cache for the re.py caches #72380

Closed
rhettinger opened this issue Sep 17, 2016 · 8 comments
Labels: 3.7 (EOL) end of life · performance (Performance or resource usage) · stdlib (Python modules in the Lib dir)

Comments
@rhettinger (Contributor)

BPO 28193
Nosy @rhettinger, @serhiy-storchaka
Files
  • re_repl_cache.diff: Swap the repl_cache with an lru_cache
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-09-19.03:18:38.645>
    created_at = <Date 2016-09-17.20:06:04.764>
    labels = ['3.7', 'library', 'performance']
    title = 'Consider using lru_cache for the re.py caches'
    updated_at = <Date 2016-09-19.03:18:38.644>
    user = 'https://github.com/rhettinger'

    bugs.python.org fields:

    activity = <Date 2016-09-19.03:18:38.644>
    actor = 'rhettinger'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-09-19.03:18:38.645>
    closer = 'rhettinger'
    components = ['Library (Lib)']
    creation = <Date 2016-09-17.20:06:04.764>
    creator = 'rhettinger'
    dependencies = []
    files = ['44731']
    hgrepos = []
    issue_num = 28193
    keywords = ['patch']
    message_count = 8.0
    messages = ['276832', '276838', '276869', '276911', '276913', '276918', '276928', '276932']
    nosy_count = 3.0
    nosy_names = ['rhettinger', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue28193'
    versions = ['Python 3.6', 'Python 3.7']

    @rhettinger (Contributor, Author)

    The last time we applied the LRU cache to the re.py module, the overhead of the pure Python version resulted in a net performance decrease. But now we have a highly performant C version and should consider reinstating the code.

    @rhettinger added the 3.7 (EOL) end of life label Sep 17, 2016
    @rhettinger added the stdlib Python modules in the Lib dir and performance Performance or resource usage labels Sep 17, 2016
    @serhiy-storchaka (Member)

    Since that time the logic of re._compile() was changed. Now it can't just be wrapped with lru_cache().

    @serhiy-storchaka (Member)

    Added comments on Rietveld.

    @serhiy-storchaka (Member)

    lru_cache could be used for re._compile() if we add the ability to bypass the cache and to validate a cached value.

    @rhettinger (Contributor, Author)

    Yes, I saw that. If a function could raise a NoCache exception, re._compile() could take advantage of it. But I don't feel good about going down that path (adding coupling between the caching decorator and the cached function). It would be better to keep the lru_cache API simple. I already made the mistake of expanding the API for typed=True just to accommodate a single use case (re.compile).
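    The NoCache idea mentioned above can be sketched as follows. Note that the `NoCache` exception and the `lru_cache_with_bypass` decorator are hypothetical illustrations, not part of functools; the sketch relies on the fact that `functools.lru_cache` does not store a result when the wrapped function raises:

```python
import functools

class NoCache(Exception):
    """Hypothetical signal: carry a result back to the caller
    without letting lru_cache store it."""
    def __init__(self, result):
        super().__init__()
        self.result = result

def lru_cache_with_bypass(maxsize=128):
    """Wrap lru_cache so the cached function can opt out per call."""
    def decorator(func):
        cached = functools.lru_cache(maxsize)(func)
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return cached(*args, **kwargs)
            except NoCache as exc:
                # The function raised, so lru_cache stored nothing;
                # deliver the payload result to the caller anyway.
                return exc.result
        return wrapper
    return decorator

@lru_cache_with_bypass(maxsize=32)
def double(x):
    if x < 0:
        raise NoCache(x * 2)  # negative inputs bypass the cache
    return x * 2
```

    This is the coupling the comment above objects to: the cached function must know about the caching layer's exception.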

    @serhiy-storchaka (Member)

    Yes, raising an exception with the result as a payload is one option. Another option is to check the result. Something like:

    def _compile_is_valid(value):
        p, loc = value
        return loc is None or loc == _locale.setlocale(_locale.LC_CTYPE)
    
    def _compile_cache_if(value):
        p, loc = value
        return loc is not False
    
    @lru_cache(_MAXCACHE, is_valid=_compile_is_valid, cache_if=_compile_cache_if)
    def _compile1(pattern, flags):
        # internal: compile pattern
        if isinstance(pattern, _pattern_type):
            if flags:
                raise ValueError(
                    "cannot process flags argument with a compiled pattern")
            return pattern, False
        if not sre_compile.isstring(pattern):
            raise TypeError("first argument must be string or compiled pattern")
        p = sre_compile.compile(pattern, flags)
        if flags & DEBUG:
            return p, False
        if not (p.flags & LOCALE):
            return p, None
        if not _locale:
            return p, False
        return p, _locale.setlocale(_locale.LC_CTYPE)
    
    def _compile(pattern, flags):
        p, loc = _compile1(pattern, flags)
        return p

    @rhettinger (Contributor, Author)

    I think I'll just take the low hanging fruit in _compile_repl and call it a day.

    @python-dev (mannequin) commented Sep 19, 2016

    New changeset 88110cfbf4dc by Raymond Hettinger in branch '3.6':
    Issue bpo-28193: Use lru_cache in the re module.
    https://hg.python.org/cpython/rev/88110cfbf4dc

    @ezio-melotti transferred this issue from another repository Apr 10, 2022