
Reduce lru_cache memory overhead. #76603

Closed
methane opened this issue Dec 24, 2017 · 13 comments
Assignees
rhettinger
Labels
3.7 (EOL) end of life, interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)

Comments

@methane
Member

methane commented Dec 24, 2017

BPO 32422
Nosy @rhettinger, @methane, @serhiy-storchaka
PRs
  • bpo-32422: Reduce lru_cache memory usage. #5008
Files
  • lru_bench.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2017-12-25.17:03:57.561>
    created_at = <Date 2017-12-24.05:19:02.014>
    labels = ['interpreter-core', '3.7', 'performance']
    title = 'Reduce lru_cache memory overhead.'
    updated_at = <Date 2017-12-25.17:03:57.561>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2017-12-25.17:03:57.561>
    actor = 'methane'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2017-12-25.17:03:57.561>
    closer = 'methane'
    components = ['Interpreter Core']
    creation = <Date 2017-12-24.05:19:02.014>
    creator = 'methane'
    dependencies = []
    files = ['47348']
    hgrepos = []
    issue_num = 32422
    keywords = ['patch']
    message_count = 13.0
    messages = ['308980', '308981', '308985', '308986', '309003', '309006', '309007', '309013', '309016', '309029', '309031', '309032', '309039']
    nosy_count = 3.0
    nosy_names = ['rhettinger', 'methane', 'serhiy.storchaka']
    pr_nums = ['5008']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue32422'
    versions = ['Python 3.7']

    @methane
    Member Author

    methane commented Dec 24, 2017

    Currently, functools.lru_cache implements its own doubly linked list.

    But it is less efficient than OrderedDict because each link node is a GC-tracked object,
    so it can use more memory and lengthen GC pauses.

    I added two private C APIs for OrderedDict and made lru_cache use them.

    • _PyODict_LRUGetItem(): Similar to PyDict_GetItemWithHash() + od.move_to_end(last=True).
    • _PyODict_PopItem(): Similar to odict.popitem().

    The reason I didn't implement a C version of move_to_end() is to reduce lookups:
    _PyODict_LRUGetItem(key) looks up the key once, while
    od[key]; od.move_to_end(key) looks it up twice (a pure-Python sketch of this follows below).

    I'll benchmark it and report the results here.
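
    For illustration only (this is not the attached patch), a pure-Python sketch of the
    OrderedDict-based hit/miss path; at the Python level a hit needs two lookups
    (od[key] plus move_to_end()), which is what _PyODict_LRUGetItem() folds into one in C.
    The names below (lru_get, compute) are hypothetical:

    from collections import OrderedDict

    def lru_get(od, key, compute, maxsize=128):
        try:
            value = od[key]                    # first lookup of key
        except KeyError:
            value = compute(key)
            od[key] = value
            if len(od) > maxsize:
                od.popitem(last=False)         # evict the least recently used entry
        else:
            od.move_to_end(key, last=True)     # second lookup of the same key
        return value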

    @methane methane added 3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Dec 24, 2017
    @methane
    Member Author

    methane commented Dec 24, 2017

    Current implementation (no news entry yet):
    https://github.com/methane/cpython/pull/10/files

    @serhiy-storchaka
    Member

    This is a duplicate of bpo-28239.

    @serhiy-storchaka
    Member

    Ah, sorry, you use OrderedDict rather than just a plain (insertion-ordered) dict. That should have different timing and memory consumption.
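
    A quick, hedged way to see that the two containers differ in memory terms (this snippet
    is not from the issue; exact numbers vary by CPython version and platform):

    import sys
    from collections import OrderedDict

    d = {i: i for i in range(1000)}
    od = OrderedDict(d)

    # OrderedDict maintains a doubly linked list of keys on top of the underlying
    # hash table, so it reports a larger size than a plain dict with the same items.
    print("dict:       ", sys.getsizeof(d))
    print("OrderedDict:", sys.getsizeof(od))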

    @methane
    Member Author

    methane commented Dec 24, 2017

    Hmm, it seems my implementation is roughly 30% slower in scenarios with many cache misses.
    Maybe dict is faster than OrderedDict for massive insert/discard workloads, but I need to profile it.

    On the other hand, GC speed looks about 2x faster, as expected.

    $ ./python -m perf compare_to master.json patched.json  -G
    Slower (5):
    - lru_1000_100: 217 ns +- 6 ns -> 302 ns +- 6 ns: 1.39x slower (+39%)
    - lru_10000_1000: 225 ns +- 4 ns -> 309 ns +- 2 ns: 1.37x slower (+37%)
    - lru_100_1000: 114 ns +- 5 ns -> 119 ns +- 1 ns: 1.05x slower (+5%)
    - lru_100_100: 115 ns +- 6 ns -> 119 ns +- 1 ns: 1.03x slower (+3%)
    - lru_1000_1000: 134 ns +- 6 ns -> 136 ns +- 1 ns: 1.02x slower (+2%)

    Faster (4):
    - gc(1000000): 98.3 ms +- 0.3 ms -> 37.9 ms +- 0.2 ms: 2.59x faster (-61%)
    - gc(100000): 11.7 ms +- 0.0 ms -> 5.10 ms +- 0.02 ms: 2.29x faster (-56%)
    - gc(10000): 1.48 ms +- 0.02 ms -> 1.04 ms +- 0.01 ms: 1.41x faster (-29%)
    - lru_10_100: 149 ns +- 6 ns -> 147 ns +- 2 ns: 1.02x faster (-2%)
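
    The attached lru_bench.py is not reproduced here, but a benchmark of roughly this shape
    could be written with pyperf (the perf module used above). This is a sketch under the
    assumption that the lru_<maxsize>_<nkeys> names encode the cache size and the key range:

    import pyperf
    from functools import lru_cache

    def make_bench(maxsize, nkeys):
        @lru_cache(maxsize=maxsize)
        def f(x):
            return x
        def run():
            # nkeys > maxsize forces misses and evictions; nkeys <= maxsize mostly hits.
            for i in range(nkeys):
                f(i)
        return run

    runner = pyperf.Runner()
    for maxsize, nkeys in [(10, 100), (100, 100), (1000, 100),
                           (100, 1000), (1000, 1000), (10000, 1000)]:
        runner.bench_func("lru_%d_%d" % (maxsize, nkeys), make_bench(maxsize, nkeys))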

    @methane
    Member Author

    methane commented Dec 24, 2017

    I found odict.pop() and odict.popitem() are very inefficient because
    they look up the key multiple times.
    odict doesn't seem well optimized and is much slower than dict in some areas...

    I'll try to optimize it over the holidays.
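
    A quick way to observe the gap (a rough sketch, not the issue's profile; absolute
    numbers depend on the interpreter build):

    import timeit

    setup_dict = "d = {i: i for i in range(100000)}"
    setup_od = ("from collections import OrderedDict\n"
                "d = OrderedDict((i, i) for i in range(100000))")

    # Each popitem() on OrderedDict also has to unlink the node from its internal
    # doubly linked list (with the extra key lookups noted above), so it tends to
    # be noticeably slower than dict.popitem().
    print("dict.popitem:       ", timeit.timeit("d.popitem()", setup=setup_dict, number=50000))
    print("OrderedDict.popitem:", timeit.timeit("d.popitem()", setup=setup_od, number=50000))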

    @rhettinger
    Contributor

    Please stop revising every single thing you look at. The traditional design of LRU caches used doubly linked lists for a reason. In particular, when there is a high hit rate, the links can be updated without churning the underlying dictionary.
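
    That hit-path property is visible in the pure-Python fallback in Lib/functools.py: on a
    cache hit, only the circular doubly linked list is rewired and the cache dict is left
    untouched. A simplified paraphrase (same PREV/NEXT/KEY/RESULT layout as the stdlib, but
    not the exact code):

    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3      # field indices of each 4-element link

    def move_to_front(root, link):
        # Unlink the node from its current position in the circular list.
        link_prev, link_next = link[PREV], link[NEXT]
        link_prev[NEXT] = link_next
        link_next[PREV] = link_prev
        # Splice it back in just before root, i.e. mark it most recently used.
        # The cache dictionary is never touched on this path.
        last = root[PREV]
        last[NEXT] = root[PREV] = link
        link[PREV] = last
        link[NEXT] = root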

    @rhettinger
    Contributor

    FWIW, I'm the original author and designer of this code, so it would have been appropriate to assign this to me for sign-off on any proposed changes.

    @rhettinger rhettinger self-assigned this Dec 24, 2017
    @serhiy-storchaka
    Member

    FWIW, I'm the original author and designer of this code, so it would have
    been appropriate to assign this to me for sign-off on any proposed changes.

    Not me (the C implementation)? ;-)

    @methane
    Member Author

    methane commented Dec 25, 2017

    Please stop revising every single thing you look at. The traditional design of LRU caches used doubly linked lists for a reason. In particular, when there is a high hit rate, the links can be updated without churning the underlying dictionary.

    I'm not proposing to remove the doubly linked list; OrderedDict uses a
    doubly linked list too, and I found no problem in the mostly-hit scenario.

    On the other hand, I found a problem with OrderedDict in the mostly-miss
    scenario.

    Now I think lru_cache's implementation is better than OrderedDict's.
    PyODict is slower than lru_cache's dict + linked list for historical
    reasons (compatibility with the pure Python implementation).

    So I'll stop trying to replace lru_cache's own implementation.
    Instead, I'll try to reduce lru_cache's overhead by removing the GC header
    from the link nodes.
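
    The effect of dropping the GC header can be observed indirectly from Python by counting
    GC-tracked objects created while filling a cache. A hedged sketch (exact counts differ
    across CPython versions; before this change each cached entry contributed one tracked
    link node):

    import gc
    from functools import lru_cache

    @lru_cache(maxsize=100000)
    def f(x):
        return x

    gc.collect()
    before = len(gc.get_objects())
    for i in range(100000):
        f(i)
    gc.collect()
    after = len(gc.get_objects())

    # With GC-tracked link nodes this grows by roughly one object per cached entry;
    # once the link nodes lose their GC header they no longer show up here.
    print("new GC-tracked objects:", after - before)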

    @methane methane changed the title Make OrderedDict can be used for LRU from C Reduce lru_cache memory overhead. Dec 25, 2017
    @methane
    Member Author

    methane commented Dec 25, 2017

    PR-5008 benchmark:

    $ ./python -m perf compare_to master.json patched2.json -G
    Faster (9):
    - gc(1000000): 98.3 ms +- 0.3 ms -> 29.9 ms +- 0.4 ms: 3.29x faster (-70%)
    - gc(100000): 11.7 ms +- 0.0 ms -> 3.71 ms +- 0.03 ms: 3.14x faster (-68%)
    - gc(10000): 1.48 ms +- 0.02 ms -> 940 us +- 6 us: 1.57x faster (-36%)
    - lru_10_100: 149 ns +- 6 ns -> 138 ns +- 1 ns: 1.08x faster (-8%)
    - lru_100_100: 115 ns +- 6 ns -> 108 ns +- 1 ns: 1.07x faster (-6%)
    - lru_1000_1000: 134 ns +- 6 ns -> 127 ns +- 1 ns: 1.05x faster (-5%)
    - lru_100_1000: 114 ns +- 5 ns -> 108 ns +- 1 ns: 1.05x faster (-5%)
    - lru_1000_100: 217 ns +- 6 ns -> 212 ns +- 4 ns: 1.03x faster (-2%)
    - lru_10000_1000: 225 ns +- 4 ns -> 221 ns +- 5 ns: 1.02x faster (-2%)

    @serhiy-storchaka
    Member

    LGTM.

    @serhiy-storchaka serhiy-storchaka added the performance Performance or resource usage label Dec 25, 2017
    @methane
    Member Author

    methane commented Dec 25, 2017

    New changeset 3070b71 by INADA Naoki in branch 'master':
    bpo-32422: Reduce lru_cache memory usage (GH-5008)

    @methane methane closed this as completed Dec 25, 2017
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022