classification
Title: Implement functools.lru_cache() using ordered dict
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.7
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: eric.snow, ezio.melotti, inada.naoki, rhettinger, serhiy.storchaka, vstinner
Priority: low Keywords: patch

Created on 2016-09-21 13:52 by serhiy.storchaka, last changed 2017-12-24 20:47 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
lru_cache-get-del-set.patch serhiy.storchaka, 2016-09-21 13:52 review
lru_cache-pop-set.patch serhiy.storchaka, 2016-09-21 13:52 review
lru_cache-move_to_end.patch serhiy.storchaka, 2016-09-21 13:52 review
lru_cache-get-del-set-2.patch serhiy.storchaka, 2017-12-24 20:37
lru_cache-pop-set-2.patch serhiy.storchaka, 2017-12-24 20:38
lru_cache-move_to_end-2.patch serhiy.storchaka, 2017-12-24 20:38
Messages (6)
msg277141 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-21 13:52
Since dict became ordered, this feature can be used in functools.lru_cache() implementation instead of linked list. This significantly simplifies the code.

The first implementation just uses _PyDict_GetItem_KnownHash() + _PyDict_DelItem_KnownHash() + _PyDict_SetItem_KnownHash() for moving existent key-value pair to the end of the dict.

The second implementation adds replaces the first two calls with one call of newly added _PyDict_PopItem_KnownHash() (this is just simplified copy of _PyDict_PopItem()).

The third implementation merges all three operations in one function _PyDict_MoveToEnd_KnownHash() that returns moved value if it exists, NULL without an exception set if there is no such key in a dict, and NULL with an exception set in case of error. Maybe this implementation is can be more optimized.

Benchmarking hits:

$ ./python -m perf timeit -s "from functools import lru_cache; f = lru_cache(512)(lambda x: x); a = list(range(500))*100" -- "list(map(f, a))"

Unpatched:    Median +- std dev: 13.5 ms +- 0.8 ms
get-del-set:  Median +- std dev: 21.7 ms +- 0.8 ms
pop-set:      Median +- std dev: 17.0 ms +- 0.6 ms
move_to_end:  Median +- std dev: 13.7 ms +- 0.5 ms

Benchmarking misses:

$ ./python -m perf timeit -s "from functools import lru_cache; f = lru_cache(512)(lambda x: x); a = list(range(50000))" -- "list(map(f, a))

Unpatched:    Median +- std dev: 31.5 ms +- 1.8 ms
get-del-set:  Median +- std dev: 58.6 ms +- 1.0 ms
pop-set:      Median +- std dev: 59.7 ms +- 1.1 ms
move_to_end:  Median +- std dev: 99.0 ms +- 0.5 ms

The get-del-set implementation is simple and don't need adding new functions to PyDict API, but it is 60-90% slower the baseline linked-list-based implementation. The pop-set implementation is slightly faster for hits. The move_to_end implementation is as fast as the baseline implementation for hits, but is much slower in case of misses (I think it can be tuned).
msg277152 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-21 15:00
The problem is that cache hits now create "holes" in the compact dict and trigger periodic compaction.  In contrast, the existing code leaves dicts unchanged when there is a cache hit.  

I prefer the current code. Though it took a little more effort to implement, that work is already done and done well.  The linked lists are cheap and have consistent, predictable performance.   I also like that the current implementation is only loosely coupled with dictionaries and not tightly tied to the compact dict with its evolving implementation details.
msg277169 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-21 16:32
I consider this issue as lying on the way to using the orderable of dict in OrderedDict implementation (the latter is very desirable because current OrderedDict implementation can be easily broken by using plain dict API). The main difficulty of implementing OrderedDict is the move_to_end() method (especially move_to_end(False)).

In additional, the problem of holes is occurred for any often modified dictionary. Using ordered dict in lru_cache() give as good stress test for optimizing dict updating and resizing code.

I don't suggest to change lru_cach() implementation just now. But after long testing ordered dicts during the developing stage of 3.7 (or even 3.8) we can make a decision.
msg277195 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-09-22 06:23
> I don't suggest to change lru_cach() implementation just now

For now, I would like to have this closed.  It doesn't make sense at the current juncture (with the compact being new, being in flux, and not having guaranteed ordering semantics).

Also, we should have a strong preference for loose coupling and high cohesion.  The lru cache code is principally about tracking recent and should maintain primary responsibility for the action, and the dictionary implementation should be primarily about a clean mapping implementation without having code baggage just for the lru).  

Besides that, there isn't much motivation for change.  The existing code is very good (clear, fast, decoupled, cohensive, and doesn't have compaction issues for cache hits).
msg277205 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2016-09-22 08:09
FYI: Here is interesting article. doubly-linked list is more inefficient than most people think.
http://alex.dzyoba.com/programming/dynamic-arrays.html
msg309017 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-24 20:47
Since this theme has been raised again in issue32422 and old patches no longer apply clearly I have updated patches and benchmark results on 64 bit.

$ ./python -m perf timeit -s "from functools import lru_cache; f = lru_cache(512)(lambda x: x); a = list(range(500))*100" -- "list(map(f, a))"

Unpatched:    Mean +- std dev: 3.03 ms +- 0.02 ms
get-del-set:  Mean +- std dev: 5.73 ms +- 0.04 ms
pop-set:      Mean +- std dev: 4.63 ms +- 0.04 ms
move_to_end:  Mean +- std dev: 3.57 ms +- 0.06 ms


$ ./python -m perf timeit -s "from functools import lru_cache; f = lru_cache(512)(lambda x: x); a = list(range(50000))" -- "list(map(f, a))"

Unpatched:    Mean +- std dev: 6.65 ms +- 0.11 ms
get-del-set:  Mean +- std dev: 15.5 ms +- 0.1 ms
pop-set:      Mean +- std dev: 15.4 ms +- 0.1 ms
move_to_end:  Mean +- std dev: 15.5 ms +- 0.2 ms


All patches cause significant slowdown (around 2.3x) in the case of many misses.
History
Date User Action Args
2017-12-24 20:47:45serhiy.storchakasetmessages: + msg309017
2017-12-24 20:38:41serhiy.storchakasetfiles: + lru_cache-move_to_end-2.patch
2017-12-24 20:38:17serhiy.storchakasetfiles: + lru_cache-pop-set-2.patch
2017-12-24 20:37:53serhiy.storchakasetfiles: + lru_cache-get-del-set-2.patch
2017-12-24 08:11:19serhiy.storchakaunlinkissue32422 superseder
2017-12-24 08:05:28serhiy.storchakalinkissue32422 superseder
2016-09-22 08:09:55inada.naokisetmessages: + msg277205
2016-09-22 06:34:35serhiy.storchakasetstatus: open -> closed
resolution: rejected
stage: resolved
2016-09-22 06:23:35rhettingersetmessages: + msg277195
2016-09-21 16:32:00serhiy.storchakasetmessages: + msg277169
2016-09-21 15:00:45rhettingersetmessages: + msg277152
2016-09-21 14:34:00rhettingersetassignee: rhettinger
2016-09-21 14:00:34ezio.melottisetnosy: + ezio.melotti
2016-09-21 13:52:43serhiy.storchakasetfiles: + lru_cache-move_to_end.patch
2016-09-21 13:52:28serhiy.storchakasetfiles: + lru_cache-pop-set.patch
2016-09-21 13:52:11serhiy.storchakacreate