classification
Title: C implementation of functools.lru_cache
Type: performance Stage: resolved
Components: Extension Modules, Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: 24276 Superseder:
Assigned To: serhiy.storchaka Nosy List: Aaron.Meurer, BreamoreBoy, amaury.forgeotdarc, asvetlov, brechtm, eric.snow, ezio.melotti, giampaolo.rodola, jason.coombs, jcea, josh.r, kachayev, kmike, meador.inge, ned.deily, pitrou, poke, python-dev, rhettinger, scoder, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2012-03-20 13:40 by anacrolix, last changed 2015-11-01 08:32 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
lru_cache_class.py rhettinger, 2012-03-31 20:10 Updated pure python roadmap for the C implementation. Adds handling for kwd_mark and try/except
functools.lru_cache-in-c.patch anacrolix, 2012-04-02 07:20 review
14373.fixed.diff kachayev, 2012-12-19 21:54 Python version works without C acceleration, test cases for both implementations review
14373.v3.diff kachayev, 2012-12-21 19:38 review
14373.v4.diff kachayev, 2012-12-21 22:13 review
lru_cache_bench.py serhiy.storchaka, 2012-12-22 20:21 Benchmark script
14373.v7.diff kachayev, 2012-12-30 13:43 review
14373.v9.diff kachayev, 2012-12-30 20:04 review
clru_cache.patch serhiy.storchaka, 2013-11-22 17:49 review
clru_cache2.patch serhiy.storchaka, 2013-11-22 21:48 review
clru_cache_3.patch serhiy.storchaka, 2015-05-24 07:13 review
clru_cache_crasher.py serhiy.storchaka, 2015-05-24 07:13
clru_cache_4.patch serhiy.storchaka, 2015-05-24 13:18 Removed fast path in make_key review
clru_cache_descr_get.patch serhiy.storchaka, 2015-06-07 06:44 review
test_lru_cache_threaded.patch serhiy.storchaka, 2015-06-07 18:56 review
clru_cache_new.patch serhiy.storchaka, 2015-07-14 06:23 review
Messages (66)
msg156405 - (view) Author: Matt Joiner (anacrolix) Date: 2012-03-20 13:40
functools.lru_cache is optimized to the point that it may benefit from a C implementation.
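For context, a minimal usage sketch of the decorator this issue is about — every distinct argument tuple is computed once and then served from the cache, so the per-call overhead of the cache machinery itself is what a C implementation would reduce:

```python
# Classic memoization example: without the cache this is O(2**n).
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))           # 832040, computed with only 31 cache misses
print(fib.cache_info())  # hits/misses/maxsize/currsize statistics
```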
msg156449 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-03-20 19:14
Thank you for working on this.  I look forward to reviewing it.
msg156492 - (view) Author: Matt Joiner (anacrolix) Date: 2012-03-21 12:10
Updated patch to fix a crash if maxsize isn't given, and add a unit test for that.

Possible issues:

 * I've tried to emulate object() by calling PyBaseObject_Type. Not sure if there's a more lightweight object for this that just provides the needed __hash__ and __eq__ attributes.
 * Not sure if kwd_mark is deallocated correctly.
 * Can't quite work out the best way to wrap the C cache_info() method to output the _CacheInfo named tuple. The current mechanism might not be wrapping the attributes correctly.
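For reference, the pure-Python version sidesteps the object() question by keeping a single `(object(),)` sentinel tuple. A simplified sketch of the key construction (modeled on the 3.4-era pure-Python `_make_key`, not a verbatim copy):

```python
# kwd_mark is a unique sentinel: it only needs identity-based __hash__
# and __eq__, which a bare object() provides, and it keeps f(1, a=2)
# from ever colliding with f(1, 'a', 2).
kwd_mark = (object(),)

def make_key(args, kwds, typed=False):
    key = args
    if kwds:
        key += kwd_mark
        for item in sorted(kwds.items()):
            key += item                       # keep the key flat
    if typed:
        key += tuple(type(v) for v in args)
        if kwds:
            key += tuple(type(v) for k, v in sorted(kwds.items()))
    return key
```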
msg156803 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-03-26 08:29
I've just started looking at this.  Nice job and good attention to detail on the error checking.  Expect to have a few high-level suggestions and a ton of minor edits.

Here are a couple of quick thoughts:
* The comment style needs to switch to the /* style */
* Use the pure python _cache_info by looking it up on the module object.
* I had expected PyList objects rather than links between C structs (following the pure python lru_cache as closely as possible).
msg156808 - (view) Author: Matt Joiner (anacrolix) Date: 2012-03-26 10:53
I've fixed the commenting, and cache_info use.

I've left the element management in pure C as it reduces memory use (56 bytes for 4 element list, vs. 16 for lru_cache_elem), and avoids ref counting overhead (3 refs per link, plus GC). The difference might become quite marked for very large caches. There's also a nice invariant that links the key to the cache dict, and the result object to the lru_cache_elem. I'm happy to change this if it doesn't matter.

My only concern now is the wrapping of the lru cache object. In the Python version, @wraps allows the lru_cache to masquerade as the wrapped function w.r.t. str/repr. The C version is wrapped, but str/repr remain unchanged. Not sure if this is a problem.
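To make the concern concrete, here is what @wraps copies in the pure-Python version (so a wrapper that skips it loses these attributes). The `cached`/`expensive` names are illustrative:

```python
# functools.wraps copies __name__, __qualname__, __doc__, __module__
# and __dict__, and adds a __wrapped__ back-reference, so introspection
# sees the original function -- but repr() still describes the wrapper
# object itself.
from functools import wraps

def cached(func):
    @wraps(func)
    def wrapper(*args):
        return func(*args)
    return wrapper

@cached
def expensive(x):
    "Pretend this is slow."
    return x * 2

print(expensive.__name__)          # 'expensive'
print(expensive.__doc__)           # 'Pretend this is slow.'
print(expensive.__wrapped__(21))   # 42: the undecorated function
```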
msg157224 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-03-31 19:50
Matt, I'm attaching a pure python version to serve as a better map for how to implement this in C.

* it incorporates the recent lru_cache algorithmic updates (moving the root around the circular queue to re-use old links).

* it shows which parts should be implemented in C using a regular type and shows how to call it from pure python.

* it gives hints on use of #defines and PyDict_GetItem

* the critical sections are marked so you can use the GIL rather than using locks.

* there are hints for what datatypes to use (only the hits and misses need the ability to grow beyond sys.maxsize).

* it shows how to access the named tuple from within the C code (using a straight PyObject_Call).

* this code passes the test suite and should be directly translatable (and very fast).

* please follow the model and use PyList objects instead of C structure for links

* let me know if there are any questions.
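The "moving the root" idea from the roadmap can be sketched in pure Python. `TinyLRU` is a hypothetical simplification (no stats, no locking, no update of existing keys), but the link layout — 4-element lists in a circular doubly linked list, with the root recycled on eviction — follows the roadmap:

```python
# Each link is [PREV, NEXT, KEY, RESULT]; the root is an empty link.
# On a miss with a full cache, the old root is recycled into the new
# entry's link and the oldest link becomes the new empty root, so a
# full cache never allocates or frees links.
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3

class TinyLRU:
    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.cache = {}
        root = []
        root[:] = [root, root, None, None]   # empty circular list
        self.root = root

    def get(self, key):
        link = self.cache.get(key)
        if link is None:
            return None
        # Unlink, then re-insert just before the root (MRU position).
        prev, nxt = link[PREV], link[NEXT]
        prev[NEXT], nxt[PREV] = nxt, prev
        last = self.root[PREV]
        last[NEXT] = self.root[PREV] = link
        link[PREV], link[NEXT] = last, self.root
        return link[RESULT]

    def put(self, key, result):
        if len(self.cache) >= self.maxsize:
            # Recycle: old root becomes the newest link, and the
            # oldest link (root.next) becomes the new empty root.
            root = self.root
            root[KEY], root[RESULT] = key, result
            self.cache[key] = root
            self.root = oldest = root[NEXT]
            del self.cache[oldest[KEY]]
            oldest[KEY] = oldest[RESULT] = None
        else:
            last = self.root[PREV]
            link = [last, self.root, key, result]
            last[NEXT] = self.root[PREV] = link
            self.cache[key] = link
```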
msg157226 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-03-31 20:10
Updated to show how to handle the kwd_mark and the try/except.
msg157343 - (view) Author: Matt Joiner (anacrolix) Date: 2012-04-02 07:20
> * it incorporates the recent lru_cache algorithmic updates (moving the root around the circular queue to re-use old links).

The existing C patch already does this.

> * it shows which parts should be implemented in C using a regular type and shows how to call it from pure python.
 
The existing patch already did this.

> * it gives hints on use of #defines and PyDict_GetItem

I'm familiar with C. I've retained PyDict_GetItemWithError so as not to suppress errors (such as a TypeError) raised by keys constructed from bad user arguments.

> * the critical sections are marked so you can use the GIL rather than using locks.

The existing patch is already using the GIL, it didn't have any locks.

> * there are hints for what datatypes to use (only the hits and misses need the ability to grow beyond sys.maxsize).

The existing patch already had this.

> * it shows how to access the named tuple from within the C code (using a straight PyObject_Call).

I incorporated this.

> * this code passes the test suite and should be directly translatable (and very fast).

So did the old code.

> * please follow the model and use PyList objects instead of C structure for links

I've left this as is: the linked list in C is idiomatic, avoids a lot of branching and reference counting, and is also more readable than the equivalent C/Python.
msg175533 - (view) Author: Matt Joiner (anacrolix) Date: 2012-11-14 04:16
I look forward to your feedback Ezio.
msg175540 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-14 07:06
I wonder if we should keep the original Python implementation alongside the new C version.  If we do, it would be nice to see a 100% coverage of the Python version and have the tests check both the implementations.
msg175545 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-11-14 07:20
See also #12428.
msg176287 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-11-24 14:19
I have added a lot of comments in Rietveld. In general the patch looks good, but I have some style nitpicks and some stronger comments. Matt, please update the patch.

Tests should test both Python and C implementations.

I doubt that getting rid of locking is right. Hash calculation and comparison of arguments can call Python code, and that code can recursively use the wrapped function.

Some benchmarks will be interesting.

I think this acceleration should definitely land in 3.4.
msg177784 - (view) Author: Alexey Kachayev (kachayev) * Date: 2012-12-19 21:54
Updated patch with the following points:

* fix typedefs
* python implementation works normally without C acceleration
* test cases cover both python/c implementations
* several style fixes in C module
msg177808 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-20 10:04
Thanks, Alexey.

However, most of my comments were not addressed. I have repeated them and added some new ones.
msg177867 - (view) Author: Alexey Kachayev (kachayev) * Date: 2012-12-21 08:05
Serhiy, thank you for review. Working further on fixes.
msg177889 - (view) Author: Alexey Kachayev (kachayev) * Date: 2012-12-21 19:38
Fixed my previous patch according to all comments from the review, except for removing keyword arguments from the lru_cache_new function.
msg177903 - (view) Author: Alexey Kachayev (kachayev) * Date: 2012-12-21 22:13
Added additional Py_DECREF(s) for key and value.
msg177953 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-22 20:21
LGTM. Thank you, Matt and Alexey.

Here is a simple benchmark. On my computer it shows a 10-25x speedup using the C accelerator.
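The benchmark script itself isn't reproduced here, but the general shape of such a micro-benchmark can be sketched (names and iteration counts are illustrative, and results are machine-dependent):

```python
# Measure the per-call overhead of an lru_cache hit: prime the cache,
# then time repeated lookups of the same key.
import timeit
from functools import lru_cache

@lru_cache(maxsize=None)
def f(x):
    return x

f(1)  # prime the cache so the timed loop measures hits only
overhead = timeit.timeit("f(1)", globals=globals(), number=100_000)
print(f"100k cache hits: {overhead:.4f}s")
```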
msg177955 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-22 21:00
Antoine reminded me about a lock. In the Python implementation it is needed because linked list modifications are not atomic. In the C implementation linked list modifications are atomic. However, dict operations can call Python code and are therefore not atomic. I don't know what bad things can happen with concurrent cache updating, but using a lock would be safer and cheap enough.

Please add a lock field to lru_cache_object and use it as in the Python implementation, unless someone can prove that a lock is not needed.
msg177967 - (view) Author: Stefan Behnel (scoder) * Date: 2012-12-23 10:08
Just for the record, I've compiled Raymond's roadmap version in Cython (with only slight changes to make 'self.maxsize' a Py_ssize_t and using an external .pxd for typing) and ran Serhiy's benchmark over it (Ubuntu 12.10, 64bit). This is what I get in Py3.4:

 0.022  untyped_cy(i)
 0.023  untyped_cy("spam", i)
 0.024  untyped_cy("spam", "spam", i)
 0.106  untyped_cy(a=i)
 0.133  untyped_cy(a="spam", b=i)
 0.152  untyped_cy(a="spam", b="spam", c=i)
 0.033  typed_cy(i)
 0.038  typed_cy("spam", i)
 0.039  typed_cy("spam", "spam", i)
 0.129  typed_cy(a=i)
 0.168  typed_cy(a="spam", b=i)
 0.183  typed_cy(a="spam", b="spam", c=i)

 0.143  untyped_py(i)
 0.234  untyped_py("spam", i)
 0.247  untyped_py("spam", "spam", i)
 0.368  untyped_py(a=i)
 0.406  untyped_py(a="spam", b=i)
 0.425  untyped_py(a="spam", b="spam", c=i)
 0.447  typed_py(i)
 0.469  typed_py("spam", i)
 0.480  typed_py("spam", "spam", i)
 0.745  typed_py(a=i)
 0.783  typed_py(a="spam", b=i)
 0.819  typed_py(a="spam", b="spam", c=i)

Looking at the factors, that's about the same speedup that the dedicated hand tuned C implementation presented according to Serhiy's own runs (he reported 10-25x). Makes me wonder why we should have two entirely separate implementations for this.

Here's the lru_cache_class.pxd file I used:

"""
cimport cython

cdef make_key(tuple args, dict kwds, bint typed, tuple kwd_mark)

@cython.final
@cython.internal
cdef class c_lru_cache:
    cdef dict cache
    cdef Py_ssize_t hits
    cdef Py_ssize_t misses
    cdef Py_ssize_t maxsize
    cdef bint typed
    cdef object user_function
    cdef object cache_info_type
    cdef tuple kwd_mark
    cdef list root
"""
msg177976 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-12-23 12:10
Hmm. Judging by the numbers for the Python version, my machine appears
to be slower than Stefan (Behnel)'s machine, and yet the C version is
much faster here than the posted Cython numbers.

If I adjust the results for the machine differences, the C version
would appear to be 2.5x faster than the Cython version.
msg177980 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-12-23 12:40
I've managed to build the Cython version now. It's in fact between 4 and 6
times slower here than the C version.
msg177993 - (view) Author: Stefan Behnel (scoder) * Date: 2012-12-23 16:20
Yep, I basically didn't do any optimisation, it's the plain Python code compiled, just with the class being converted into an extension type.
msg178573 - (view) Author: Alexey Kachayev (kachayev) * Date: 2012-12-30 13:43
Thread-safe implementation for cache cleanup.
msg178589 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-12-30 17:16
Alexey, as I see, you have missed some of Antoine's comments (and my comments about whitespace). Please be more careful.
msg178611 - (view) Author: Alexey Kachayev (kachayev) * Date: 2012-12-30 20:04
Updated diff with:
 * fix object leaks
 * tp_clear
 * additional test for maxsize < 0

I also reimplemented the multithreading test (fixed an error and added a skip rule). But actually, I'm not sure that it's enough to ensure thread-safety of the clear operation. I'm working on another variant now. I would appreciate any advice on where to find examples of the same (or close enough) test cases in other modules.

Regarding previous comments from the review:

1. "guard against negative numbers"

I added a special test case for negative maxsize to show that the C version now works the same as the Python one. So, should we change both implementations? What behavior is most logical here? Changing it would alter the public API, so it's not only about acceleration.

2. Use regular PyObject instead of lru_list_elem.

What are the problems with the current implementation?
msg203597 - (view) Author: Brecht Machiels (brechtm) Date: 2013-11-21 09:34
What's the status of this patch? What still needs to be done for it to be accepted?
msg203599 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-21 09:45
I'm working on this.
msg203654 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-11-21 16:43
After Serhiy looks at this, I need to review it for thread-safety and reentrancy issues.
msg203821 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-22 17:49
Here is a revised patch. It is synchronized with tip. Got rid of capsules (sped things up about 10%). lru_list_elem is now a true Python object. Got rid of the lock; the new code should be thread-safe with the GIL only.

I very much hope that this code will be included in 3.4. We can fix bugs later if there are any. I am also planning to optimize the make_key() function.
msg203846 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-22 19:42
> Got rid of the lock, the new code must be thread-safe with GIL only.

I'm not convinced by this (see the review I posted).
msg203894 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-11-22 21:48
Updated patch fixes bugs found by Antoine. Thank you Antoine.
msg214208 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-03-20 11:51
Could you please make a review Raymond?
msg214209 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-03-20 11:57
> Could you please make a review Raymond?

Yes, I will take a look.  I'm looking at making other changes to the lru_cache and don't want the C implementation to go in first.  There are still some open questions about re-entrancy that need to be addressed before a C implementation gets added to the mix.
msg223439 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-18 21:42
https://pypi.python.org/pypi/fastcache/0.4.0 also seems relevant.
msg224938 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-06 15:22
Raymond, do you have time for the review? It will be easier to support both implementations at the same time than to change only the Python implementation and asynchronously update the patch, with the risk of losing changes in the C implementation.

I think we should increase the priority of this issue. An efficient lru_cache will affect other parts of the stdlib. In particular, the use of lru_cache was withdrawn in the re module due to the large overhead of the Python implementation. The ipaddress module now uses its own specialized caching implementation instead of the general lru_cache for the same reason. The issue13299 proposal will be more acceptable with a faster lru_cache.
msg224941 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-06 15:34
I think a lock is still needed for cache misses. The dict operations (set and del) can release the GIL (as well as course as the PyObject_Call()), therefore you might end up with duplicate list links for a given key.

(and given cache misses are supposed to be much more expensive anyway, I don't think acquiring a lock there is detrimental)
msg224980 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-08-07 01:19
> I think we should increase the priority of this issue.

I don't think so at all.  The LRU cache we have now is plenty efficient for its intended use cases (caching I/O-bound functions and expensive functions).  It is only unsuitable for functions that are already blazingly fast.  

Getting the locks right and carefully looking for re-entrancy issues is important.  Also, keeping the memory footprint of the keys small is important (if people didn't care about space, they wouldn't be using an LRU at all).

I will look at this but currently have much higher priorities elsewhere in Python (adding C accelerators for tricky code is less important for the time being -- we have a long time until 3.5).
msg224981 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-08-07 01:51
On 06/08/2014 21:19, Raymond Hettinger wrote:
> I don't think so at all.  The LRU cache we have now is plenty efficient
> for its intended use cases (caching I/O-bound functions and expensive
> functions).  It is only unsuitable for functions that are already
> blazingly fast.

This is an unrealistic simplification. Many functions can be either 
expensive or blazingly fast, depending on their input (typical examples 
are re.compile(), math.factorial()...). But the decision of applying the 
lru_cache decorator is a compile-time binary decision: it cannot encode 
the varying properties of the function depending on its inputs.
msg243932 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-05-23 17:15
Serhiy, go ahead and apply your patch, lru_cache2.patch.  I'll fix-up the make_key() code after the beta1 (like the pure python version, it needs to guarantee that hash is called no more than once per key).
msg243939 - (view) Author: Roundup Robot (python-dev) Date: 2015-05-23 19:43
New changeset 57776eee74f2 by Serhiy Storchaka in branch 'default':
Issue #14373: Added C implementation of functools.lru_cache().  Based on
https://hg.python.org/cpython/rev/57776eee74f2
msg243940 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-23 19:48
Thanks Raymond.

I tried to write an implementation with locking, but it would be too complicated (much more complex than all previously proposed patches), because recursive locks are needed. In any case I think the GIL is enough here. Locking in the Python implementation is needed to make linked list modifications atomic.
msg243962 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-05-24 02:41
Thanks Serhiy.  I'll work on the next steps after the beta.
msg243967 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-24 04:21
Unfortunately the patch caused a crash in test_ipaddress. Larry granted a special exclusion for adding this feature after the feature freeze if it is fixed before beta 2.
msg243970 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-24 07:13
Here are the latest version of the patch and a minimal crash reproducer.
msg243982 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-24 13:18
The problem is in the property descriptor getter. It uses a cached tuple for args and can unexpectedly modify it. Opened issue24276 for fixing this bug.

Another way is to remove the fast path in lru_cache_make_key() and always copy the args tuple.
msg244002 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-24 20:33
Backed out the backout in cb30db9cc029.
msg244928 - (view) Author: Tim Graham (Tim.Graham) * Date: 2015-06-06 20:30
This change caused a problem in Django's test suite (bisected to 0d0989359bbb0).

  File "/home/tim/code/django/django/db/models/options.py", line 709, in _populate_directed_relation_graph
    all_models = self.apps.get_models(include_auto_created=True)
TypeError: get_models() missing 1 required positional argument: 'self'

The get_models() method is decorated with @lru_cache.lru_cache(maxsize=None)

https://github.com/django/django/blob/c2b4967e76fd671e6199e4dd54d2a2c1f096b8eb/django/apps/registry.py#L157-L179
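The failure mode can be reproduced in pure Python: a callable object used to decorate a method must implement the descriptor protocol, otherwise `self` is never bound. A sketch of both the bug and the `__get__` fix that the follow-up patch adds (all names here are illustrative, not Django's or CPython's):

```python
# CallableWrapper mimics the early C lru_cache object: callable, but
# with no __get__, so attribute access on an instance does not bind it.
import types

class CallableWrapper:
    def __init__(self, func):
        self.func = func
    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

class FixedWrapper(CallableWrapper):
    """With __get__, attribute access binds the instance like a function."""
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return types.MethodType(self, obj)

class Registry:
    @CallableWrapper
    def broken(self):
        return 'models'

    @FixedWrapper
    def fixed(self):
        return 'models'

r = Registry()
try:
    r.broken()                 # TypeError: missing argument 'self'
except TypeError as e:
    print('broken:', e)
print('fixed:', r.fixed())     # binds self and returns 'models'
```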
msg244944 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-07 06:44
Here is a patch that adds __get__ for lru_cache object.
msg244950 - (view) Author: Tim Graham (Tim.Graham) * Date: 2015-06-07 10:43
Thanks, that does resolve the issue.
msg244966 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-07 18:56
test_lru_cache_threaded is randomly failed on PPC64 PowerLinux buildbot: http://buildbot.python.org/all/builders/PPC64%20PowerLinux%203.x/builds/3573/steps/test/logs/stdio

======================================================================
FAIL: test_lru_cache_threaded (test.test_functools.TestLRUPy)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/shager/cpython-buildarea/3.x.edelsohn-powerlinux-ppc64/build/Lib/test/test_functools.py", line 1129, in test_lru_cache_threaded
    self.assertEqual(hits, 45)
AssertionError: 43 != 45

----------------------------------------------------------------------

The test looks incorrect: it can fail when the cached function is called simultaneously from different threads. Unfortunately, it is hard to reproduce the failure on other platforms.

The proposed patch fixes the test and adds another threaded test, specifically for simultaneous calls.
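A thread-robust version of such a test asserts invariants rather than an exact hit count, since the exact interleaving is scheduling-dependent. A sketch of the idea (not the actual test_functools code; thread and call counts are illustrative):

```python
# Under concurrency the split between hits and misses varies, but every
# call is counted exactly once and the cache stays consistent, so we
# assert those invariants instead of `hits == 45`.
import threading
from functools import lru_cache

@lru_cache(maxsize=None)
def square(n):
    return n * n

def worker(results, i):
    results[i] = [square(n) for n in range(10)]

results = {}
threads = [threading.Thread(target=worker, args=(results, i))
           for i in range(5)]
for t in threads: t.start()
for t in threads: t.join()

info = square.cache_info()
# Correctness holds regardless of interleaving...
assert all(r == [n * n for n in range(10)] for r in results.values())
# ...and so does the accounting: 5 threads x 10 calls each.
assert info.hits + info.misses == 50
assert info.currsize == 10
print(info)
```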
msg244992 - (view) Author: Roundup Robot (python-dev) Date: 2015-06-08 08:21
New changeset 19dbee688a30 by Serhiy Storchaka in branch '3.5':
Issue #14373: Fixed threaded test for lru_cache(). Added new threaded test.
https://hg.python.org/cpython/rev/19dbee688a30

New changeset da331f50aad4 by Serhiy Storchaka in branch 'default':
Issue #14373: Fixed threaded test for lru_cache(). Added new threaded test.
https://hg.python.org/cpython/rev/da331f50aad4

New changeset eff50d543c79 by Serhiy Storchaka in branch '3.5':
Issue #14373: C implementation of functools.lru_cache() now can be used with
https://hg.python.org/cpython/rev/eff50d543c79

New changeset f4b785d14792 by Serhiy Storchaka in branch 'default':
Issue #14373: C implementation of functools.lru_cache() now can be used with
https://hg.python.org/cpython/rev/f4b785d14792
msg244994 - (view) Author: Roundup Robot (python-dev) Date: 2015-06-08 09:45
New changeset c52f381fe674 by Serhiy Storchaka in branch '3.5':
Issue #14373: Other attempt to fix threaded test for lru_cache().
https://hg.python.org/cpython/rev/c52f381fe674

New changeset 73acb8c187d4 by Serhiy Storchaka in branch 'default':
Issue #14373: Other attempt to fix threaded test for lru_cache().
https://hg.python.org/cpython/rev/73acb8c187d4
msg245606 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2015-06-21 17:37
If the C version is to remain in Python3.5, please make sure it provides all of the carefully designed features of the pure python version:

   * The hash function is called no more than once per element
   * The key is constructed to be as flat as possible
   * Thread-safe where needed:
     - make_key called outside locks 
     - within a reentrant lock (either a real lock or the GIL),
       check to see if the key is in the cache and if so, 
       update the stats and return the value
     + outside of locks, call the user function
       (allowing for recursive functions)
     + within a reentrant lock (either a real lock or the GIL)
       handle cache misses by 1) verifying there was not an
       intervening cache update by the user function, 2)
       updating links, 3) updating stats.  During the updates,
       make only one potentially reentrant cache[key]
       assignment after all invariants have been restored.
       Save all decrefs until all other state updates
       have been completed.

A number of features of the lru_cache were designed for space savings over speed (lru is all about eviction to make space for a new entry), for thread safety and to not fall apart during reentrancy. It also provides a guarantee that the hash function is not called more than once per element and is called *before* any of the lru structure updates or lookups (this makes reasoning about correctness *much* easier).

If these capabilities can't be reliably reproduced for 3.5, I think the whole thing should be reverted and deferred to 3.6.
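The "hash called no more than once per element" guarantee comes from wrapping the key tuple in a list subclass that precomputes and caches its own hash (the pure-Python version calls this `_HashedSeq`). A sketch of the trick, with a hypothetical `Tracked` type to count hash calls:

```python
# The key tuple is hashed exactly once, up front; every later dict
# insert or lookup reuses the cached value instead of re-hashing the
# (possibly expensive, possibly reentrant) user arguments.
class HashedSeq(list):
    __slots__ = ('hashvalue',)

    def __init__(self, tup):
        self[:] = tup
        self.hashvalue = hash(tup)   # computed exactly once

    def __hash__(self):
        return self.hashvalue

hash_calls = []

class Tracked:
    def __hash__(self):
        hash_calls.append(1)
        return 42
    def __eq__(self, other):
        return other is self

key = HashedSeq((Tracked(),))
cache = {}
cache[key] = 'value'          # no re-hash: dict uses the cached value
assert cache[key] == 'value'
assert len(hash_calls) == 1   # the user __hash__ ran only once
```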
msg245618 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-06-22 04:10
> If the C version is to remain in Python3.5, please make sure it provides all
> of the carefully designed features of the pure python version:
> 
>    * The hash function is called no more than once per element

Will be satisfied by issue24483.

I think all other capabilities are satisfied. The GIL is used to ensure
thread-safety.
msg246573 - (view) Author: Stefan Behnel (scoder) * Date: 2015-07-10 17:59
I'm witnessing a crash in the C implementation during garbage collection. Interestingly, it only shows in the Py3.6 branch, not in Py3.5 (both latest). Here's the gdb session:

Program received signal SIGSEGV, Segmentation fault.
lru_cache_tp_traverse (self=0x7ffff2a80ae8, visit=0x43c528 <visit_decref>, arg=0x0) at ./Modules/_functoolsmodule.c:1040
1040            lru_list_elem *next = link->next;
(gdb) list
1035    static int
1036    lru_cache_tp_traverse(lru_cache_object *self, visitproc visit, void *arg)
1037    {
1038        lru_list_elem *link = self->root.next;
1039        while (link != &self->root) {
1040            lru_list_elem *next = link->next;
1041            Py_VISIT(link);
1042            link = next;
1043        }
1044        Py_VISIT(self->maxsize_O);
(gdb) print link
$1 = (lru_list_elem *) 0x0
(gdb) print self
$2 = (lru_cache_object *) 0x7ffff2a80ae8
(gdb) print self->root.next
$3 = (struct lru_list_elem *) 0x0
(gdb) print self->root
$4 = {ob_base = {_ob_next = 0x7ffff2a26458, _ob_prev = 0x90e860 <refchain>, ob_refcnt = 1, ob_type = 0x92c500 <lru_cache_type>}, prev = 0x0, next = 0x0, key = 0x0, result = 0x0}

IIUC, there is only one entry and the code doesn't expect that. An additional "is empty?" NULL check may or may not be the right fix. It might also be the linking itself that's incorrect here. The code expects a cyclic data structure which is apparently not maintained that way.
msg246576 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-10 18:48
self->root.next and self->root.prev should never be NULL. Could you please provide a minimal example of code that produces the crash?
msg246580 - (view) Author: Stefan Behnel (scoder) * Date: 2015-07-10 19:46
It's not actually my own code using the lru_cache here. From a quick grep
over the code tree, the only potentially related usage I found was in
Python's fnmatch module, on the "_compile_pattern()" function. Commenting
that out then made the crash go away, so this was the culprit.

However, I ran test_functools.py of the same installation and it passes, so
not every usage is broken here. Simple manual testing didn't reveal
anything either so far.
msg246582 - (view) Author: Stefan Behnel (scoder) * Date: 2015-07-10 19:49
test_fnmatch.py also passes, BTW.
msg246592 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2015-07-11 03:09
I've also seen a crash in lru_cache_tp_traverse but with the 3.5.0b3 release build for the OS X 64-bit/32-bit installer.  I just stumbled across the segfault by bringing up the interactive interpreter and typing "import ssl".  After a lot of playing around, I reduced the failing case to: 1. have an "import pprint" in a startup file referred to by PYTHONSTARTUP *and* 2. "import ssl" must be the very first command entered in the interactive interpreter. Odd!  Unfortunately, the release build is a non-debug build and, so far, I have not been able to reproduce the segfault with any other build, debug or non-debug.  So, whatever the problem is, it's very build dependent.  Here is the OS X system traceback from the segfault:  

Path:                  /Library/Frameworks/Python.framework/Versions/3.5.0b3_10_6/Resources/Python.app/Contents/MacOS/Python
Identifier:            Python
Version:               ???
Code Type:             X86-64 (Native)
Parent Process:        bash [51285]
Responsible:           iTerm [754]
User ID:               503

Date/Time:             2015-07-10 19:57:16.086 -0700
OS Version:            Mac OS X 10.10.4 (14E46)
Report Version:        11
Anonymous UUID:        CFED52E3-698C-835B-D61C-F4B1F18D2CB6


Time Awake Since Boot: 800000 seconds

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000018

VM Regions Near 0x18:
--> 
    __TEXT                 0000000100000000-0000000100001000 [    4K] r-x/rwx SM=COW  /Library/Frameworks/Python.framework/Versions/3.5.0b3_10_6/Resources/Python.app/Contents/MacOS/Python

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   org.python.python             	0x000000010015e5e5 lru_cache_tp_traverse + 37
1   org.python.python             	0x000000010013a2d7 collect + 439
2   org.python.python             	0x000000010013aee5 _PyObject_GC_Alloc + 357
3   org.python.python             	0x000000010013afe7 _PyObject_GC_New + 23
4   org.python.python             	0x0000000100059bce PyDict_New + 334
5   org.python.python             	0x000000010015f029 lru_cache_new + 249
6   org.python.python             	0x00000001000795a6 type_call + 38
7   org.python.python             	0x000000010000dc93 PyObject_Call + 99
8   org.python.python             	0x00000001000e9fd8 PyEval_EvalFrameEx + 7656
9   org.python.python             	0x00000001000f1d00 _PyEval_EvalCodeWithName + 2400
10  org.python.python             	0x00000001000f035d PyEval_EvalFrameEx + 33133
11  org.python.python             	0x00000001000f1d00 _PyEval_EvalCodeWithName + 2400
12  org.python.python             	0x00000001000f1e07 PyEval_EvalCodeEx + 71
13  org.python.python             	0x00000001000e5ff5 builtin___build_class__ + 485
14  org.python.python             	0x0000000100065549 PyCFunction_Call + 281
15  org.python.python             	0x00000001000f0768 PyEval_EvalFrameEx + 34168
16  org.python.python             	0x00000001000f1d00 _PyEval_EvalCodeWithName + 2400
17  org.python.python             	0x00000001000f1e61 PyEval_EvalCode + 81
18  org.python.python             	0x00000001000e5683 builtin_exec + 627
19  org.python.python             	0x0000000100065519 PyCFunction_Call + 233
20  org.python.python             	0x00000001000f0a9b PyEval_EvalFrameEx + 34987
21  org.python.python             	0x00000001000f1d00 _PyEval_EvalCodeWithName + 2400
22  org.python.python             	0x00000001000f035d PyEval_EvalFrameEx + 33133
23  org.python.python             	0x00000001000f07fd PyEval_EvalFrameEx + 34317
24  org.python.python             	0x00000001000f07fd PyEval_EvalFrameEx + 34317
25  org.python.python             	0x00000001000f07fd PyEval_EvalFrameEx + 34317
26  org.python.python             	0x00000001000f1d00 _PyEval_EvalCodeWithName + 2400
27  org.python.python             	0x00000001000f1e07 PyEval_EvalCodeEx + 71
28  org.python.python             	0x000000010004017a function_call + 186
29  org.python.python             	0x000000010000dc93 PyObject_Call + 99
30  org.python.python             	0x0000000100010ff6 _PyObject_CallMethodIdObjArgs + 454
31  org.python.python             	0x000000010010d6d3 PyImport_ImportModuleLevelObject + 1171
32  org.python.python             	0x00000001000e5e03 builtin___import__ + 131
33  org.python.python             	0x0000000100065549 PyCFunction_Call + 281
34  org.python.python             	0x000000010000dc93 PyObject_Call + 99
35  org.python.python             	0x00000001000e64f7 PyEval_CallObjectWithKeywords + 87
36  org.python.python             	0x00000001000ea43e PyEval_EvalFrameEx + 8782
37  org.python.python             	0x00000001000f1d00 _PyEval_EvalCodeWithName + 2400
38  org.python.python             	0x00000001000f1e61 PyEval_EvalCode + 81
39  org.python.python             	0x00000001000e5683 builtin_exec + 627
40  org.python.python             	0x0000000100065519 PyCFunction_Call + 233
41  org.python.python             	0x00000001000f0a9b PyEval_EvalFrameEx + 34987
42  org.python.python             	0x00000001000f1d00 _PyEval_EvalCodeWithName + 2400
43  org.python.python             	0x00000001000f035d PyEval_EvalFrameEx + 33133
44  org.python.python             	0x00000001000f07fd PyEval_EvalFrameEx + 34317
45  org.python.python             	0x00000001000f07fd PyEval_EvalFrameEx + 34317
46  org.python.python             	0x00000001000f07fd PyEval_EvalFrameEx + 34317
47  org.python.python             	0x00000001000f1d00 _PyEval_EvalCodeWithName + 2400
48  org.python.python             	0x00000001000f1e07 PyEval_EvalCodeEx + 71
49  org.python.python             	0x000000010004017a function_call + 186
50  org.python.python             	0x000000010000dc93 PyObject_Call + 99
51  org.python.python             	0x0000000100010ff6 _PyObject_CallMethodIdObjArgs + 454
52  org.python.python             	0x000000010010d6d3 PyImport_ImportModuleLevelObject + 1171
53  org.python.python             	0x00000001000e5e03 builtin___import__ + 131
54  org.python.python             	0x0000000100065549 PyCFunction_Call + 281
55  org.python.python             	0x000000010000dc93 PyObject_Call + 99
56  org.python.python             	0x00000001000e64f7 PyEval_CallObjectWithKeywords + 87
57  org.python.python             	0x00000001000ea43e PyEval_EvalFrameEx + 8782
58  org.python.python             	0x00000001000f1d00 _PyEval_EvalCodeWithName + 2400
59  org.python.python             	0x00000001000f1e61 PyEval_EvalCode + 81
60  org.python.python             	0x000000010011fa7a PyRun_InteractiveOneObject + 474
61  org.python.python             	0x000000010011fdfe PyRun_InteractiveLoopFlags + 110
62  org.python.python             	0x000000010012076c PyRun_AnyFileExFlags + 76
63  org.python.python             	0x00000001001390a9 Py_Main + 3785
64  org.python.python             	0x0000000100000e32 0x100000000 + 3634
65  org.python.python             	0x0000000100000c84 0x100000000 + 3204

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x000000010023d6c0  rbx: 0x00000001006fdb70  rcx: 0x0000000100382048  rdx: 0x0000000000000000
  rdi: 0x0000000000000000  rsi: 0x00000001001392e0  rbp: 0x00007fff5bffb530  rsp: 0x00007fff5bffb510
   r8: 0x0000000000000000   r9: 0x0000000000000001  r10: 0x0000000000000016  r11: 0x00000001001392e0
  r12: 0x00000001006fdb88  r13: 0x0000000000000000  r14: 0x00000001001392e0  r15: 0x000000010102fdb8
  rip: 0x000000010015e5e5  rfl: 0x0000000000010206  cr2: 0x0000000000000018
msg246718 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-14 06:23
Stefan, Ned, can you reproduce the same crash with the following patch?
msg246720 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2015-07-14 07:25
Serhiy, I'll try making a 3.5.0b3+patch installer build in the same manner as the 3.5.0b3 installer build.  But I may not get to it for several days.
msg247309 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2015-07-25 02:07
Sorry about the delay in testing the patch.  I just confirmed (1) that I am still able to produce a segfault on OS X as described above under the specific conditions with a 10.6-like installer built with the current 3.5 tip and (2) that, with clru_cache_new.patch applied to the same current 3.5 tip, I no longer see the segfault.  So it looks like a fix.
msg247320 - (view) Author: Roundup Robot (python-dev) Date: 2015-07-25 09:11
New changeset 5345e5ce2eed by Serhiy Storchaka in branch '3.5':
Issue #14373: Fixed segmentation fault when gc.collect() is called during
https://hg.python.org/cpython/rev/5345e5ce2eed

New changeset 9c6d11d22801 by Serhiy Storchaka in branch 'default':
Issue #14373: Fixed segmentation fault when gc.collect() is called during
https://hg.python.org/cpython/rev/9c6d11d22801
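For context on the fix referenced in these changesets: the segfault occurred when a garbage collection ran inside lru_cache_new() before the cache object was fully constructed (frames 1-5 of the crash backtrace above). The sketch below forces frequent collections while repeatedly creating caches; it is only an illustrative reproduction in the spirit of the attached clru_cache_crasher.py (whose exact contents are not reproduced here), and it runs cleanly on a patched interpreter.

```python
import functools
import gc

# Force a GC pass on almost every allocation so that a collection is
# likely to run inside lru_cache_new() while the cache object is still
# mid-construction.  On unpatched builds this is the kind of pressure
# that triggered the reported segfault.
old_thresholds = gc.get_threshold()
gc.set_threshold(1)
try:
    for _ in range(1000):
        # Each iteration creates (and soon discards) a fresh C-accelerated
        # lru_cache wrapper.
        @functools.lru_cache(maxsize=16)
        def cached(x):
            return x * 2
        assert cached(21) == 42
finally:
    # Restore the default collection thresholds.
    gc.set_threshold(*old_thresholds)

print("no crash")
```

On a fixed interpreter the loop completes and prints "no crash"; the gc thresholds are restored in the finally block regardless of outcome.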
msg247321 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-07-25 09:13
Thank you Ned.

Is anything left to do with this issue, or can it be closed?
msg253235 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2015-10-20 16:41
I suspect this change is implicated in issue25447.
History
Date User Action Args
2015-11-01 08:32:11 serhiy.storchaka set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2015-10-21 07:26:05 anacrolix set nosy: - anacrolix
2015-10-20 16:41:34 jason.coombs set status: pending -> open
nosy: + jason.coombs
messages: + msg253235

2015-10-06 04:20:50 rhettinger set status: open -> pending
assignee: serhiy.storchaka
2015-10-06 03:21:13 rhettinger set status: pending -> open
assignee: rhettinger -> (no value)
2015-07-25 09:13:21 serhiy.storchaka set priority: release blocker -> normal
status: open -> pending
messages: + msg247321
2015-07-25 09:11:39 python-dev set messages: + msg247320
2015-07-25 02:07:22 ned.deily set messages: + msg247309
2015-07-21 19:46:01 serhiy.storchaka set priority: normal -> release blocker
2015-07-14 07:25:52 ned.deily set messages: + msg246720
2015-07-14 06:23:54 serhiy.storchaka set files: + clru_cache_new.patch

messages: + msg246718
2015-07-11 03:09:37 ned.deily set nosy: + ned.deily
messages: + msg246592
2015-07-10 19:49:58 Tim.Graham set nosy: - Tim.Graham
2015-07-10 19:49:10 scoder set messages: + msg246582
2015-07-10 19:46:03 scoder set messages: + msg246580
2015-07-10 18:48:02 serhiy.storchaka set messages: + msg246576
2015-07-10 17:59:52 scoder set messages: + msg246573
2015-06-22 04:10:51 serhiy.storchaka set messages: + msg245618
2015-06-21 17:37:09 rhettinger set messages: + msg245606
2015-06-08 09:45:16 python-dev set messages: + msg244994
2015-06-08 08:21:11 python-dev set messages: + msg244992
2015-06-07 18:56:06 serhiy.storchaka set files: + test_lru_cache_threaded.patch

messages: + msg244966
2015-06-07 10:43:37 Tim.Graham set messages: + msg244950
2015-06-07 06:44:39 serhiy.storchaka set files: + clru_cache_descr_get.patch

messages: + msg244944
2015-06-06 20:30:52 Tim.Graham set nosy: + Tim.Graham
messages: + msg244928
2015-05-24 20:33:07 serhiy.storchaka set messages: + msg244002
2015-05-24 13:26:31 serhiy.storchaka set dependencies: + Correct reuse argument tuple in property descriptor
2015-05-24 13:18:47 serhiy.storchaka set files: + clru_cache_4.patch

messages: + msg243982
2015-05-24 07:13:31 serhiy.storchaka set files: + clru_cache_3.patch, clru_cache_crasher.py

messages: + msg243970
2015-05-24 04:21:35 serhiy.storchaka set messages: + msg243967
2015-05-24 02:41:23 rhettinger set assignee: serhiy.storchaka -> rhettinger
messages: + msg243962
2015-05-23 19:48:36 serhiy.storchaka set messages: + msg243940
2015-05-23 19:43:16 python-dev set nosy: + python-dev
messages: + msg243939
2015-05-23 17:15:17 rhettinger set assignee: rhettinger -> serhiy.storchaka
messages: + msg243932
2015-04-10 18:10:54 kmike set nosy: + kmike
2014-10-14 15:44:57 skrah set nosy: - skrah
2014-08-07 01:51:22 pitrou set messages: + msg224981
2014-08-07 01:19:01 rhettinger set messages: + msg224980
2014-08-06 15:34:01 pitrou set messages: + msg224941
2014-08-06 15:30:20 pitrou set priority: low -> normal
2014-08-06 15:22:45 serhiy.storchaka set messages: + msg224938
2014-07-18 21:42:01 BreamoreBoy set nosy: + BreamoreBoy
messages: + msg223439
2014-06-26 19:12:45 Aaron.Meurer set nosy: + Aaron.Meurer
2014-04-07 16:14:56 poke set nosy: + poke
2014-03-20 11:57:03 rhettinger set priority: high -> low

messages: + msg214209
2014-03-20 11:51:15 serhiy.storchaka set messages: + msg214208
2014-03-06 22:38:45 josh.r set nosy: + josh.r
2013-11-22 21:48:52 serhiy.storchaka set files: + clru_cache2.patch

messages: + msg203894
2013-11-22 19:42:59 pitrou set messages: + msg203846
2013-11-22 17:49:54 serhiy.storchaka set priority: low -> high
files: + clru_cache.patch
messages: + msg203821
2013-11-21 16:43:43 rhettinger set priority: normal -> low

messages: + msg203654
versions: + Python 3.5, - Python 3.4
2013-11-21 09:45:30 serhiy.storchaka set messages: + msg203599
2013-11-21 09:34:58 brechtm set nosy: + brechtm
messages: + msg203597
2013-02-16 00:39:48 jcea set nosy: + jcea
2012-12-30 20:04:12 kachayev set files: + 14373.v9.diff

messages: + msg178611
2012-12-30 17:16:04 serhiy.storchaka set messages: + msg178589
2012-12-30 13:44:02 kachayev set files: + 14373.v7.diff

messages: + msg178573
2012-12-23 16:20:54 scoder set messages: + msg177993
2012-12-23 12:40:00 skrah set messages: + msg177980
2012-12-23 12:10:56 skrah set nosy: + skrah
messages: + msg177976
2012-12-23 10:08:24 scoder set nosy: + scoder
messages: + msg177967
2012-12-22 21:01:00 serhiy.storchaka set messages: + msg177955
stage: commit review -> patch review
2012-12-22 20:21:15 serhiy.storchaka set files: + lru_cache_bench.py

messages: + msg177953
stage: needs patch -> commit review
2012-12-21 22:13:52 kachayev set files: + 14373.v4.diff

messages: + msg177903
2012-12-21 19:38:44 kachayev set files: + 14373.v3.diff

messages: + msg177889
2012-12-21 08:05:05 kachayev set messages: + msg177867
2012-12-20 11:57:36 nedbat set nosy: - nedbat
2012-12-20 10:04:57 serhiy.storchaka set messages: + msg177808
stage: patch review -> needs patch
2012-12-20 09:01:41 serhiy.storchaka set stage: needs patch -> patch review
2012-12-19 22:10:44 asvetlov set nosy: + amaury.forgeotdarc, pitrou
2012-12-19 21:54:18 kachayev set files: + 14373.fixed.diff
nosy: + kachayev
messages: + msg177784

2012-11-24 14:19:26 serhiy.storchaka set messages: + msg176287
components: + Extension Modules, - Interpreter Core
stage: patch review -> needs patch
2012-11-14 10:45:09 asvetlov set nosy: + asvetlov
2012-11-14 08:48:35 serhiy.storchaka set nosy: + serhiy.storchaka
2012-11-14 07:20:34 ezio.melotti set messages: + msg175545
2012-11-14 07:06:11 ezio.melotti set messages: + msg175540
2012-11-14 04:16:12 anacrolix set messages: + msg175533
2012-11-13 05:13:09 ezio.melotti set nosy: + ezio.melotti
stage: patch review

versions: + Python 3.4, - Python 3.3
2012-11-13 03:47:59 eric.snow set nosy: + eric.snow
2012-04-02 07:20:41 anacrolix set files: - functools.lru_cache-in-c.patch
2012-04-02 07:20:31 anacrolix set files: - functools.lru_cache-in-c.patch
2012-04-02 07:20:15 anacrolix set files: + functools.lru_cache-in-c.patch

messages: + msg157343
2012-03-31 20:10:34 rhettinger set files: - lru_cache_class.py
2012-03-31 20:10:00 rhettinger set files: + lru_cache_class.py

messages: + msg157226
2012-03-31 19:50:28 rhettinger set files: + lru_cache_class.py

messages: + msg157224
2012-03-26 10:54:47 anacrolix set files: - functools.lru_cache-in-c
2012-03-26 10:53:33 anacrolix set files: + functools.lru_cache-in-c.patch

messages: + msg156808
2012-03-26 08:29:55 rhettinger set messages: + msg156803
2012-03-25 18:57:06 anacrolix set nosy: + nedbat
2012-03-21 14:52:09 meador.inge set nosy: + meador.inge
2012-03-21 12:10:43 anacrolix set files: + functools.lru_cache-in-c.patch
keywords: + patch
messages: + msg156492
2012-03-20 19:14:14 rhettinger set assignee: rhettinger
messages: + msg156449
2012-03-20 18:30:15 giampaolo.rodola set nosy: + giampaolo.rodola
2012-03-20 14:14:47 anacrolix set files: + functools.lru_cache-in-c
2012-03-20 13:40:44 anacrolix create