Message 334029 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	rhettinger
Recipients	rhettinger, serhiy.storchaka
Date	2019-01-19.04:19:38
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1547871578.67.0.302934084247.issue35780@roundup.psfhosted.org>
In-reply-to

Content
After the check for popresult==Py_None, there is the comment that was mostly copied from the Python version but doesn't match the actual code: /* Getting here means that this same key was added to the cache while the lock was released. Since the link update is already done, we need only return the computed result and update the count of misses. / The cache.pop uses the old key (the one being evicted), so at this point in the code we have an extracted link containing the old key but the pop failed to find the dictionary reference to that link. It tells us nothing about whether the current new key has already been added to the cache or whether another thread added a different key. This code path doesn't add the new key. Also, it leaves the self->full variable set to True even though we're now at least one link short of maxsize. The next test is for popresult == NULL. If I'm understanding it correctly, it means that an error occurred during lookup (possible during the equality check). If so, then why is the link being moved to the front of the lru_cache -- it should have remained at the oldest position. The solution to this is only extract the link after a successful pop rather than before. The final case runs code when the pop succeeded in finding the oldest link. The popresult is decreffed but not checked to make sure that it actually is the oldest link. Afterwards, _PyDict_SetItem_KnownHash() is called with the new key. Unlike the pure python code, it does not check to see if the new key has already been added by another thread. This can result in an orphaned link (a link not referred to by the cache dict). I think that is why popresult code can ever get to a state where it can return Py_None (it means that the cache structure is in an inconsistent state). I think the fix is to make the code more closely follow the pure python code. Verify that the new key hasn't been added by another thread during the user function call. Don't delete the old link until it has been successfully popped. A Py_None return from the pop should be regarded as sign the structure is in an inconsistent state. The self->full variable needed to be reset if there are any code paths that deletes links but don't add them back. Better yet, the extraction of a link should be immediately followed by repopulating it will new values and moving it to the front of the cache. That way, the cache structure will always remain in a consistent state and the number of links will be constant from start to finish. The current code likely doesn't fail in any spectacular way. Instead, it will occasionally have unreferenced orphan links, will occasionally be marked as full when it is short one or more links (and never regaining the lost links), will occasionally not put the result of the newest function call into the cache, and will occasionally mark the oldest link as being the newest even though there wasn't a user function call to the corresponding old key. Minor nit: The decrefs should be done at the bottom of each code path instead of the top. This makes it a lot easier to verify that we aren't making arbitrary reentrant callbacks until the cache data structures have been put into a consistent state. Minor nit: The test "self->root.next != &self->root" may no longer be necessary if the above issues are fixed. We can only get to this wrapper when maxsize > 0, so self->full being true implies that there is at least one link in the chain, so self->root.next cannot point back to itself. Possibly the need for this test exists only because the cache is getting into an inconsistent state where it is marked as full but there aren't any extant links. Minor nit: "lru_cache_extricate_link" should be named "lru_cache_extract_link". The word "extricate" applies only when solving an error case; whereas, "extract" applies equally well to normal cases and cases. The latter word more closely means "remove an object from a data structure" which is what was likely intended. Another minor nit: The code in lru_cache_append_link() is written in way where the compiler has to handle an impossible case where "link->prev->next = link->next" changes the value of "link->next". The suspicion of aliased pointers causes the compiler to generate an unnecessary and redundant memory fetch. The solution again is to more closely follow the pure python code: diff --git a/Modules/_functoolsmodule.c b/Modules/_functoolsmodule.c index 0fb4847af9..8cbd79ceaf 100644 --- a/Modules/_functoolsmodule.c +++ b/Modules/_functoolsmodule.c @@ -837,8 +837,10 @@ infinite_lru_cache_wrapper(lru_cache_object self, PyObject args, PyObject kwd static void lru_cache_extricate_link(lru_list_elem link) { - link->prev->next = link->next; - link->next->prev = link->prev; + lru_list_elem link_prev = link->prev; + lru_list_elem *link_next = link->next; + link_prev->next = link->next; + link_next->prev = link->prev; } Clang assembly before: movq 16(%rax), %rcx # link->prev movq 24(%rax), %rdx # link->next movq %rdx, 24(%rcx) # link->prev->next = link->next; movq 24(%rax), %rdx # duplicate fetch of link->next movq %rcx, 16(%rdx) # link->next->prev = link->prev; Clang assembly after: movq 16(%rax), %rcx movq 24(%rax), %rdx movq %rdx, 24(%rcx) movq %rcx, 16(%rdx) Open question: Is the any part of the code that relies on the cache key being a tuple? If not, would it be reasonable to emulate the pure python code and return a scalar instead of a tuple when the tuple length is one and there are no keyword arguments or typing requirements? In other words, does f(1) need to have a key of (1,) instead of just 1? It would be nice to save a little space (for the enclosing tuple) and get a little speed (hash the object directly instead of hashing a tuple with just one object).

After the check for popresult==Py_None, there is the comment that was mostly copied from the Python version but doesn't match the actual code:

 /* Getting here means that this same key was added to the
    cache while the lock was released.  Since the link
    update is already done, we need only return the
    computed result and update the count of misses. */

The cache.pop uses the old key (the one being evicted), so at this point in the code we have an extracted link containing the old key but the pop failed to find the dictionary reference to that link.  It tells us nothing about whether the current new key has already been added to the cache or whether another thread added a different key. This code path doesn't add the new key.  Also, it leaves the self->full variable set to True even though we're now at least one link short of maxsize.

The next test is for popresult == NULL. If I'm understanding it correctly, it means that an error occurred during lookup (possible during the equality check).  If so, then why is the link being moved to the front of the lru_cache -- it should have remained at the oldest position.  The solution to this is only extract the link after a successful pop rather than before.

The final case runs code when the pop succeeded in finding the oldest link.  The popresult is decreffed but not checked to make sure that it actually is the oldest link.  Afterwards, _PyDict_SetItem_KnownHash() is called with the new key.  Unlike the pure python code, it does not check to see if the new key has already been added by another thread.  This can result in an orphaned link (a link not referred to by the cache dict).  I think that is why popresult code can ever get to a state where it can return Py_None (it means that the cache structure is in an inconsistent state).

I think the fix is to make the code more closely follow the pure python code.  Verify that the new key hasn't been added by another thread during the user function call.  Don't delete the old link until it has been successfully popped.  A Py_None return from the pop should be regarded as sign the structure is in an inconsistent state.  The self->full variable needed to be reset if there are any code paths that deletes links but don't add them back.  Better yet, the extraction of a link should be immediately followed by repopulating it will new values and moving it to the front of the cache.  That way, the cache structure will always remain in a consistent state and the number of links will be constant from start to finish.

The current code likely doesn't fail in any spectacular way.  Instead, it will occasionally have unreferenced orphan links, will occasionally be marked as full when it is short one or more links (and never regaining the lost links), will occasionally not put the result of the newest function call into the cache, and will occasionally mark the oldest link as being the newest even though there wasn't a user function call to the corresponding old key.

Minor nit:  The decrefs should be done at the bottom of each code path instead of the top.  This makes it a lot easier to verify that we aren't making arbitrary reentrant callbacks until the cache data structures have been put into a consistent state.

Minor nit: The test "self->root.next != &self->root" may no longer be necessary if the above issues are fixed.  We can only get to this wrapper when maxsize > 0, so self->full being true implies that there is at least one link in the chain, so self->root.next cannot point back to itself.  Possibly the need for this test exists only because the cache is getting into an inconsistent state where it is marked as full but there aren't any extant links.

Minor nit:  "lru_cache_extricate_link" should be named "lru_cache_extract_link".  The word "extricate" applies only when solving an error case; whereas, "extract" applies equally well to normal cases and cases.  The latter word more closely means "remove an object from a data structure" which is what was likely intended.

Another minor nit:  The code in lru_cache_append_link() is written in way where the compiler has to handle an impossible case where "link->prev->next = link->next" changes the value of "link->next".  The suspicion of aliased pointers causes the compiler to generate an unnecessary and redundant memory fetch.   The solution again is to more closely follow the pure python code:

diff --git a/Modules/_functoolsmodule.c b/Modules/_functoolsmodule.c
index 0fb4847af9..8cbd79ceaf 100644
--- a/Modules/_functoolsmodule.c
+++ b/Modules/_functoolsmodule.c
@@ -837,8 +837,10 @@ infinite_lru_cache_wrapper(lru_cache_object *self, PyObject *args, PyObject *kwd
 static void
 lru_cache_extricate_link(lru_list_elem *link)
 {
-    link->prev->next = link->next;
-    link->next->prev = link->prev;
+    lru_list_elem *link_prev = link->prev;
+    lru_list_elem *link_next = link->next;
+    link_prev->next = link->next;
+    link_next->prev = link->prev;
 }

    Clang assembly before:

        movq    16(%rax), %rcx      # link->prev
        movq    24(%rax), %rdx      # link->next
        movq    %rdx, 24(%rcx)      # link->prev->next = link->next;
        movq    24(%rax), %rdx      # duplicate fetch of link->next
        movq    %rcx, 16(%rdx)      # link->next->prev = link->prev;

    Clang assembly after:

        movq    16(%rax), %rcx
        movq    24(%rax), %rdx
        movq    %rdx, 24(%rcx)
        movq    %rcx, 16(%rdx)

Open question:  Is the any part of the code that relies on the cache key being a tuple?  If not, would it be reasonable to emulate the pure python code and return a scalar instead of a tuple when the tuple length is one and there are no keyword arguments or typing requirements?  In other words, does f(1) need to have a key of (1,) instead of just 1?  It would be nice to save a little space (for the enclosing tuple) and get a little speed (hash the object directly instead of hashing a tuple with just one object).

History
Date	User	Action	Args
2019-01-19 04:19:42	rhettinger	set	recipients: + rhettinger, serhiy.storchaka
2019-01-19 04:19:38	rhettinger	set	messageid: <1547871578.67.0.302934084247.issue35780@roundup.psfhosted.org>
2019-01-19 04:19:38	rhettinger	link	issue35780 messages
2019-01-19 04:19:38	rhettinger	create