classification
Title: Expose cache validity checking support in ABCMeta
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: lukasz.langa Nosy List: daniel.urban, eric.araujo, lukasz.langa, ncoghlan, pitrou, pje, python-dev, r.david.murray
Priority: normal Keywords: patch

Created on 2013-01-01 03:38 by ncoghlan, last changed 2013-12-24 21:15 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description Edit
issue16832.diff lukasz.langa, 2013-05-25 16:27 review
Messages (22)
msg178725 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-01-01 03:38
ABCMeta uses an internal counter to invalidate the negative caches when a register() call changes the virtual inheritance graph. (A global count of register() calls is stored on ABCMeta, which ABCMeta.__instancecheck__ and ABCMeta.__subclasscheck__ then compare to a copy cached on the individual ABC)

To properly handle register() method calls on ABCs, generic function implementations also need a way to invalidate their own caches when that graph changes.

It seems like the simplest way to handle this would be to expose a read-only "ABCMeta.cache_token" property. Generic function implementations could then work the same way as ABCMeta itself: store explicit implementation registrations and a cache of derived implementations separately, and if a saved copy of the cache token doesn't match the current value of ABCMeta.cache_token, clear the derived cache.
msg189964 - (view) Author: Łukasz Langa (lukasz.langa) (Python committer) Date: 2013-05-25 15:08
This will be useful for PEP 443.

From Nick's comment on python-dev (http://mail.python.org/pipermail/python-dev/2013-May/126535.html):

"Given the global nature of the cache invalidation, it may be better as
a module level abc.get_cache_token() function."
msg189966 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-25 15:27
Rather than exposing the "cache token" (which looks like an implementation detail), you may allow third-party code to register a handler which will be called when an ABC's registrations are modified:

def abc_handler(abc):
    """
    Called when the concrete class registrations for ABC `abc`
    are updated.
    """

or even:

def abc_handler(abc, added, removed):
    """
    Called when the concrete class registrations for ABC `abc`
    are updated. `added` is an iterable of concrete classes which
    have been registered on the ABC, `removed` is an iterable of
    concrete classes which have been unregistered.
    """
msg189971 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-25 15:48
I thought about that originally, but there's only ever one object graph for the process, and as soon as you break any one edge in that graph you pretty much invalidate anything based on caching traversal results. (More accurately: it's almost always going to be cheaper to blow away and rebuild your cache than it is to determine whether or not your cache was actually affected by the graph change, particularly given the fact that object graph changes will almost always happen while the caches are still cold during application startup).

So it's not really an implementation detail, it's a promise to provide a unique token for the current state of the object graph that can be used reliably for cache invalidation.

Thus, I favour exposing a cache token as the simplest thing that could possibly work - building a mechanism for "tell me when the object graph changes" is a complex solution when the only question we need to answer is "is my cache still valid?".

So the recommended idiom with this design would be:

    new_token = abc.get_cache_token()
    if new_token != cached_token:
        cache.clear()
        cached_token = new_token
    else:
        # Do something with the cache
    # Handle a cache miss

A callback based system is actually *harder* to use, because you can't simply ask the question "Is my cache still valid?" - you have to register a callback that sets a flag, or something similar. Your lookup code ends up being:

    if cache_invalidated:
        cache.clear()
        cache_invalidated = False
    else:
        # Do something with the cache
    # Handle a cache miss

However, now you have a potential problem, because your state update (setting the flag) is decoupled from checking the flag. That makes it harder to ensure correct sequencing than is the case with a simple inline check of a cache validity token.

Choosing a solution that is harder to implement and harder to use is definitely not a good idea :)
msg189973 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-25 15:49
And when I say "originally" I mean "after I saw how ABCs already implemented this capability" :)
msg189974 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-25 15:57
> I thought about that originally, but there's only ever one object
> graph for the process, and as soon as you break any one edge in that
> graph you pretty much invalidate anything based on caching traversal
> results.

Ah, I was forgetting that a ABC registration also affects the subclasses
of the registered class, and the abstract parents of the ABC class.
So it's a bit more annoying than it looked like.

> So it's not really an implementation detail, it's a promise to provide
> a unique token for the current state of the object graph that can be
> used reliably for cache invalidation.

Sounds fair.

> A callback based system is actually *harder* to use, because you can't
> simply ask the question "Is my cache still valid?" - you have to
> register a callback that sets a flag, or something similar. Your
> lookup code ends up being:

Well, the point of a callback based system is for the callback to
invalidate or recompute the cache *right now*, so you don't have to
"check a flag" each time you do a lookup.

It's a basic cost/benefit compromise: the callback system makes
invalidations more expensive, but lookups cheapers.

Besides, a callback-based system can be more future-proof than exposing
the raw cache tags or tokens.
msg189976 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-25 16:20
Ah, but that's the other trick: we *don't know* if we need to recalculate our whole cache when the object graph changes.

1. Some of the cached entries may never be accessed again, so recalculating them will be a waste
2. The object graph may change again before they're next accessed, so recalculating any entries at all will be waste

The cache invalidation design in the ABCs is a smart compromise, since it leaves recreating the cache entries to the last possible moment: the next time a type is looked up after the object graph has changed. It doesn't matter if there is one graph change or a thousand in that time, we only do the fresh lookup once. And if the type is never looked up again, we don't even need to do that much.

In a typical application, the expected usage model for both ABCs and generic functions is the same:

1. During startup, the object graph and dispatch resolution results are changing rapidly as concrete classes and implementations get registered with ABCs and generic functions. Actual instance checks and generic function dispatch operations are rare, so the caches mostly remain cold.
2. Once the application is up and running, the object graph stabilises, and the caches of significance to the application start to become populated through actual use. Absent live code reloading, the caches then remain valid for the lifetime of the application.

The ABC design takes the view that the extra runtime check for cache validity in the absence of direct inheritance or registration is worth it to avoid repeated cache clearing during application startup for ABCs that may never even be used by the application.

I believe exactly the same reasoning holds true for generic functions: triggering callbacks when the object graph changes means we may end up doing a lot of work clearing caches that the application isn't even using. Delaying the check to call time means that only the caches that matter will ever be touched.
msg189977 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-25 16:24
> 1. Some of the cached entries may never be accessed again, so
> recalculating them will be a waste
> 2. The object graph may change again before they're next accessed, so
> recalculating any entries at all will be waste

Yup, hence the "cost/benefit compromise" ;-)

My assumption here was that ABC registrations change quite infrequently
compared to the invocations of generic functions, so transferring most
of the cost to registration-time (rather than invocation-time) would be
a benefit.
msg189978 - (view) Author: Łukasz Langa (lukasz.langa) (Python committer) Date: 2013-05-25 16:28
Review the patch, please. Committing in 10... 9... 8...
msg189980 - (view) Author: Roundup Robot (python-dev) Date: 2013-05-25 16:42
New changeset 5b17d5ca2ff1 by Łukasz Langa in branch 'default':
Fix #16832 - expose cache validity checking support in ABCMeta
http://hg.python.org/cpython/rev/5b17d5ca2ff1
msg189981 - (view) Author: Roundup Robot (python-dev) Date: 2013-05-25 16:48
New changeset d9828c438889 by Łukasz Langa in branch 'default':
Mention issue #16832 in Misc/NEWS
http://hg.python.org/cpython/rev/d9828c438889
msg189982 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2013-05-25 16:54
Please expose this as an attribute of the class or module, not as a function.  A function is orders of magnitude slower than attribute access, and the entire point of exposing this is to allow caches to be invalidated.

In order to be useful for cache invalidation, the token has to be read on *every* access to a cache, thus adding unnecessary overhead to every cache access.
msg189983 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-25 16:56
> Please expose this as an attribute of the class or module, not as a
> function.  A function is orders of magnitude slower than attribute
> access, and the entire point of exposing this is to allow caches to be
> invalidated.

-1. Exposing a function allows to modify the underlying implementation
without breaking any API.
msg189985 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-25 17:13
Trading correctness for speed is almost never a good idea. If people are worried about speed to that level, they can either bypass the public API and access the private attribute directly (after profiling their application to ensure the cache validity checks are the bottleneck), or else migrate to PyPy to eliminate the function call overhead (amongst many other speed improvements).
msg189990 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2013-05-25 19:02
Antoine Pitrou added the comment:
> -1. Exposing a function allows to modify the underlying implementation
> without breaking any API.

This doesn't make any sense.  Once you've exposed an API that gives
out a value for this, you can't change the implementation in a way
that doesn't involve handing out a value... in which case you can just
as easily set it as an attribute.

So there is actually zero improvement in encapsulation: it's purely a
ritualistic wrapping, like Java programmers insisting on having
getFoo() methods in Python when an attribute would suffice.  If there
must be a way to change it later to be dynamic, it can always be
exposed as an attribute of an object that could grow a property later,
e.g.

    from abc import object_graph

    if object_graph.version != old_version:
        ...

Nick Coghlan added the comment:
> Trading correctness for speed is almost never a good idea.

How is "correctness" relevant to the question of whether a readable
value is exposed as an attribute or a method?
msg189993 - (view) Author: PJ Eby (pje) * (Python committer) Date: 2013-05-25 19:16
All that being said, I took out some time to get actual numbers, and
found my original guesstimate of overhead was incorrect; it's only 3
times slower, not "orders of magnitude".  ;-)

And even on a relatively old machine, that 3 times slower amounts to
an extra tenth of a microsecond, at least in a hot loop with timeit.
So I'll stop being a pain about this now, even though I still think
getters are a stinky Java code smell and don't belong in Python
anyways.  ;-)
msg189994 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-05-25 19:18
> This doesn't make any sense.  Once you've exposed an API that gives
> out a value for this, you can't change the implementation in a way
> that doesn't involve handing out a value... in which case you can just
> as easily set it as an attribute.

Because it may be computed whenever the function call is done, rather
than stored statically somewhere. The caching scheme will not
necessarily remain forever identical.

> If there
> must be a way to change it later to be dynamic, it can always be
> exposed as an attribute of an object that could grow a property later,
> e.g.

That's a possibility indeed.
msg190043 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-26 00:52
The reason I switched from suggesting an attribute/property to a module level function is because we don't really support properties for process global state. That's why sys has so many getters in it - to make them properties, you would have to add a class object to act as a holder for the property.

In this case, we appear to have a singleton that could hold the property (ABCMeta), which is why my original suggestion was to use an ABCMeta attribute. However, what I realised yesterday is that because ABCMeta is a type subclass, it means the only way to make the attribute a property in the future would be to add a metametaclass to hold the property definition.

Once I realised that... "if the implementation is hard to explain, it's a bad idea" tipped me over the edge into preferring a simple module level getter function that supported exactly the use cases we care about :)
msg190050 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2013-05-26 02:56
About future-proofing: do we want to document that the token is an opaque integer (as is done now) or just an opaque object that supports equality comparisons?
msg190062 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-05-26 04:27
The latter is probably better (it should just need a slight tweak to
the wording in the docs and docstring)
msg206912 - (view) Author: Roundup Robot (python-dev) Date: 2013-12-24 21:14
New changeset a9f73b44ea0e by R David Murray in branch 'default':
#16832: s/integer/object/ in docs/docstring, and add whatsnew entry.
http://hg.python.org/cpython/rev/a9f73b44ea0e
msg206913 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-12-24 21:15
I don't see that there's anything other than the doc change that needs done here, so I'm closing this.
History
Date User Action Args
2013-12-24 21:15:47r.david.murraysetstatus: open -> closed

type: enhancement

nosy: + r.david.murray
messages: + msg206913
resolution: fixed
stage: resolved
2013-12-24 21:14:08python-devsetmessages: + msg206912
2013-05-26 04:27:54ncoghlansetmessages: + msg190062
2013-05-26 02:56:28eric.araujosetnosy: + eric.araujo
messages: + msg190050
2013-05-26 00:52:26ncoghlansetmessages: + msg190043
2013-05-25 19:18:12pitrousetmessages: + msg189994
2013-05-25 19:16:27pjesetmessages: + msg189993
2013-05-25 19:02:55pjesetmessages: + msg189990
2013-05-25 17:13:29ncoghlansetmessages: + msg189985
2013-05-25 16:56:13pitrousetmessages: + msg189983
2013-05-25 16:54:28pjesetnosy: + pje
messages: + msg189982
2013-05-25 16:48:36python-devsetmessages: + msg189981
2013-05-25 16:42:14python-devsetnosy: + python-dev
messages: + msg189980
2013-05-25 16:28:18lukasz.langasetmessages: + msg189978
2013-05-25 16:27:15lukasz.langasetkeywords: + patch
files: + issue16832.diff
2013-05-25 16:24:03pitrousetmessages: + msg189977
2013-05-25 16:20:41ncoghlansetmessages: + msg189976
2013-05-25 15:57:48pitrousetmessages: + msg189974
2013-05-25 15:49:35ncoghlansetmessages: + msg189973
2013-05-25 15:48:32ncoghlansetmessages: + msg189971
2013-05-25 15:27:04pitrousetnosy: + pitrou
messages: + msg189966
2013-05-25 15:08:14lukasz.langasetassignee: lukasz.langa

messages: + msg189964
nosy: + lukasz.langa
2013-01-01 09:02:40daniel.urbansetnosy: + daniel.urban
2013-01-01 03:38:24ncoghlancreate