Issue16832
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2013-01-01 03:38 by ncoghlan, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
issue16832.diff | lukasz.langa, 2013-05-25 16:27 | review |
Messages (22) | |||
---|---|---|---|
msg178725 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2013-01-01 03:38 | |
ABCMeta uses an internal counter to invalidate the negative caches when a register() call changes the virtual inheritance graph. (A global count of register() calls is stored on ABCMeta, which ABCMeta.__instancecheck__ and ABCMeta.__subclasscheck__ then compare to a copy cached on the individual ABC) To properly handle register() method calls on ABCs, generic function implementations also need a way to invalidate their own caches when that graph changes. It seems like the simplest way to handle this would be to expose a read-only "ABCMeta.cache_token" property. Generic function implementations could then work the same way as ABCMeta itself: store explicit implementation registrations and a cache of derived implementations separately, and if a saved copy of the cache token doesn't match the current value of ABCMeta.cache_token, clear the derived cache. |
|||
msg189964 - (view) | Author: Łukasz Langa (lukasz.langa) * | Date: 2013-05-25 15:08 | |
This will be useful for PEP 443. From Nick's comment on python-dev (http://mail.python.org/pipermail/python-dev/2013-May/126535.html): "Given the global nature of the cache invalidation, it may be better as a module level abc.get_cache_token() function." |
|||
msg189966 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-05-25 15:27 | |
Rather than exposing the "cache token" (which looks like an implementation detail), you may allow third-party code to register a handler which will be called when an ABC's registrations are modified: def abc_handler(abc): """ Called when the concrete class registrations for ABC `abc` are updated. """ or even: def abc_handler(abc, added, removed): """ Called when the concrete class registrations for ABC `abc` are updated. `added` is an iterable of concrete classes which have been registered on the ABC, `removed` is an iterable of concrete classes which have been unregistered. """ |
|||
msg189971 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2013-05-25 15:48 | |
I thought about that originally, but there's only ever one object graph for the process, and as soon as you break any one edge in that graph you pretty much invalidate anything based on caching traversal results. (More accurately: it's almost always going to be cheaper to blow away and rebuild your cache than it is to determine whether or not your cache was actually affected by the graph change, particularly given the fact that object graph changes will almost always happen while the caches are still cold during application startup). So it's not really an implementation detail, it's a promise to provide a unique token for the current state of the object graph that can be used reliably for cache invalidation. Thus, I favour exposing a cache token as the simplest thing that could possibly work - building a mechanism for "tell me when the object graph changes" is a complex solution when the only question we need to answer is "is my cache still valid?". So the recommended idiom with this design would be: new_token = abc.get_cache_token() if new_token != cached_token: cache.clear() cached_token = new_token else: # Do something with the cache # Handle a cache miss A callback based system is actually *harder* to use, because you can't simply ask the question "Is my cache still valid?" - you have to register a callback that sets a flag, or something similar. Your lookup code ends up being: if cache_invalidated: cache.clear() cache_invalidated = False else: # Do something with the cache # Handle a cache miss However, now you have a potential problem, because your state update (setting the flag) is decoupled from checking the flag. That makes it harder to ensure correct sequencing than is the case with a simple inline check of a cache validity token. Choosing a solution that is harder to implement and harder to use is definitely not a good idea :) |
|||
msg189973 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2013-05-25 15:49 | |
And when I say "originally" I mean "after I saw how ABCs already implemented this capability" :) |
|||
msg189974 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-05-25 15:57 | |
> I thought about that originally, but there's only ever one object > graph for the process, and as soon as you break any one edge in that > graph you pretty much invalidate anything based on caching traversal > results. Ah, I was forgetting that a ABC registration also affects the subclasses of the registered class, and the abstract parents of the ABC class. So it's a bit more annoying than it looked like. > So it's not really an implementation detail, it's a promise to provide > a unique token for the current state of the object graph that can be > used reliably for cache invalidation. Sounds fair. > A callback based system is actually *harder* to use, because you can't > simply ask the question "Is my cache still valid?" - you have to > register a callback that sets a flag, or something similar. Your > lookup code ends up being: Well, the point of a callback based system is for the callback to invalidate or recompute the cache *right now*, so you don't have to "check a flag" each time you do a lookup. It's a basic cost/benefit compromise: the callback system makes invalidations more expensive, but lookups cheapers. Besides, a callback-based system can be more future-proof than exposing the raw cache tags or tokens. |
|||
msg189976 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2013-05-25 16:20 | |
Ah, but that's the other trick: we *don't know* if we need to recalculate our whole cache when the object graph changes. 1. Some of the cached entries may never be accessed again, so recalculating them will be a waste 2. The object graph may change again before they're next accessed, so recalculating any entries at all will be waste The cache invalidation design in the ABCs is a smart compromise, since it leaves recreating the cache entries to the last possible moment: the next time a type is looked up after the object graph has changed. It doesn't matter if there is one graph change or a thousand in that time, we only do the fresh lookup once. And if the type is never looked up again, we don't even need to do that much. In a typical application, the expected usage model for both ABCs and generic functions is the same: 1. During startup, the object graph and dispatch resolution results are changing rapidly as concrete classes and implementations get registered with ABCs and generic functions. Actual instance checks and generic function dispatch operations are rare, so the caches mostly remain cold. 2. Once the application is up and running, the object graph stabilises, and the caches of significance to the application start to become populated through actual use. Absent live code reloading, the caches then remain valid for the lifetime of the application. The ABC design takes the view that the extra runtime check for cache validity in the absence of direct inheritance or registration is worth it to avoid repeated cache clearing during application startup for ABCs that may never even be used by the application. I believe exactly the same reasoning holds true for generic functions: triggering callbacks when the object graph changes means we may end up doing a lot of work clearing caches that the application isn't even using. Delaying the check to call time means that only the caches that matter will ever be touched. |
|||
msg189977 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-05-25 16:24 | |
> 1. Some of the cached entries may never be accessed again, so > recalculating them will be a waste > 2. The object graph may change again before they're next accessed, so > recalculating any entries at all will be waste Yup, hence the "cost/benefit compromise" ;-) My assumption here was that ABC registrations change quite infrequently compared to the invocations of generic functions, so transferring most of the cost to registration-time (rather than invocation-time) would be a benefit. |
|||
msg189978 - (view) | Author: Łukasz Langa (lukasz.langa) * | Date: 2013-05-25 16:28 | |
Review the patch, please. Committing in 10... 9... 8... |
|||
msg189980 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-05-25 16:42 | |
New changeset 5b17d5ca2ff1 by Łukasz Langa in branch 'default': Fix #16832 - expose cache validity checking support in ABCMeta http://hg.python.org/cpython/rev/5b17d5ca2ff1 |
|||
msg189981 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-05-25 16:48 | |
New changeset d9828c438889 by Łukasz Langa in branch 'default': Mention issue #16832 in Misc/NEWS http://hg.python.org/cpython/rev/d9828c438889 |
|||
msg189982 - (view) | Author: PJ Eby (pje) * | Date: 2013-05-25 16:54 | |
Please expose this as an attribute of the class or module, not as a function. A function is orders of magnitude slower than attribute access, and the entire point of exposing this is to allow caches to be invalidated. In order to be useful for cache invalidation, the token has to be read on *every* access to a cache, thus adding unnecessary overhead to every cache access. |
|||
msg189983 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-05-25 16:56 | |
> Please expose this as an attribute of the class or module, not as a > function. A function is orders of magnitude slower than attribute > access, and the entire point of exposing this is to allow caches to be > invalidated. -1. Exposing a function allows to modify the underlying implementation without breaking any API. |
|||
msg189985 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2013-05-25 17:13 | |
Trading correctness for speed is almost never a good idea. If people are worried about speed to that level, they can either bypass the public API and access the private attribute directly (after profiling their application to ensure the cache validity checks are the bottleneck), or else migrate to PyPy to eliminate the function call overhead (amongst many other speed improvements). |
|||
msg189990 - (view) | Author: PJ Eby (pje) * | Date: 2013-05-25 19:02 | |
Antoine Pitrou added the comment: > -1. Exposing a function allows to modify the underlying implementation > without breaking any API. This doesn't make any sense. Once you've exposed an API that gives out a value for this, you can't change the implementation in a way that doesn't involve handing out a value... in which case you can just as easily set it as an attribute. So there is actually zero improvement in encapsulation: it's purely a ritualistic wrapping, like Java programmers insisting on having getFoo() methods in Python when an attribute would suffice. If there must be a way to change it later to be dynamic, it can always be exposed as an attribute of an object that could grow a property later, e.g. from abc import object_graph if object_graph.version != old_version: ... Nick Coghlan added the comment: > Trading correctness for speed is almost never a good idea. How is "correctness" relevant to the question of whether a readable value is exposed as an attribute or a method? |
|||
msg189993 - (view) | Author: PJ Eby (pje) * | Date: 2013-05-25 19:16 | |
All that being said, I took out some time to get actual numbers, and found my original guesstimate of overhead was incorrect; it's only 3 times slower, not "orders of magnitude". ;-) And even on a relatively old machine, that 3 times slower amounts to an extra tenth of a microsecond, at least in a hot loop with timeit. So I'll stop being a pain about this now, even though I still think getters are a stinky Java code smell and don't belong in Python anyways. ;-) |
|||
msg189994 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-05-25 19:18 | |
> This doesn't make any sense. Once you've exposed an API that gives > out a value for this, you can't change the implementation in a way > that doesn't involve handing out a value... in which case you can just > as easily set it as an attribute. Because it may be computed whenever the function call is done, rather than stored statically somewhere. The caching scheme will not necessarily remain forever identical. > If there > must be a way to change it later to be dynamic, it can always be > exposed as an attribute of an object that could grow a property later, > e.g. That's a possibility indeed. |
|||
msg190043 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2013-05-26 00:52 | |
The reason I switched from suggesting an attribute/property to a module level function is because we don't really support properties for process global state. That's why sys has so many getters in it - to make them properties, you would have to add a class object to act as a holder for the property. In this case, we appear to have a singleton that could hold the property (ABCMeta), which is why my original suggestion was to use an ABCMeta attribute. However, what I realised yesterday is that because ABCMeta is a type subclass, it means the only way to make the attribute a property in the future would be to add a metametaclass to hold the property definition. Once I realised that... "if the implementation is hard to explain, it's a bad idea" tipped me over the edge into preferring a simple module level getter function that supported exactly the use cases we care about :) |
|||
msg190050 - (view) | Author: Éric Araujo (eric.araujo) * | Date: 2013-05-26 02:56 | |
About future-proofing: do we want to document that the token is an opaque integer (as is done now) or just an opaque object that supports equality comparisons? |
|||
msg190062 - (view) | Author: Alyssa Coghlan (ncoghlan) * | Date: 2013-05-26 04:27 | |
The latter is probably better (it should just need a slight tweak to the wording in the docs and docstring) |
|||
msg206912 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-12-24 21:14 | |
New changeset a9f73b44ea0e by R David Murray in branch 'default': #16832: s/integer/object/ in docs/docstring, and add whatsnew entry. http://hg.python.org/cpython/rev/a9f73b44ea0e |
|||
msg206913 - (view) | Author: R. David Murray (r.david.murray) * | Date: 2013-12-24 21:15 | |
I don't see that there's anything other than the doc change that needs done here, so I'm closing this. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:40 | admin | set | github: 61036 |
2013-12-24 21:15:47 | r.david.murray | set | status: open -> closed type: enhancement nosy: + r.david.murray messages: + msg206913 resolution: fixed stage: resolved |
2013-12-24 21:14:08 | python-dev | set | messages: + msg206912 |
2013-05-26 04:27:54 | ncoghlan | set | messages: + msg190062 |
2013-05-26 02:56:28 | eric.araujo | set | nosy:
+ eric.araujo messages: + msg190050 |
2013-05-26 00:52:26 | ncoghlan | set | messages: + msg190043 |
2013-05-25 19:18:12 | pitrou | set | messages: + msg189994 |
2013-05-25 19:16:27 | pje | set | messages: + msg189993 |
2013-05-25 19:02:55 | pje | set | messages: + msg189990 |
2013-05-25 17:13:29 | ncoghlan | set | messages: + msg189985 |
2013-05-25 16:56:13 | pitrou | set | messages: + msg189983 |
2013-05-25 16:54:28 | pje | set | nosy:
+ pje messages: + msg189982 |
2013-05-25 16:48:36 | python-dev | set | messages: + msg189981 |
2013-05-25 16:42:14 | python-dev | set | nosy:
+ python-dev messages: + msg189980 |
2013-05-25 16:28:18 | lukasz.langa | set | messages: + msg189978 |
2013-05-25 16:27:15 | lukasz.langa | set | keywords:
+ patch files: + issue16832.diff |
2013-05-25 16:24:03 | pitrou | set | messages: + msg189977 |
2013-05-25 16:20:41 | ncoghlan | set | messages: + msg189976 |
2013-05-25 15:57:48 | pitrou | set | messages: + msg189974 |
2013-05-25 15:49:35 | ncoghlan | set | messages: + msg189973 |
2013-05-25 15:48:32 | ncoghlan | set | messages: + msg189971 |
2013-05-25 15:27:04 | pitrou | set | nosy:
+ pitrou messages: + msg189966 |
2013-05-25 15:08:14 | lukasz.langa | set | assignee: lukasz.langa messages: + msg189964 nosy: + lukasz.langa |
2013-01-01 09:02:40 | daniel.urban | set | nosy:
+ daniel.urban |
2013-01-01 03:38:24 | ncoghlan | create |