
Expose cache validity checking support in ABCMeta #61036

Closed
ncoghlan opened this issue Jan 1, 2013 · 22 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@ncoghlan
Contributor

ncoghlan commented Jan 1, 2013

BPO 16832
Nosy @pjeby, @ncoghlan, @pitrou, @merwok, @bitdancer, @durban, @ambv
Files
  • issue16832.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/ambv'
    closed_at = <Date 2013-12-24.21:15:47.081>
    created_at = <Date 2013-01-01.03:38:24.497>
    labels = ['type-feature', 'library']
    title = 'Expose cache validity checking support in ABCMeta'
    updated_at = <Date 2013-12-24.21:15:47.079>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2013-12-24.21:15:47.079>
    actor = 'r.david.murray'
    assignee = 'lukasz.langa'
    closed = True
    closed_date = <Date 2013-12-24.21:15:47.081>
    closer = 'r.david.murray'
    components = ['Library (Lib)']
    creation = <Date 2013-01-01.03:38:24.497>
    creator = 'ncoghlan'
    dependencies = []
    files = ['30366']
    hgrepos = []
    issue_num = 16832
    keywords = ['patch']
    message_count = 22.0
    messages = ['178725', '189964', '189966', '189971', '189973', '189974', '189976', '189977', '189978', '189980', '189981', '189982', '189983', '189985', '189990', '189993', '189994', '190043', '190050', '190062', '206912', '206913']
    nosy_count = 8.0
    nosy_names = ['pje', 'ncoghlan', 'pitrou', 'eric.araujo', 'r.david.murray', 'daniel.urban', 'lukasz.langa', 'python-dev']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue16832'
    versions = ['Python 3.4']

    @ncoghlan
    Contributor Author

    ncoghlan commented Jan 1, 2013

    ABCMeta uses an internal counter to invalidate the negative caches when a register() call changes the virtual inheritance graph. (A global count of register() calls is stored on ABCMeta, which ABCMeta.__instancecheck__ and ABCMeta.__subclasscheck__ then compare to a copy cached on the individual ABC)

    To properly handle register() method calls on ABCs, generic function implementations also need a way to invalidate their own caches when that graph changes.

    It seems like the simplest way to handle this would be to expose a read-only "ABCMeta.cache_token" property. Generic function implementations could then work the same way as ABCMeta itself: store explicit implementation registrations and a cache of derived implementations separately, and if a saved copy of the cache token doesn't match the current value of ABCMeta.cache_token, clear the derived cache.
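What eventually shipped for this is `abc.get_cache_token()` (Python 3.4+). A minimal sketch of the behaviour described above — the `Serializer` and `JSONSerializer` names are made up for illustration:

```python
import abc

class Serializer(abc.ABC):
    pass

class JSONSerializer:
    pass

before = abc.get_cache_token()
Serializer.register(JSONSerializer)  # changes the virtual inheritance graph

# register() bumped the global counter, so the saved token no longer matches
assert abc.get_cache_token() != before
assert issubclass(JSONSerializer, Serializer)
```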

    @ncoghlan ncoghlan added the stdlib Python modules in the Lib dir label Jan 1, 2013
    @ambv
    Contributor

    ambv commented May 25, 2013

    This will be useful for PEP-443.

    From Nick's comment on python-dev (http://mail.python.org/pipermail/python-dev/2013-May/126535.html):

    "Given the global nature of the cache invalidation, it may be better as
    a module level abc.get_cache_token() function."

    @ambv ambv self-assigned this May 25, 2013
    @pitrou
    Member

    pitrou commented May 25, 2013

    Rather than exposing the "cache token" (which looks like an implementation detail), you may allow third-party code to register a handler which will be called when an ABC's registrations are modified:

    def abc_handler(abc):
        """
        Called when the concrete class registrations for ABC `abc`
        are updated.
        """

    or even:

    def abc_handler(abc, added, removed):
        """
        Called when the concrete class registrations for ABC `abc`
        are updated. `added` is an iterable of concrete classes which
        have been registered on the ABC, `removed` is an iterable of
        concrete classes which have been unregistered.
        """

    @ncoghlan
    Contributor Author

    I thought about that originally, but there's only ever one object graph for the process, and as soon as you break any one edge in that graph you pretty much invalidate anything based on caching traversal results. (More accurately: it's almost always going to be cheaper to blow away and rebuild your cache than it is to determine whether or not your cache was actually affected by the graph change, particularly given the fact that object graph changes will almost always happen while the caches are still cold during application startup).

    So it's not really an implementation detail, it's a promise to provide a unique token for the current state of the object graph that can be used reliably for cache invalidation.

    Thus, I favour exposing a cache token as the simplest thing that could possibly work - building a mechanism for "tell me when the object graph changes" is a complex solution when the only question we need to answer is "is my cache still valid?".

    So the recommended idiom with this design would be:

        new_token = abc.get_cache_token()
        if new_token != cached_token:
            cache.clear()
            cached_token = new_token
        # Do something with the cache,
        # handling a cache miss as usual
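A runnable version of that idiom, sketched as a minimal lookup cache; the names `registry`, `cache` and `dispatch` are illustrative, not part of the patch:

```python
import abc

registry = {object: "fallback"}    # explicit registrations: type -> value
cache = {}                         # derived lookups, guarded by the token
cached_token = abc.get_cache_token()

def dispatch(cls):
    """Find the registered value for cls, dropping stale cache entries."""
    global cached_token
    new_token = abc.get_cache_token()
    if new_token != cached_token:  # object graph changed since last call
        cache.clear()
        cached_token = new_token
    try:
        return cache[cls]          # do something with the cache
    except KeyError:
        for base in cls.__mro__:   # handle a cache miss via the MRO
            if base in registry:
                cache[cls] = registry[base]
                return cache[cls]
        raise LookupError(cls)

assert dispatch(int) == "fallback"  # miss the first time, cached after
assert int in cache
```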

    A callback based system is actually *harder* to use, because you can't simply ask the question "Is my cache still valid?" - you have to register a callback that sets a flag, or something similar. Your lookup code ends up being:

        if cache_invalidated:
            cache.clear()
            cache_invalidated = False
        # Do something with the cache,
        # handling a cache miss as usual

    However, now you have a potential problem, because your state update (setting the flag) is decoupled from checking the flag. That makes it harder to ensure correct sequencing than is the case with a simple inline check of a cache validity token.

    Choosing a solution that is harder to implement and harder to use is definitely not a good idea :)

    @ncoghlan
    Contributor Author

    And when I say "originally" I mean "after I saw how ABCs already implemented this capability" :)

    @pitrou
    Member

    pitrou commented May 25, 2013

    > I thought about that originally, but there's only ever one object
    > graph for the process, and as soon as you break any one edge in that
    > graph you pretty much invalidate anything based on caching traversal
    > results.

    Ah, I was forgetting that an ABC registration also affects the subclasses
    of the registered class, and the abstract parents of the ABC class.
    So it's a bit more annoying than it looked.

    > So it's not really an implementation detail, it's a promise to provide
    > a unique token for the current state of the object graph that can be
    > used reliably for cache invalidation.

    Sounds fair.

    > A callback based system is actually *harder* to use, because you can't
    > simply ask the question "Is my cache still valid?" - you have to
    > register a callback that sets a flag, or something similar. Your
    > lookup code ends up being:

    Well, the point of a callback-based system is for the callback to
    invalidate or recompute the cache *right now*, so you don't have to
    check a flag on every lookup.

    It's a basic cost/benefit compromise: the callback system makes
    invalidations more expensive, but lookups cheaper.

    Besides, a callback-based system can be more future-proof than exposing
    the raw cache tags or tokens.

    @ncoghlan
    Contributor Author

    Ah, but that's the other trick: we *don't know* if we need to recalculate our whole cache when the object graph changes.

    1. Some of the cached entries may never be accessed again, so recalculating them will be a waste
    2. The object graph may change again before they're next accessed, so recalculating any entries at all will be a waste

    The cache invalidation design in the ABCs is a smart compromise, since it leaves recreating the cache entries to the last possible moment: the next time a type is looked up after the object graph has changed. It doesn't matter if there is one graph change or a thousand in that time, we only do the fresh lookup once. And if the type is never looked up again, we don't even need to do that much.

    In a typical application, the expected usage model for both ABCs and generic functions is the same:

    1. During startup, the object graph and dispatch resolution results are changing rapidly as concrete classes and implementations get registered with ABCs and generic functions. Actual instance checks and generic function dispatch operations are rare, so the caches mostly remain cold.
    2. Once the application is up and running, the object graph stabilises, and the caches of significance to the application start to become populated through actual use. Absent live code reloading, the caches then remain valid for the lifetime of the application.

    The ABC design takes the view that the extra runtime check for cache validity in the absence of direct inheritance or registration is worth it to avoid repeated cache clearing during application startup for ABCs that may never even be used by the application.

    I believe exactly the same reasoning holds true for generic functions: triggering callbacks when the object graph changes means we may end up doing a lot of work clearing caches that the application isn't even using. Delaying the check to call time means that only the caches that matter will ever be touched.
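`functools.singledispatch` (PEP 443) ended up adopting exactly this call-time check: a late `register()` call on an ABC is picked up lazily on the next dispatch rather than eagerly via callbacks. A small sketch — the `Marker` ABC and `describe` function are made up for illustration:

```python
import abc
import functools

class Marker(abc.ABC):
    pass

@functools.singledispatch
def describe(obj):
    return "object"

@describe.register(Marker)
def _(obj):
    return "marker"

class Plain:
    pass

assert describe(Plain()) == "object"  # result cached against Plain

Marker.register(Plain)                # the object graph changes...
assert describe(Plain()) == "marker"  # ...and the stale entry is dropped
                                      # at the next call, via the cache token
```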

    @pitrou
    Member

    pitrou commented May 25, 2013

    > 1. Some of the cached entries may never be accessed again, so
    >    recalculating them will be a waste
    > 2. The object graph may change again before they're next accessed, so
    >    recalculating any entries at all will be a waste

    Yup, hence the "cost/benefit compromise" ;-)

    My assumption here was that ABC registrations change quite infrequently
    compared to the invocations of generic functions, so transferring most
    of the cost to registration-time (rather than invocation-time) would be
    a benefit.

    @ambv
    Contributor

    ambv commented May 25, 2013

    Review the patch, please. Committing in 10... 9... 8...

    @python-dev
    Mannequin

    python-dev mannequin commented May 25, 2013

    New changeset 5b17d5ca2ff1 by Łukasz Langa in branch 'default':
    Fix bpo-16832 - expose cache validity checking support in ABCMeta
    http://hg.python.org/cpython/rev/5b17d5ca2ff1

    @python-dev
    Mannequin

    python-dev mannequin commented May 25, 2013

    New changeset d9828c438889 by Łukasz Langa in branch 'default':
    Mention issue bpo-16832 in Misc/NEWS
    http://hg.python.org/cpython/rev/d9828c438889

    @pjeby
    Mannequin

    pjeby mannequin commented May 25, 2013

    Please expose this as an attribute of the class or module, not as a function. A function is orders of magnitude slower than attribute access, and the entire point of exposing this is to allow caches to be invalidated.

    In order to be useful for cache invalidation, the token has to be read on *every* access to a cache, thus adding unnecessary overhead to every cache access.

    @pitrou
    Member

    pitrou commented May 25, 2013

    > Please expose this as an attribute of the class or module, not as a
    > function. A function is orders of magnitude slower than attribute
    > access, and the entire point of exposing this is to allow caches to be
    > invalidated.

    -1. Exposing a function allows modifying the underlying implementation
    without breaking any API.

    @ncoghlan
    Contributor Author

    Trading correctness for speed is almost never a good idea. If people are worried about speed to that level, they can either bypass the public API and access the private attribute directly (after profiling their application to ensure the cache validity checks are the bottleneck), or else migrate to PyPy to eliminate the function call overhead (amongst many other speed improvements).

    @pjeby
    Mannequin

    pjeby mannequin commented May 25, 2013

    Antoine Pitrou added the comment:

    > -1. Exposing a function allows to modify the underlying implementation
    > without breaking any API.

    This doesn't make any sense. Once you've exposed an API that gives
    out a value for this, you can't change the implementation in a way
    that doesn't involve handing out a value... in which case you can just
    as easily set it as an attribute.

    So there is actually zero improvement in encapsulation: it's purely a
    ritualistic wrapping, like Java programmers insisting on having
    getFoo() methods in Python when an attribute would suffice. If there
    must be a way to change it later to be dynamic, it can always be
    exposed as an attribute of an object that could grow a property later,
    e.g.

        from abc import object_graph
    
        if object_graph.version != old_version:
            ...

    Nick Coghlan added the comment:

    > Trading correctness for speed is almost never a good idea.

    How is "correctness" relevant to the question of whether a readable
    value is exposed as an attribute or a method?

    @pjeby
    Mannequin

    pjeby mannequin commented May 25, 2013

    All that being said, I took out some time to get actual numbers, and
    found my original guesstimate of overhead was incorrect; it's only 3
    times slower, not "orders of magnitude". ;-)

    And even on a relatively old machine, that 3 times slower amounts to
    an extra tenth of a microsecond, at least in a hot loop with timeit.
    So I'll stop being a pain about this now, even though I still think
    getters are a stinky Java code smell and don't belong in Python
    anyways. ;-)
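The overhead being debated is easy to measure directly; a hedged micro-benchmark sketch (absolute figures depend on the machine and interpreter, so none are claimed here; `Holder` is a made-up stand-in for an attribute-based API):

```python
import abc
import timeit

class Holder:
    cache_token = 0   # hypothetical attribute-based alternative API

holder = Holder()
N = 100_000

# Time the function-call form actually shipped in Python 3.4...
func_call = timeit.timeit("get_token()", number=N,
                          globals={"get_token": abc.get_cache_token})
# ...against a plain attribute read, the alternative pjeby proposed
attr_read = timeit.timeit("holder.cache_token", number=N,
                          globals={"holder": holder})

print(f"function call: {func_call:.4f}s  attribute read: {attr_read:.4f}s")
```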

    @pitrou
    Member

    pitrou commented May 25, 2013

    > This doesn't make any sense. Once you've exposed an API that gives
    > out a value for this, you can't change the implementation in a way
    > that doesn't involve handing out a value... in which case you can just
    > as easily set it as an attribute.

    Because it may be computed whenever the function call is done, rather
    than stored statically somewhere. The caching scheme will not
    necessarily remain forever identical.

    > If there
    > must be a way to change it later to be dynamic, it can always be
    > exposed as an attribute of an object that could grow a property later,
    > e.g.

    That's a possibility indeed.

    @ncoghlan
    Contributor Author

    The reason I switched from suggesting an attribute/property to a module level function is because we don't really support properties for process global state. That's why sys has so many getters in it - to make them properties, you would have to add a class object to act as a holder for the property.

    In this case, we appear to have a singleton that could hold the property (ABCMeta), which is why my original suggestion was to use an ABCMeta attribute. However, what I realised yesterday is that because ABCMeta is a type subclass, it means the only way to make the attribute a property in the future would be to add a metametaclass to hold the property definition.

    Once I realised that... "if the implementation is hard to explain, it's a bad idea" tipped me over the edge into preferring a simple module level getter function that supported exactly the use cases we care about :)

    @merwok
    Member

    merwok commented May 26, 2013

    About future-proofing: do we want to document that the token is an opaque integer (as is done now) or just an opaque object that supports equality comparisons?

    @ncoghlan
    Contributor Author

    The latter is probably better (it should just need a slight tweak to
    the wording in the docs and docstring).

    @python-dev
    Mannequin

    python-dev mannequin commented Dec 24, 2013

    New changeset a9f73b44ea0e by R David Murray in branch 'default':
    bpo-16832: s/integer/object/ in docs/docstring, and add whatsnew entry.
    http://hg.python.org/cpython/rev/a9f73b44ea0e

    @bitdancer
    Member

    I don't see that there's anything other than the doc change that needs to be done here, so I'm closing this.

    @bitdancer bitdancer added the type-feature A feature request or enhancement label Dec 24, 2013
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022