Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type cache is not correctly invalidated on a class defining mro() #73052

Closed
sjpalt mannequin opened this issue Dec 4, 2016 · 22 comments
Closed

Type cache is not correctly invalidated on a class defining mro() #73052

sjpalt mannequin opened this issue Dec 4, 2016 · 22 comments
Labels
3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@sjpalt
Copy link
Mannequin

sjpalt mannequin commented Dec 4, 2016

BPO 28866
Nosy @terryjreedy, @vstinner, @encukou, @markshannon, @serhiy-storchaka, @Vgr255, @JulienPalard, @miss-islington
PRs
  • bpo-28866: Invalidate type cache on every setattr #13117
  • bpo-28866: No type cache for types with specialized mro, invalidation is hard. #13157
  • [3.7] bpo-28866: No type cache for types with specialized mro, invalidation is hard. (GH-13157) #13589
  • bpo-28866: Add regression test #17573
  • Files
  • weird.py: Steps to reproduce
  • weird_without_interactive.py
  • issue28866.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2016-12-04.14:35:45.681>
    labels = ['interpreter-core', '3.7', 'type-crash']
    title = 'Type cache is not correctly invalidated on a class defining mro()'
    updated_at = <Date 2019-12-11.12:36:42.450>
    user = 'https://bugs.python.org/sjpalt'

    bugs.python.org fields:

    activity = <Date 2019-12-11.12:36:42.450>
    actor = 'petr.viktorin'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Interpreter Core']
    creation = <Date 2016-12-04.14:35:45.681>
    creator = 'sjpalt'
    dependencies = []
    files = ['45752', '45756', '45766']
    hgrepos = []
    issue_num = 28866
    keywords = ['patch']
    message_count = 22.0
    messages = ['282344', '282345', '282349', '282353', '282368', '282375', '282430', '282468', '282485', '282486', '282598', '288072', '341581', '343550', '343581', '343582', '343583', '343585', '343650', '355377', '358257', '358258']
    nosy_count = 9.0
    nosy_names = ['terry.reedy', 'vstinner', 'petr.viktorin', 'Mark.Shannon', 'serhiy.storchaka', 'abarry', 'mdk', 'sjpalt', 'miss-islington']
    pr_nums = ['13117', '13157', '13589', '17573']
    priority = 'high'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue28866'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7']

    @sjpalt
    Copy link
    Mannequin Author

    sjpalt mannequin commented Dec 4, 2016

    I ran into a strange bug while experimenting with metaclasses that implement a custom mro() method. It only seems to occur in interactive mode, either returning completely unrelated values or causing a segfault. Python 2 appears unaffected. I have been able to reproduce this with the following code:

    # $ python -i weird.py
    # >>> proxy.x
    # 52011448
    # >>> proxy.x
    # 6160
    # ...
    class Foo:
        pass
    
    class Meta(type):
        def mro(cls):
            return (cls, Foo, object)
    
        def __setattr__(cls, name, value):
            setattr(Foo, name, value)
    
    proxy = Meta('FooProxy', (), {})
    
    proxy.x = 300
    proxy.x  # don't omit
    proxy.x = 0

    @sjpalt sjpalt mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Dec 4, 2016
    @serhiy-storchaka
    Copy link
    Member

    Python 2 is affected if make Foo a new style class.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Dec 4, 2016
    @Vgr255
    Copy link
    Mannequin

    Vgr255 mannequin commented Dec 4, 2016

    Additionally, trying to access an attribute before assigning it will result in an AttributeError on all subsequent accesses (even if set).

    I didn't manage to get a segfault, however.

    @JulienPalard
    Copy link
    Member

    FWIW, in _PyType_Lookup I see the "300" PyLong being cached, and later, just before the segfault, I see its address getting out of the cache (a cache hit) but it's no longer a PyLong, it's scrambled, so we're getting a real pointer with scrambled values on the line:

        attribute = _PyType_Lookup(type, name);

    segfaulting two lines later in:

    descrgetfunc local_get = Py_TYPE(attribute)->tp_descr_get
    

    When I write scrambled value I mean:

    (gdb) p *attribute                                                                                                         
    $21 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = -2604246222170760229, ob_type = 0xdbdbdbd\
    bdbdbdbdb}                                                                                                                 
    

    To debug interactive session in GDB I used:

    r -i weird.py < stdin
    

    with "proxy.x" in the stdin file.

    @JulienPalard
    Copy link
    Member

    I'm able to reproduce the segmentation fault outside the interactive mode, see attached file.

    @JulienPalard
    Copy link
    Member

    Problem looks mainly due to the __setattr__ done on a different type than the getattr, and the cache of PyType_Lookup:

    • type_setattro will be called with Foo, "x", and the value to set
    • type_getattro will be called with FooProxy, "x"

    The timeline (with my weird_without_interactive.py) is:

    • type_setattro is called on Foo for attr "x" with the instance of Bar
    • type_getattro is called on FooProxy for attr "x" (the proxy.x line)
      During this type_getattro, the instance of Bar is cached in PyType_Lookup for (FooProxy, "x")
    • type_setattro is called on Foo for attr "x" with the value 0
      During this call, insertdict will legitimately call DECREF on the Bar instance, which is deleted
    • type_getattro is called on FooProxy with attr "x" for the print()
      During this call, PyType_Lookup will cache hit for (FooProxy, x) with the old reference to the instance of Bar

    @vstinner
    Copy link
    Member

    vstinner commented Dec 5, 2016

    If you call sys._clear_type_cache() after the setattr(), weird_without_interactive.py doesn't crash anymore.

    @JulienPalard
    Copy link
    Member

    I found a way to fix it, but as I'm just discovering cpython internals, I'm not sure it's the right way, review are very welcome.

    I fixed it this way because:

    The bytecode generated for "proxy.x = 0" (a STORE_ATTR) will call a
    PyObject_SetAttr with FooProxy which will soon execute the __setattr__ of Meta, itself calling a PyObject_SetAttr with Foo.

    The actual attribute setting is done from the PyObject_SetAttr with Foo (calling in turn type_setattro, and so on), but it's already too late to invalidate the FooProxy type: we no longer have a reference on it and can't guess that FooProxy delegated __setattr__ to Foo.

    So the only place when we can invalidate correctly is in the first call of PyObject_SetAttr when we know on which type the attribute will be set, and it make sense: It does not matter what a __setattr__ does and how it does it: invalidation should happen for this attribute on the current type, so invalidating here seems logic.

    I did not yet took the time to measure performance loss induced with this patch.

    With the patch:

    ./python -i weird.py
    >>> proxy.x
    0
    >>> proxy.x
    0

    @vstinner
    Copy link
    Member

    vstinner commented Dec 5, 2016

    The issue is that the Meta class has a reference to the class Foo in its mro() method, but Foo is not aware of Meta. So when Foo is modified, the Foo cache is invalidated, but not Meta cache.

    bpo-28866.diff always invalidates the cache, so it works. But it is suboptimal, IMO it defeats the whole purpose of a cache.

    I never defined a mro() method. I'm not sure that it's possible to have a type cache and a mro() method?

    Options:

    • Disable completely the cache on classes defining mro()
    • Modify "Meta" (the C code implementing the type, not the Python code) to track the version tag of each class referenced by mro(). Problem: mro() is dynamic!?
    • Somehow, notify Foo that Meta has a reference to its cache, so Foo is able to invalidate Meta cache

    @vstinner vstinner changed the title Unexpected behavior resulting from mro() and __setattr__ in interactive mode Type cache is not correctly invalidated on a class defining mro() Dec 5, 2016
    @vstinner
    Copy link
    Member

    vstinner commented Dec 5, 2016

    • Somehow, notify Foo that Meta has a reference to its cache, so Foo is able to invalidate Meta cache

    This was implemented when the type cache was implemented in Python 2.6, but only for explicit subclasses. PyType_Modified() iterates on tp_subclasses.

    PyType_Ready() updates tp_subclasses: it stores a weak reference to sub classes in each base class.

    I understand that, if we want to implement this feature, type_mro_modified() should be modified to add a backward reference in each base class of the MRO. type_mro_modified() is called when a type is defined, but also when type.__bases__ is explicitly modified.

    It would require to add a new slot to types, and so increase a little bit the memory usage, and slow down the creation of a type, and type.__bases__ (slow down: probably negligible, O(1) since the existing tp_subclasses uses a dict).

    @JulienPalard
    Copy link
    Member

    bpo-28866.diff always invalidates the cache, so it works. But it is suboptimal, IMO it defeats the whole purpose of a cache.

    Not sure about defeating the purpose of the cache as I only invalidate in setattr, getattr are still cache hitting. I tried:

    $ python3 -m performance run --python=./python --rigorous -b all -o master.json
    $ git checkout issue28866; make -j 8
    $ python3 -m performance run --python=./python --rigorous -b all -o issue28866.json
    $ python3 -m performance compare master.json issue28866.json

    And I don't see much differences, probably only noise as I get some faster tests:

    ### call_method_unknown ###
    Median +- Std dev: 56.8 ms +- 3.3 ms -> 52.9 ms +- 1.6 ms: 1.08x faster
    Significant (t=12.92)

    ### pybench.IfThenElse ###
    Median +- Std dev: 247 ns +- 3 ns -> 224 ns +- 16 ns: 1.11x faster
    Significant (t=15.15)

    and some slower:
    ### pybench.StringPredicates ###
    Median +- Std dev: 1.89 us +- 0.05 us -> 2.18 us +- 0.21 us: 1.15x slower
    Significant (t=-15.20)

    ### unpack_sequence ###
    Median +- Std dev: 207 ns +- 4 ns -> 231 ns +- 27 ns: 1.12x slower
    Significant (t=-11.63)

    I'm not yet accustomed to this perf suite, so I may miss something obvious. I'll ran it again on master to measure the noise and should probably fine-tune my system for stability.

    I'll also try a benchmark without the cache for comparison.

    @JulienPalard
    Copy link
    Member

    Hi,

    Tried again, this time getting some stats with MCACHE_STATS 1, to check if my patch is defeating the cache:

    Without my patch:

    $ time ./python performance/benchmarks/bm_chaos.py --worker -l1 -w0 -n1 --filename chaos.ppm --width=512 --height=512 --iterations 50000
    chaos: Median: 2.51 sec
    -- Method cache hits        = 16581735 (99%)
    -- Method cache true misses = 4092 (0%)
    -- Method cache collisions  = 28542 (0%)
    -- Method cache size        = 96 KB

    With my patch:

    $ time ./python performance/benchmarks/bm_chaos.py --worker -l1 -w0 -n1 --filename chaos.ppm --width=512 --height=512 --iterations 50000
    chaos: Median: 2.53 sec
    -- Method cache hits        = 16582260 (99%)
    -- Method cache true misses = 4096 (0%)
    -- Method cache collisions  = 28012 (0%)
    -- Method cache size        = 96 KB

    @markshannon
    Copy link
    Member

    This failure appears to be a symptom of recursively traversing __bases__ rather scanning __mro__ in the implementation of type.__subclasses__
    The MCACHE depends on type.__subclasses__ being correct and it is not correct for weird.py

    python -i weird.py
    >>> C = Meta("C", (), {})
    >>> C.__mro__
    (<class '__main__.C'>, <class '__main__.Foo'>, <class 'object'>)
    >>> Foo.__subclasses__()
    []
    >>> C.__bases__
    (<class 'object'>,)

    Fixing this may need a change in the API for type.__subclasses__() to return all subclasses, as defined by __mro__, not just the bases.

    A simpler, temporary fix might be to set Py_TPFLAGS_HAVE_VERSION_TAG to 0 for any class that has a custom mro()

    @markshannon
    Copy link
    Member

    New changeset 180dc1b by Mark Shannon (Julien Palard) in branch 'master':
    bpo-28866: No type cache for types with specialized mro, invalidation is hard. (bpo-13157)
    180dc1b

    @vstinner
    Copy link
    Member

    Well done Julien 😉

    @vstinner
    Copy link
    Member

    Python 2.7 and 3 7 should be fixed as well, no?

    @JulienPalard
    Copy link
    Member

    I think I can backport it to 3.7, and I let you choose if 2.7 need this.

    @miss-islington
    Copy link
    Contributor

    New changeset bfd0b77 by Miss Islington (bot) in branch '3.7':
    [3.7] bpo-28866: No type cache for types with specialized mro, invalidation is hard. (GH-13157) (GH-13589)
    bfd0b77

    @vstinner
    Copy link
    Member

    Python 2.7 is also affected according to Serhiy. Does someone want to write a backport to 2.7?

    @terryjreedy
    Copy link
    Member

    bpo-38541 reports a severe attribute access slowdown in 3.7.4 in some situations. (Fixed by the enhancement in bpo-36922, which I presume cannot be backported.)

    @encukou
    Copy link
    Member

    encukou commented Dec 11, 2019

    A refcount problem possibly caused by the fix: https://bugs.python.org/issue39016

    @encukou
    Copy link
    Member

    encukou commented Dec 11, 2019

    Steve, I would like to add your reproducer (weird.py) to CPython's test suite, to make sure this doesn't happen again.

    Would you be willing to sign the contributor agreement, so we can use your code in Python?

    The instructions are here: https://www.python.org/psf/contrib/contrib-form/

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @encukou encukou closed this as completed May 3, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants