This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: dict subclassing is slow
Type: performance Stage:
Components: C API Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Kevin Shweh, Marco Sulla, gvanrossum, jack__d, willingc
Priority: normal Keywords:

Created on 2021-08-15 20:10 by Marco Sulla, last changed 2022-04-11 14:59 by admin.

Messages (6)
msg399625 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2021-08-15 20:10
I asked on SO why subclassing dict makes the subclass much slower in some operations. This is the answer by Monica (https://stackoverflow.com/a/59914459/1763602):

Indexing and in are slower in dict subclasses because of a bad interaction between a dict optimization and the logic subclasses use to inherit C slots. This should be fixable, though not from your end.

The CPython implementation has two sets of hooks for operator overloads. There are Python-level methods like __contains__ and __getitem__, but there's also a separate set of slots for C function pointers in the memory layout of a type object. Usually, either the Python method will be a wrapper around the C implementation, or the C slot will contain a function that searches for and calls the Python method. It's more efficient for the C slot to implement the operation directly, as the C slot is what Python actually accesses.

Mappings written in C implement the C slots sq_contains and mp_subscript to provide in and indexing. Ordinarily, the Python-level __contains__ and __getitem__ methods would be automatically generated as wrappers around the C functions, but the dict class has explicit implementations of __contains__ and __getitem__, because the explicit implementations (https://github.com/python/cpython/blob/v3.8.1/Objects/dictobject.c) are a bit faster than the generated wrappers:

static PyMethodDef mapp_methods[] = {
    DICT___CONTAINS___METHODDEF
    {"__getitem__", (PyCFunction)(void(*)(void))dict_subscript,        METH_O | METH_COEXIST,
     getitem__doc__},
    ...

(Actually, the explicit __getitem__ implementation is the same function as the mp_subscript implementation, just with a different kind of wrapper.)

Ordinarily, a subclass would inherit its parent's implementations of C-level hooks like sq_contains and mp_subscript, and the subclass would be just as fast as the superclass. However, the logic in update_one_slot (https://github.com/python/cpython/blob/v3.8.1/Objects/typeobject.c#L7202) looks for the parent implementation by trying to find the generated wrapper methods through an MRO search.

dict doesn't have generated wrappers for sq_contains and mp_subscript, because it provides explicit __contains__ and __getitem__ implementations.

Instead of inheriting sq_contains and mp_subscript, update_one_slot ends up giving the subclass sq_contains and mp_subscript implementations that perform an MRO search for __contains__ and __getitem__ and call those. This is much less efficient than inheriting the C slots directly.

Fixing this will require changes to the update_one_slot implementation.

Aside from what I described above, dict_subscript also looks up __missing__ for dict subclasses, so fixing the slot inheritance issue won't make subclasses completely on par with dict itself for lookup speed, but it should get them a lot closer.

As for pickling, on the dumps side, the pickle implementation has a dedicated fast path (https://github.com/python/cpython/blob/v3.8.1/Modules/_pickle.c#L4291) for dicts, while the dict subclass takes a more roundabout path through object.__reduce_ex__ and save_reduce.

On the loads side, the time difference is mostly just from the extra opcodes and lookups to retrieve and instantiate the __main__.A class, while dicts have a dedicated pickle opcode for making a new dict. If we compare the disassembly for the pickles:

In [26]: pickletools.dis(pickle.dumps({0: 0, 1: 1, 2: 2, 3: 3, 4: 4}))                                                                                                                                                           
    0: \x80 PROTO      4
    2: \x95 FRAME      25
   11: }    EMPTY_DICT
   12: \x94 MEMOIZE    (as 0)
   13: (    MARK
   14: K        BININT1    0
   16: K        BININT1    0
   18: K        BININT1    1
   20: K        BININT1    1
   22: K        BININT1    2
   24: K        BININT1    2
   26: K        BININT1    3
   28: K        BININT1    3
   30: K        BININT1    4
   32: K        BININT1    4
   34: u        SETITEMS   (MARK at 13)
   35: .    STOP
highest protocol among opcodes = 4

In [27]: pickletools.dis(pickle.dumps(A({0: 0, 1: 1, 2: 2, 3: 3, 4: 4})))                                                                                                                                                        
    0: \x80 PROTO      4
    2: \x95 FRAME      43
   11: \x8c SHORT_BINUNICODE '__main__'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'A'
   25: \x94 MEMOIZE    (as 1)
   26: \x93 STACK_GLOBAL
   27: \x94 MEMOIZE    (as 2)
   28: )    EMPTY_TUPLE
   29: \x81 NEWOBJ
   30: \x94 MEMOIZE    (as 3)
   31: (    MARK
   32: K        BININT1    0
   34: K        BININT1    0
   36: K        BININT1    1
   38: K        BININT1    1
   40: K        BININT1    2
   42: K        BININT1    2
   44: K        BININT1    3
   46: K        BININT1    3
   48: K        BININT1    4
   50: K        BININT1    4
   52: u        SETITEMS   (MARK at 31)
   53: .    STOP
highest protocol among opcodes = 4

we see that the difference between the two is that the second pickle needs a whole bunch of opcodes to look up __main__.A and instantiate it, while the first pickle just does EMPTY_DICT to get an empty dict. After that, both pickles push the same keys and values onto the pickle operand stack and run SETITEMS
msg399632 - (view) Author: Jack DeVries (jack__d) * Date: 2021-08-16 01:36
There was a thorough discussion about the concerns associated with supporting dict subclasses in general here: bpo-32615

If I understand correctly, allowing dict subclasses to inherit __contains__ and __getitem__ will be a step towards supporting dict subclasses in a more broad context, like being able to use them as namespaces passed to exec. 

Ultimately, dictionaries are at the very heart of Cpython's implementation. I think it is reasonable to say that it is not possible or reasonable to support users in creating custom dict implantation with their own behavior. There is no good real world use case for it (please tell me if anyone has one), and it opens the door to enormous unnecessary complexity. 

Also related to dict subclassing:
bpo-15099
bpo-27561

Marco, I hope that the discussions I've linked will at least make it clear why cpython behaves this way, and what the concerns are especially around supporting user subclasses of dict. I noticed you tagged this bpo as performance, but there are significant negative performance implications around dict subclasses that you'll see in past discussions.

With this additional context in mind, what changes do you propose?
msg399790 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2021-08-17 20:51
Since my knowledge of this is very poor, I informed Monica about the issue. I'm quite sure that if there's a way to turn lemons into lemonade :)
msg399791 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2021-08-17 20:53
.... I not finished my phrase. I'm sure that if there's a way to turn lemons into lemonade, she is **MUCH** more skilled than me to find one.
msg399797 - (view) Author: Kevin Shweh (Kevin Shweh) Date: 2021-08-17 21:21
Of course it's reasonable to support dict subclasses. We already have a bunch of dict subclasses in the standard library, like collections.defaultdict and collections.Counter, and collections.Counter is significantly slower than it could be because of this issue. (collections.defaultdict seems to be unaffected due to differences between classes implemented in C and Python.) dict.__getitem__ even has dedicated support for a __missing__ hook, which is only useful for subclasses.
msg399922 - (view) Author: Marco Sulla (Marco Sulla) * Date: 2021-08-19 16:21
Since probably Monica are taking her holidays, I try to decipher her answer.

Probably, the more problematic function spotted by Monica is update_one_slot. I re-quote her sentence:

update_one_slot looks for the parent implementation by trying to find the generated wrapper methods through an MRO search.

dict doesn't have generated wrappers for sq_contains and mp_subscript, because it provides explicit __contains__ and __getitem__ implementations.

Instead of inheriting sq_contains and mp_subscript, update_one_slot ends up giving the subclass sq_contains and mp_subscript implementations that perform an MRO search for __contains__ and __getitem__ and call those. This is much less efficient than inheriting the C slots directly.

The solution for Monica is to change the behaviour of update_one_slot for these cases (no wrappers, C slots directly).

I don't know the implications of this change...
History
Date User Action Args
2022-04-11 14:59:48adminsetgithub: 89084
2021-08-19 16:21:29Marco Sullasetmessages: + msg399922
2021-08-19 04:26:31gvanrossumsetnosy: + gvanrossum
2021-08-17 21:21:42Kevin Shwehsetnosy: + Kevin Shweh
messages: + msg399797
2021-08-17 20:53:16Marco Sullasetmessages: + msg399791
2021-08-17 20:51:15Marco Sullasetmessages: + msg399790
2021-08-16 17:14:25willingcsetnosy: + willingc
2021-08-16 01:36:09jack__dsetnosy: + jack__d
messages: + msg399632
2021-08-15 20:10:26Marco Sullacreate