Issue 34396
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2018-08-13 19:36 by bup, last changed 2022-04-11 14:59 by admin.
Pull Requests
URL | Status | Linked | Edit
---|---|---|---|
PR 14863 | closed | jdemeyer, 2019-07-19 15:54 |
Messages (8)

msg323490 | Author: Dan Snider (bup) * | Date: 2018-08-13 19:36
The ones I know of are list.__getitem__, dict __contains__ & __getitem__, and (frozen)set.__contains__, but from what I can tell so far it seems like dict.__getitem__ takes the worst hit. For dicts, I've spent a few hours trying to figure out what's getting put into the heap type's mp_subscript slot to slow it down so drastically, but I just can't figure it out. So I just forced `dict_subscript` into the heap type's mp_subscript slot to confirm the rather obvious impossibility of it breaking anything. Here's a "little" timeit script to demonstrate how horribly inefficient `dict.__getitem__` is when called on anything other than an extension type:

```python
if __name__ == '__main__':
    import timeit
    from collections import OrderedDict as FastDictSub
    from ctypes import *

    class mappingmethods(Structure):
        _fields_ = [
            ('mp_length', c_void_p),
            ('mp_subscript', PYFUNCTYPE(py_object, py_object, py_object)),
            ('mp_ass_subscript', c_void_p)]

    class _type_object(Structure):
        _fields_ = [('junk', (c_uint8 * 56)),  # <- brevity
                    ('tp_as_mapping', POINTER(mappingmethods))]

    py_type = POINTER(_type_object)

    class SlowDictSub(dict):
        pass

    assert SlowDictSub.__base__ is dict and FastDictSub.__base__ is dict
    assert SlowDictSub.__getitem__ is dict.__getitem__
    assert FastDictSub.__getitem__ is dict.__getitem__

    mp = dict.fromkeys(range(15))
    setup = 'from __main__ import mp, %s as Dict; d=Dict(mp)'

    print(f'Comparing execution time of heap allocated dict subtype '
          f'versus PyODict_Type (who also inherits dict.__getitem__)...\n')

    slown, slowt = timeit.Timer('d[10]', setup % 'SlowDictSub').autorange()
    print(f'avg. exec {slown/slowt:_.0f} SlowDictSub[x] statements per second')

    fastn, fastt = timeit.Timer('d[10]', setup % 'FastDictSub').autorange()
    print(f'avg. exec {fastn/fastt:_.0f} OrderedDict[x] statements per second')
    print()
    print(f'SlowDictSub was {1/(fastt/slowt):.2f}x slower than OrderedDict... '
          f"Let's see what happens when we fix SlowDictSub's 'broken' "
          "mp_subscript slot:\n")

    slow_tp = cast(id(SlowDictSub), py_type).contents
    fast_tp = cast(id(FastDictSub), py_type).contents
    slow_tp.tp_as_mapping.contents.mp_subscript = (
        fast_tp.tp_as_mapping.contents.mp_subscript)

    postn, postt = timeit.Timer('d[10]', setup % 'SlowDictSub').autorange()
    print(f'avg. exec {postn/postt:_.0f} SlowDictSub[x] after slot change')
    print(f'which is now {1/(fastt/postt):.2}x the speed of dict_item')
```

Output:

```
Comparing execution time of heap allocated dict subtype versus PyODict_Type (who also inherits dict.__getitem__)...

avg. exec 6_220_709 SlowDictSub[x] statements per second
avg. exec 21_894_971 OrderedDict[x] statements per second

SlowDictSub was 3.52x slower than OrderedDict... Let's see what happens when we fix SlowDictSub's 'broken' mp_subscript slot:

avg. exec 21_665_595 SlowDictSub[x] after slot change
which is now 1.0x the speed of dict_item
```
msg323498 | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2018-08-14 04:40

Amusingly, this is because of an old hack to make directly calling somedict.__getitem__ fast: https://github.com/python/cpython/commit/8f5cdaa784f555149adf5e94fd2e989f99d6b1db

Reverting the dictobject.c changes of the commit, i.e., the following patch, fixes your example.

```diff
diff --git a/Objects/dictobject.c b/Objects/dictobject.c
index 413557d667..d85834977d 100644
--- a/Objects/dictobject.c
+++ b/Objects/dictobject.c
@@ -3032,8 +3032,6 @@ dict_sizeof(PyDictObject *mp, PyObject *Py_UNUSED(ignored))
     return PyLong_FromSsize_t(_PyDict_SizeOf(mp));
 }

-PyDoc_STRVAR(getitem__doc__, "x.__getitem__(y) <==> x[y]");
-
 PyDoc_STRVAR(sizeof__doc__,
 "D.__sizeof__() -> size of D in memory, in bytes");

@@ -3071,8 +3069,6 @@ PyDoc_STRVAR(values__doc__,
 static PyMethodDef mapp_methods[] = {
     DICT___CONTAINS___METHODDEF
-    {"__getitem__", (PyCFunction)dict_subscript, METH_O | METH_COEXIST,
-     getitem__doc__},
     {"__sizeof__", (PyCFunction)dict_sizeof, METH_NOARGS,
      sizeof__doc__},
     DICT_GET_METHODDEF
```
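The METH_COEXIST entry that the patch above removes is visible from pure Python: with it, dict.__getitem__ is exposed as a method descriptor instead of the usual slot wrapper. A small introspection sketch (the exact type names are CPython implementation details; DictSub is an illustrative name):

```python
# With METH_COEXIST, dict.__getitem__ bypasses the slot-wrapper
# machinery and is a plain method descriptor; __setitem__ has no such
# entry, so it is the usual wrapper around mp_ass_subscript.
print(type(dict.__getitem__).__name__)   # method_descriptor
print(type(dict.__setitem__).__name__)   # wrapper_descriptor

class DictSub(dict):
    pass

# The subclass inherits the same attribute object, yet its own
# mp_subscript slot gets filled with a slower MRO-dispatching function,
# which is the behavior this issue is about.
print(DictSub.__getitem__ is dict.__getitem__)  # True
```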
msg328187 | Author: Dan Snider (bup) * | Date: 2018-10-21 04:09

Well, I found another mystery. Calling tuple.__getitem__ directly, on an actual tuple instance, is 50-80% slower than calling list.__getitem__, even though in theory they should be virtually identical, with tuple access maybe being infinitesimally faster. The only difference I found between the two __getitem__s, surprisingly, is that tuple *doesn't* have __getitem__ in its method list, while list does.

```
>>> [].__getitem__
<built-in method __getitem__ of list object at 0x012EE2D8>
>>> ().__getitem__
<method-wrapper '__getitem__' of tuple object at 0x00130030>
```

Since list is using the wrapper in tp_methods and tuple isn't, it *looks* like METH_COEXIST *is* helping, but since we know that in both the dict case and now the tuple/list case it is unnecessary, there's clearly a bug somewhere causing multiple arg tuple allocations and/or arg parsing. Unfortunately, it takes me several hours to "catch up" from CALL_FUNCTION in ceval.c to something like call_method in typeobject.c and I don't have that kind of time at the moment, but hopefully someone notices this again.
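The list/tuple asymmetry described in the message above can be checked with a short script: whether a type has a METH_COEXIST __getitem__ shows up in the type of the bound method. The timing loop is only a sketch; its numbers depend on the CPython version and build:

```python
import timeit

# list's method table contains a METH_COEXIST __getitem__; tuple's does
# not, so tuple only exposes the wrapper around its subscript slot.
print(type([].__getitem__).__name__)  # builtin_function_or_method
print(type(().__getitem__).__name__)  # method-wrapper

# Direct-call timings (numbers vary across CPython versions and builds):
for name, setup in [('list ', 'get = ([0] * 10).__getitem__'),
                    ('tuple', 'get = ((0,) * 10).__getitem__')]:
    n, t = timeit.Timer('get(5)', setup).autorange()
    print(f'{name}.__getitem__: {n / t:_.0f} calls/sec')
```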
msg339449 | Author: Jeroen Demeyer (jdemeyer) * | Date: 2019-04-04 15:45

> Amusingly, this is because of an old hack to make directly calling somedict.__getitem__ fast: https://github.com/python/cpython/commit/8f5cdaa784f555149adf5e94fd2e989f99d6b1db

But what's the use case of making somedict.__getitem__(x) fast? One should just write somedict[x] instead.

If PEP 580 or PEP 590 is accepted, it should be easy to simply make wrapper descriptors such as somedict.__getitem__ faster, removing the need for METH_COEXIST.
msg339453 | Author: Serhiy Storchaka (serhiy.storchaka) * | Date: 2019-04-04 16:13

> But what's the use case of making somedict.__getitem__(x) fast? One should just write somedict[x] instead.

Using them in itertools functions. somedict.__getitem__ is faster than lambda x: somedict[x]. As for map() and filter(), comprehensions are almost as fast as map() and filter() with somedict.__getitem__ and are more readable.
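A minimal sketch of the use case above, with illustrative names (squares and keys are made up for the example): passing the bound __getitem__ to map() avoids the per-call Python frame a lambda would need:

```python
# Illustrative data; any mapping and key sequence would do.
squares = {n: n * n for n in range(10)}
keys = [2, 5, 7]

# The bound method is a single C-level callable, so map() never enters
# a Python frame per key, unlike the lambda version.
via_method = list(map(squares.__getitem__, keys))
via_lambda = list(map(lambda k: squares[k], keys))
via_comp = [squares[k] for k in keys]

print(via_method)  # [4, 25, 49]
assert via_method == via_lambda == via_comp
```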
msg339477 | Author: Jeroen Demeyer (jdemeyer) * | Date: 2019-04-05 08:52

OK, makes sense. Also super() calls, I guess: you can write super().__getitem__(x) but not super()[x] (although the latter *could* be implemented if we wanted to).

I see two ways of fixing this:

1. Make wrapper descriptors faster, removing the need for the METH_COEXIST hack in the first place.
2. Deal better with METH_COEXIST methods when inheriting.

Since both of these solutions are closely related to the topic of PEP 580/PEP 590, I suggest trying to fix this after either is implemented.
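A small sketch of the super() point above, using a hypothetical LoggingDict subclass: super().__getitem__(x) is the only available spelling, because super objects are not subscriptable:

```python
class LoggingDict(dict):
    """Hypothetical subclass used only to illustrate the point."""
    def __getitem__(self, key):
        # super().__getitem__(key) works; there is no super()[key].
        value = super().__getitem__(key)
        print(f'looked up {key!r} -> {value!r}')
        return value

d = LoggingDict(a=1)
assert d['a'] == 1

# Subscripting a super object directly raises TypeError:
try:
    super(LoggingDict, d)['a']
except TypeError as exc:
    print(exc)  # 'super' object is not subscriptable
```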
msg348185 | Author: Raymond Hettinger (rhettinger) * | Date: 2019-07-19 17:00

Some thoughts:

* Lib/sets.py benefited significantly from itertools and METH_COEXIST.
* The dunder methods are part of the public API. Serhiy's personal tastes aside, there is no reason not to use them like any other method.
* Given a performance trade-off between classes and subclasses, we almost universally give preference to the parent class.
* The folks who use itertools with __getitem__ and __contains__ almost always do so because they care about speed and have a preference for a functional style. Removing the current optimization would directly hit those folks and their style of coding, resulting in code that used to be fast and elegant becoming slow.
msg362715 | Author: Marco Sulla (Marco Sulla) * | Date: 2020-02-26 19:11

I asked why on StackOverflow, and a user seems to have found the reason. According to them, the problem is in `update_one_slot()`.

`dict` implements `__contains__()` and `__getitem__()` directly. Usually, `sq_contains` and `mp_subscript` are wrapped to implement `__contains__()` and `__getitem__()`, but this way `dict` is a little faster. The problem is that `update_one_slot()` searches for the wrappers. If it does not find them, it does not inherit the `__contains__()` and `__getitem__()` of the class, but creates `__contains__()` and `__getitem__()` functions that do an MRO search and call the superclass method. This is why `__contains__()` and `__getitem__()` of `dict` subclasses are slower.

Is it possible to modify `update_one_slot()` so that, if no wrapper is found, the explicit implementation is inherited?

SO answer: https://stackoverflow.com/a/59914459/1763602
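The slowdown described here can be observed directly with timeit, assuming a CPython version affected by this issue (the gap may be smaller or absent on later releases); DictSub is an illustrative name:

```python
import timeit

class DictSub(dict):
    # Nothing overridden: __getitem__ is inherited from dict, yet on
    # affected versions lookups go through the MRO-searching slot.
    pass

base = dict.fromkeys(range(15))
sub = DictSub(base)

# Identical contents and identical inherited __getitem__ ...
assert sub[10] == base[10]

# ... but (on affected versions) different lookup speed:
for name, d in [('dict   ', base), ('DictSub', sub)]:
    n, t = timeit.Timer('d[10]', globals={'d': d}).autorange()
    print(f'{name}: {n / t:_.0f} lookups/sec')
```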
History
Date | User | Action | Args
---|---|---|---|
2022-04-11 14:59:04 | admin | set | github: 78577
2020-02-28 08:02:38 | methane | set | nosy: + methane
2020-02-26 19:11:37 | Marco Sulla | set | versions: + Python 3.9, - Python 3.8; nosy: + Marco Sulla; messages: + msg362715; components: + C API, - Interpreter Core
2019-07-19 17:00:35 | rhettinger | set | messages: + msg348185
2019-07-19 15:54:49 | jdemeyer | set | keywords: + patch; stage: patch review; pull_requests: + pull_request14652
2019-04-05 08:52:58 | jdemeyer | set | messages: + msg339477
2019-04-04 16:13:40 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg339453
2019-04-04 15:54:05 | scoder | set | nosy: + scoder
2019-04-04 15:45:14 | jdemeyer | set | nosy: + jdemeyer; messages: + msg339449
2018-10-21 04:32:52 | rhettinger | set | nosy: + rhettinger; versions: - Python 3.7
2018-10-21 04:09:47 | bup | set | messages: + msg328187
2018-08-14 04:40:20 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg323498
2018-08-13 19:36:41 | bup | create |