Message396880
The following code takes quadratic time on the size of the dictionary passed, regardless of the dictionary (explanation below):
```
def slow_dictionary(d):
while len(d) > 0:
# Remove first element
key = next(iter(d))
del d[key]
```
The problem is that when an element is deleted a NULL/NULL placeholder is set in its place (https://github.com/python/cpython/blob/818628c2da99ba0376313971816d472c65c9a9fc/Objects/dictobject.c#L1534) and when we try to find the first element with `next(iter(d))` the code needs to skip over all the NULL elements until it finds the first non-NULL element (https://github.com/python/cpython/blob/818628c2da99ba0376313971816d472c65c9a9fc/Objects/dictobject.c#L1713).
I'm not sure of what is the best way to fix it, but note that simply adding a field to the struct with the position of the first non-NULL element is not enough, since a code that always deletes the SECOND element of the dictionary would still have linear operations.
An easy (but memory-wasteful) fix would be to augment the struct PyDictKeyEntry with the indices of the next/previous non empty elements, and augment _dictkeysobject with the index of the first and last non empty elements (in other words, maintain an underlying linked list of the non empty entries). With this we can always iterate in O(1) per entry.
(I tested it only on version 3.9.2, but I would be surprised if it doesn't happen in other versions as well). |
|
Date |
User |
Action |
Args |
2021-07-02 21:20:05 | danielfleischman | set | recipients:
+ danielfleischman |
2021-07-02 21:20:05 | danielfleischman | set | messageid: <1625260805.66.0.276677787227.issue44555@roundup.psfhosted.org> |
2021-07-02 21:20:05 | danielfleischman | link | issue44555 messages |
2021-07-02 21:20:05 | danielfleischman | create | |
|