This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Dennis Sweeney
Recipients Dennis Sweeney, danielfleischman, methane, rhettinger, serhiy.storchaka
Date 2021-07-03.10:35:09
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1625308509.78.0.315995142586.issue44555@roundup.psfhosted.org>
In-reply-to
Content
> Can https://bugs.python.org/issue32623 be a fix (or mitigation) of this issue?

If such a change is implemented and dictionaries shrink when their fill falls below 1/8, the behavior of `while d: del d[next(iter(d))]` will remain quadradic: there will be `ma_used - dk_size/8` (linear in len(d)) iterations of that loop that scan over half that many dummies on average.

However, such a change could nonetheless speed up `while d: del d[next(iter(d))]` by a constant factor. As of now, the loop scans over roughly n**2/2 dummies. After that change, assuming the dict starts 1/3 full so the size is 3*n, there would be:

    (n - 3*n/8)**2 / 2 dummy scans before shrinking => 25/128 * n**2 scans
    which would leave 3*n/8 entries remaining, on which the same process would repeat.
    The total scans would be bounded by
    (25/128 * n**2) + (25/128 * (3*n/8)**2) + (25/128 * (9*n/64)**2)) + ...
    = 25/128 * n**2 * (1 + 9/64 + 81/4096 + ...)
    = 25/128 * n**2 / (1 - 9/64)
    = 25/128 * n**2 * (64 / 55)
    = 5/22 * n**2

... just under half the number of dummies to scan.

If we were to shrink slightly more aggressively, at a fill of 1/6, it would be something like:

    (n - 3*n/6)**2 / 2 dummy scans before shrinking => n**2 / 8 scans, leaving n/2 remaining
    The total would then be bounded by
    (n**2 / 8) + ((n/2)**2 / 8) + ((n/4)**2 / 8) + ...
    = n**2 / 8 * (1 + 1/4 + 1/16 + ...)
    = n**2 / 8 / (1 - 1/4)
    = n**2 / 8 * (4 / 3)
    = n**2 / 6

... about one third of the number of dummies to scan.

This ignores the linear overhead of actually doing the re-sizing, which would dominate on small tables, so maybe there could be some threshold below which there would be no shrinking, say 256 (dk_log2_size=8) or so.
History
Date User Action Args
2021-07-03 10:35:09Dennis Sweeneysetrecipients: + Dennis Sweeney, rhettinger, methane, serhiy.storchaka, danielfleischman
2021-07-03 10:35:09Dennis Sweeneysetmessageid: <1625308509.78.0.315995142586.issue44555@roundup.psfhosted.org>
2021-07-03 10:35:09Dennis Sweeneylinkissue44555 messages
2021-07-03 10:35:09Dennis Sweeneycreate