classification
Title: List overallocation strategy
Type: resource usage Stage: resolved
Components: Interpreter Core Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: brandtbucher, methane, rhettinger, serhiy.storchaka, tim.peters
Priority: normal Keywords: patch

Created on 2019-10-04 22:43 by serhiy.storchaka, last changed 2020-03-17 21:50 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18952 merged serhiy.storchaka, 2020-03-12 05:56
Messages (7)
msg353976 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-04 22:43
Currently list uses the following formula for allocating an array for items (if size != 0):

    allocated = size + (size >> 3) + (3 if size < 9 else 6)

If add items by 1 the growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...

I think this strategy can be improved.

1. There is no much sense in allocating the space for 25 items. The standard Python allocator returns addresses aligned to 16 bytes on 64-bit platform. It will allocate a space for 26 items. It is more efficient if the starting address is a multiple of some power of two, and perhaps larger than the pointer size.

So it may be better to allocate a multiple of two or four items. Few first sizes are multiple of four, so this is good multiplier. I suggest to use the following expression:

    allocated = (size + (size >> 3) + 6) & ~3

It adds bits AND, but removes conditional value.

2. It is common enough case if the list is created from a sequence of a known size and do not adds items anymore. Or if it is created by concatenating of few sequences. In such cases the list can overallocate a space which will never be used. For example:

>>> import sys
>>> sys.getsizeof([1, 2, 3])
80
>>> sys.getsizeof([*(1, 2, 3)])
104

List display allocates a space for exactly 3 items. But a list created from a tuple allocates a space for 6 items.

Other example. Let extend a list by 10 items at a time.

size allocated
 10     17
 20     28
 30     39
 40     51
 50     51
 60     73
 70     73
 80     96
 90     96
100    118

We need to reallocate an array after first four steps. 17 is too small for 20, 28 is too small for 30, 39 is too small for 40. So overallocating does not help, it just spends a space.

My idea, that if we adds several items at a time and need to reallocate an array, we check if the overallocated size is enough for adding the same amount of items next time. If it is not enough, we do not overallocate.

The final algorithm is:

    allocated = (size + (size >> 3) + 6) & ~3
    if size - old_size > allocated - size:
        allocated = (size + 3) & ~3

It will save a space if extend a list by many items few times.
msg353978 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-10-04 23:08
We should definitely revisit the over-allocation strategy.  I last worked on the existing strategy back in 2004.  Since then, the weighting of the speed/space trade-off considerations have changed.   

We need to keep the amortized O(1) append() behavior, but possibly we would benefit from more over-allocation and fewer resizes.  Also, we should look at the interaction with the small object allocator to see if its design is causing realloc() to frequently have to move data rather than extending in place.
msg353979 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-04 23:17
Here is some data. step is the number of items added at a time, the first row is the size, the second row is the currently allocated size, the third row is the proposed allocated size.

for step in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10):
    sizes0 = range(step, step*31, step)
    sizes1 = []
    sizes2 = []
    old_size = allocated1 = allocated2 = 0
    for size in sizes0:
        if size > allocated1:
            allocated1 = size + (size >> 3) + (3 if size < 9 else 6)
        sizes1.append(allocated1)
        if size > allocated2:
            allocated2 = (size + (size >> 3) + 6) & ~3
            if size - old_size > allocated2 - size:
                allocated2 = (size + 3) & ~3
        sizes2.append(allocated2)
        old_size = size
    print('\nstep =', step)
    print(' '.join(f'{x:3}' for x in sizes0))
    print(' '.join(f'{x:3}' for x in sizes1))
    print(' '.join(f'{x:3}' for x in sizes2))

step = 1
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
  4   4   4   4   8   8   8   8  16  16  16  16  16  16  16  16  25  25  25  25  25  25  25  25  25  35  35  35  35  35
  4   4   4   4   8   8   8   8  16  16  16  16  16  16  16  16  24  24  24  24  24  24  24  24  32  32  32  32  32  32

step = 2
  2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46  48  50  52  54  56  58  60
  5   5   9   9  17  17  17  17  26  26  26  26  26  37  37  37  37  37  48  48  48  48  48  48  62  62  62  62  62  62
  8   8   8   8  16  16  16  16  24  24  24  24  32  32  32  32  44  44  44  44  44  44  56  56  56  56  56  56  68  68

step = 3
  3   6   9  12  15  18  21  24  27  30  33  36  39  42  45  48  51  54  57  60  63  66  69  72  75  78  81  84  87  90
  6   6  16  16  16  26  26  26  36  36  36  36  49  49  49  49  63  63  63  63  63  80  80  80  80  80  97  97  97  97
  8   8  16  16  16  24  24  24  36  36  36  36  48  48  48  48  60  60  60  60  76  76  76  76  76  92  92  92  92  92

step = 4
  4   8  12  16  20  24  28  32  36  40  44  48  52  56  60  64  68  72  76  80  84  88  92  96 100 104 108 112 116 120
  7  12  12  24  24  24  37  37  37  51  51  51  64  64  64  64  82  82  82  82 100 100 100 100 100 123 123 123 123 123
  8   8  16  16  28  28  28  40  40  40  52  52  52  68  68  68  68  84  84  84  84 104 104 104 104 104 124 124 124 124

step = 5
  5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90  95 100 105 110 115 120 125 130 135 140 145 150
  8  17  17  28  28  39  39  51  51  51  67  67  67  84  84  84 101 101 101 101 124 124 124 124 146 146 146 146 146 174
  8  16  16  28  28  36  36  48  48  60  60  60  76  76  76  96  96  96  96 116 116 116 116 140 140 140 140 140 168 168

step = 6
  6  12  18  24  30  36  42  48  54  60  66  72  78  84  90  96 102 108 114 120 126 132 138 144 150 156 162 168 174 180
  9  19  19  33  33  46  46  60  60  60  80  80  80 100 100 100 120 120 120 120 147 147 147 147 174 174 174 174 174 208
 12  12  24  24  36  36  52  52  64  64  80  80  80 100 100 100 120 120 120 120 144 144 144 144 172 172 172 172 200 200

step = 7
  7  14  21  28  35  42  49  56  63  70  77  84  91  98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210
 10  21  21  37  37  53  53  69  69  84  84  84 108 108 108 132 132 132 155 155 155 155 187 187 187 187 218 218 218 218
  8  16  28  28  44  44  60  60  76  76  92  92  92 116 116 116 136 136 136 160 160 160 184 184 184 184 216 216 216 216

step = 8
  8  16  24  32  40  48  56  64  72  80  88  96 104 112 120 128 136 144 152 160 168 176 184 192 200 208 216 224 232 240
 12  24  24  42  42  60  60  78  78  96  96  96 123 123 123 150 150 150 177 177 177 177 213 213 213 213 249 249 249 249
  8  24  24  40  40  60  60  76  76  96  96  96 120 120 120 148 148 148 176 176 176 176 212 212 212 212 248 248 248 248

step = 9
  9  18  27  36  45  54  63  72  81  90  99 108 117 126 135 144 153 162 171 180 189 198 207 216 225 234 243 252 261 270
 16  26  36  36  56  56  76  76  97  97 117 117 117 147 147 147 178 178 178 208 208 208 208 249 249 249 249 289 289 289
 12  20  36  36  56  56  76  76  96  96 116 116 136 136 136 168 168 168 196 196 196 228 228 228 228 268 268 268 268 308

step = 10
 10  20  30  40  50  60  70  80  90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300
 17  28  39  51  51  73  73  96  96 118 118 141 141 141 174 174 174 208 208 208 242 242 242 242 287 287 287 287 332 332
 12  20  32  40  60  60  84  84 104 104 128 128 152 152 152 184 184 184 216 216 216 252 252 252 252 296 296 296 296 340
msg353986 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2019-10-05 03:24
WRT pymalloc, it will always copy on growing resize in this context.  A pymalloc pool is dedicated to blocks of the same size class, so if the size class increases (they're 16 bytes apart now), the data must be copied to a different pool (dedicated to blocks of the larger size class).
msg353991 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-10-05 05:14
> WRT pymalloc, it will always copy on growing resize in this context

That sounds like an excellent reason NOT to use pymalloc ;-)
msg353993 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2019-10-05 05:32
Don't know.  Define "the problem" ;-)  As soon as the allocation is over 512 bytes (64 pointers), it's punted to the system malloc family.  Before then, do a relative handful of relatively small memcpy's really matter?

pymalloc is faster than system mallocs, and probably uses space more efficiently.  There are no pure wins here :-)
msg364483 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-03-17 21:46
New changeset 2fe815edd6778fb9deef8f8044848647659c2eb8 by Serhiy Storchaka in branch 'master':
bpo-38373: Change list overallocating strategy. (GH-18952)
https://github.com/python/cpython/commit/2fe815edd6778fb9deef8f8044848647659c2eb8
History
Date User Action Args
2020-03-17 21:50:33serhiy.storchakasetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2020-03-17 21:46:03serhiy.storchakasetmessages: + msg364483
2020-03-12 05:56:57serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request18303
2019-10-16 07:38:27methanesetnosy: + methane
2019-10-05 05:32:19tim.peterssetmessages: + msg353993
2019-10-05 05:14:21rhettingersetmessages: + msg353991
2019-10-05 03:24:43tim.peterssetmessages: + msg353986
2019-10-04 23:17:52serhiy.storchakasetmessages: + msg353979
2019-10-04 23:08:21rhettingersetmessages: + msg353978
2019-10-04 22:55:09brandtbuchersetnosy: + brandtbucher
2019-10-04 22:43:43serhiy.storchakacreate