
List overallocation strategy #82554

Closed
serhiy-storchaka opened this issue Oct 4, 2019 · 7 comments
Labels
3.9 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage)

Comments

@serhiy-storchaka
Member

BPO 38373
Nosy @tim-one, @rhettinger, @methane, @serhiy-storchaka, @brandtbucher
PRs
  • bpo-38373: Change list overallocating strategy. #18952

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-03-17.21:50:33.814>
    created_at = <Date 2019-10-04.22:43:43.056>
    labels = ['interpreter-core', '3.9', 'performance']
    title = 'List overallocation strategy'
    updated_at = <Date 2020-03-17.21:50:33.813>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2020-03-17.21:50:33.813>
    actor = 'serhiy.storchaka'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-03-17.21:50:33.814>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2019-10-04.22:43:43.056>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 38373
    keywords = ['patch']
    message_count = 7.0
    messages = ['353976', '353978', '353979', '353986', '353991', '353993', '364483']
    nosy_count = 5.0
    nosy_names = ['tim.peters', 'rhettinger', 'methane', 'serhiy.storchaka', 'brandtbucher']
    pr_nums = ['18952']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue38373'
    versions = ['Python 3.9']

    @serhiy-storchaka
    Member Author

    Currently, list uses the following formula to allocate an array for items (if size != 0):

        allocated = size + (size >> 3) + (3 if size < 9 else 6)

    If items are added one at a time, the growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
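
    A minimal pure-Python sketch of that rule (a simulation, not the actual C code) reproduces this pattern when appending one item at a time:

        # simulate one-at-a-time appends under the current growth rule
        allocated = 0
        pattern = [allocated]
        size = 0
        while len(pattern) < 10:
            size += 1                        # append one more item
            if size > allocated:             # the array is full, so grow it
                allocated = size + (size >> 3) + (3 if size < 9 else 6)
                pattern.append(allocated)
        print(pattern)  # [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]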

    I think this strategy can be improved.

    1. There is not much sense in allocating space for exactly 25 items. The standard Python allocator returns addresses aligned to 16 bytes on a 64-bit platform, so it will allocate space for 26 items anyway. It is more efficient if the starting address is a multiple of some power of two, perhaps one larger than the pointer size.

    So it may be better to allocate a multiple of two or four items. The first few sizes are already multiples of four, so four is a good multiplier. I suggest using the following expression:

        allocated = (size + (size >> 3) + 6) & ~3

    It adds a bitwise AND, but removes the conditional term.
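
    As a quick sanity check, a sketch in the same simulation style as above shows the rounded expression always yields a multiple of four that is at least size; appending one item at a time, the allocation sequence becomes 4, 8, 16, 24, 32, ...

        # the proposed expression: always a multiple of four, never smaller than size
        for size in range(1, 1000):
            allocated = (size + (size >> 3) + 6) & ~3
            assert allocated % 4 == 0 and allocated >= size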

    2. It is a common enough case that a list is created from a sequence of known size and no items are added afterwards, or that it is created by concatenating a few sequences. In such cases the list can overallocate space which will never be used. For example:
    >>> import sys
    >>> sys.getsizeof([1, 2, 3])
    80
    >>> sys.getsizeof([*(1, 2, 3)])
    104

    A list display allocates space for exactly 3 items, but a list created from a tuple allocates space for 6 items.
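
    A rough breakdown of those numbers, assuming a 64-bit build where each item slot is one 8-byte pointer and the fixed per-list overhead reported by sys.getsizeof is 56 bytes:

    >>> sys.getsizeof([])   # assumed: 56 bytes of fixed overhead on this 64-bit build
    56
    >>> 56 + 3 * 8          # 3 slots for the list display
    80
    >>> 56 + 6 * 8          # 6 slots (3 used + 3 overallocated) for the tuple case
    104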

    Another example: let us extend a list by 10 items at a time.

    size  allocated
      10         17
      20         28
      30         39
      40         51
      50         51
      60         73
      70         73
      80         96
      90         96
     100        118

    We need to reallocate the array at each of the first four steps: 17 is too small for 20, 28 is too small for 30, and 39 is too small for 40. So the overallocation does not help here; it just wastes space.

    My idea is that when we add several items at a time and need to reallocate the array, we check whether the overallocated size would be enough for adding the same number of items next time. If it is not enough, we do not overallocate.

    The final algorithm is:

        allocated = (size + (size >> 3) + 6) & ~3
        if size - old_size > allocated - size:
            allocated = (size + 3) & ~3

    This will save space when a list is extended by many items a few times.
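
    A minimal sketch of how the two rules interact (a hypothetical pure-Python helper, not the actual patch), applied to the extend-by-10 example above:

        # hypothetical helper combining both rules from the proposal
        def new_allocated(size, old_size):
            allocated = (size + (size >> 3) + 6) & ~3
            if size - old_size > allocated - size:   # growth step exceeds the slack
                allocated = (size + 3) & ~3          # just round up, no overallocation
            return allocated

        allocated = old_size = 0
        for size in range(10, 110, 10):              # extend by 10 items at a time
            if size > allocated:
                allocated = new_allocated(size, old_size)
            print(size, allocated)
            old_size = size
        # allocated sizes: 12, 20, 32, 40, 60, 60, 84, 84, 104, 104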

    @serhiy-storchaka added the 3.9, interpreter-core (Objects, Python, Grammar, and Parser dirs), and performance labels on Oct 4, 2019
    @rhettinger
    Contributor

    We should definitely revisit the over-allocation strategy. I last worked on the existing strategy back in 2004. Since then, the weighting of the speed/space trade-off considerations has changed.

    We need to keep the amortized O(1) append() behavior, but possibly we would benefit from more over-allocation and fewer resizes. Also, we should look at the interaction with the small object allocator to see if its design is causing realloc() to frequently have to move data rather than extending in place.
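
    For reference, a back-of-the-envelope check of the amortized O(1) property: with a geometric growth factor r, the total copy work over n appends is roughly n*r/(r-1), i.e. a bounded number of element copies per append. A small simulation of the current rule (a sketch, not the C implementation) shows the per-append average staying roughly constant:

        def copies_per_append(n):
            # average elements moved by reallocations over n one-at-a-time appends
            allocated = copies = 0
            for size in range(1, n + 1):
                if size > allocated:
                    allocated = size + (size >> 3) + (3 if size < 9 else 6)
                    copies += size - 1       # elements copied on this reallocation
            return copies / n

        for n in (10**3, 10**4, 10**5, 10**6):
            print(n, round(copies_per_append(n), 2))   # stays roughly constant as n grows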

    @serhiy-storchaka
    Member Author

    Here is some data. step is the number of items added at a time; the first row shows the sizes, the second row the currently allocated sizes, and the third row the proposed allocated sizes.

    for step in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10):
        sizes0 = range(step, step*31, step)
        sizes1 = []
        sizes2 = []
        old_size = allocated1 = allocated2 = 0
        for size in sizes0:
            if size > allocated1:
                allocated1 = size + (size >> 3) + (3 if size < 9 else 6)
            sizes1.append(allocated1)
            if size > allocated2:
                allocated2 = (size + (size >> 3) + 6) & ~3
                if size - old_size > allocated2 - size:
                    allocated2 = (size + 3) & ~3
            sizes2.append(allocated2)
            old_size = size
        print('\nstep =', step)
        print(' '.join(f'{x:3}' for x in sizes0))
        print(' '.join(f'{x:3}' for x in sizes1))
        print(' '.join(f'{x:3}' for x in sizes2))
    
    step = 1
      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
      4   4   4   4   8   8   8   8  16  16  16  16  16  16  16  16  25  25  25  25  25  25  25  25  25  35  35  35  35  35
      4   4   4   4   8   8   8   8  16  16  16  16  16  16  16  16  24  24  24  24  24  24  24  24  32  32  32  32  32  32
    
    step = 2
      2   4   6   8  10  12  14  16  18  20  22  24  26  28  30  32  34  36  38  40  42  44  46  48  50  52  54  56  58  60
      5   5   9   9  17  17  17  17  26  26  26  26  26  37  37  37  37  37  48  48  48  48  48  48  62  62  62  62  62  62
      8   8   8   8  16  16  16  16  24  24  24  24  32  32  32  32  44  44  44  44  44  44  56  56  56  56  56  56  68  68
    
    step = 3
      3   6   9  12  15  18  21  24  27  30  33  36  39  42  45  48  51  54  57  60  63  66  69  72  75  78  81  84  87  90
      6   6  16  16  16  26  26  26  36  36  36  36  49  49  49  49  63  63  63  63  63  80  80  80  80  80  97  97  97  97
      8   8  16  16  16  24  24  24  36  36  36  36  48  48  48  48  60  60  60  60  76  76  76  76  76  92  92  92  92  92
    
    step = 4
      4   8  12  16  20  24  28  32  36  40  44  48  52  56  60  64  68  72  76  80  84  88  92  96 100 104 108 112 116 120
      7  12  12  24  24  24  37  37  37  51  51  51  64  64  64  64  82  82  82  82 100 100 100 100 100 123 123 123 123 123
      8   8  16  16  28  28  28  40  40  40  52  52  52  68  68  68  68  84  84  84  84 104 104 104 104 104 124 124 124 124
    
    step = 5
      5  10  15  20  25  30  35  40  45  50  55  60  65  70  75  80  85  90  95 100 105 110 115 120 125 130 135 140 145 150
      8  17  17  28  28  39  39  51  51  51  67  67  67  84  84  84 101 101 101 101 124 124 124 124 146 146 146 146 146 174
      8  16  16  28  28  36  36  48  48  60  60  60  76  76  76  96  96  96  96 116 116 116 116 140 140 140 140 140 168 168
    
    step = 6
      6  12  18  24  30  36  42  48  54  60  66  72  78  84  90  96 102 108 114 120 126 132 138 144 150 156 162 168 174 180
      9  19  19  33  33  46  46  60  60  60  80  80  80 100 100 100 120 120 120 120 147 147 147 147 174 174 174 174 174 208
     12  12  24  24  36  36  52  52  64  64  80  80  80 100 100 100 120 120 120 120 144 144 144 144 172 172 172 172 200 200
    
    step = 7
      7  14  21  28  35  42  49  56  63  70  77  84  91  98 105 112 119 126 133 140 147 154 161 168 175 182 189 196 203 210
     10  21  21  37  37  53  53  69  69  84  84  84 108 108 108 132 132 132 155 155 155 155 187 187 187 187 218 218 218 218
      8  16  28  28  44  44  60  60  76  76  92  92  92 116 116 116 136 136 136 160 160 160 184 184 184 184 216 216 216 216
    
    step = 8
      8  16  24  32  40  48  56  64  72  80  88  96 104 112 120 128 136 144 152 160 168 176 184 192 200 208 216 224 232 240
     12  24  24  42  42  60  60  78  78  96  96  96 123 123 123 150 150 150 177 177 177 177 213 213 213 213 249 249 249 249
      8  24  24  40  40  60  60  76  76  96  96  96 120 120 120 148 148 148 176 176 176 176 212 212 212 212 248 248 248 248
    
    step = 9
      9  18  27  36  45  54  63  72  81  90  99 108 117 126 135 144 153 162 171 180 189 198 207 216 225 234 243 252 261 270
     16  26  36  36  56  56  76  76  97  97 117 117 117 147 147 147 178 178 178 208 208 208 208 249 249 249 249 289 289 289
     12  20  36  36  56  56  76  76  96  96 116 116 136 136 136 168 168 168 196 196 196 228 228 228 228 268 268 268 268 308
    
    step = 10
     10  20  30  40  50  60  70  80  90 100 110 120 130 140 150 160 170 180 190 200 210 220 230 240 250 260 270 280 290 300
     17  28  39  51  51  73  73  96  96 118 118 141 141 141 174 174 174 208 208 208 242 242 242 242 287 287 287 287 332 332
     12  20  32  40  60  60  84  84 104 104 128 128 152 152 152 184 184 184 216 216 216 252 252 252 252 296 296 296 296 340

    @tim-one
    Member

    tim-one commented Oct 5, 2019

    WRT pymalloc, it will always copy on growing resize in this context. A pymalloc pool is dedicated to blocks of the same size class, so if the size class increases (they're 16 bytes apart now), the data must be copied to a different pool (dedicated to blocks of the larger size class).
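
    For a sense of scale, here is a small sketch (assuming pymalloc's current layout of 16-byte size classes up to the 512-byte threshold) of which size class a pointer array with a given number of slots lands in; growing past a class boundary is what forces the copy:

        # Assumed layout: blocks come in size classes that are multiples of 16 bytes,
        # up to 512 bytes; larger requests are handled by the system allocator.
        def pymalloc_size_class(nbytes):
            return ((nbytes + 15) // 16) * 16 if nbytes <= 512 else None

        for slots in (3, 6, 16, 25, 26, 64, 65):       # number of 8-byte item slots
            print(slots, pymalloc_size_class(slots * 8))
        # 3 -> 32, 6 -> 48, 16 -> 128, 25 -> 208, 26 -> 208, 64 -> 512, 65 -> None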

    @rhettinger
    Contributor

    WRT pymalloc, it will always copy on growing resize in this context

    That sounds like an excellent reason NOT to use pymalloc ;-)

    @tim-one
    Member

    tim-one commented Oct 5, 2019

    Don't know. Define "the problem" ;-) As soon as the allocation is over 512 bytes (64 pointers), it's punted to the system malloc family. Before then, do a relative handful of relatively small memcpy's really matter?

    pymalloc is faster than system mallocs, and probably uses space more efficiently. There are no pure wins here :-)

    @serhiy-storchaka
    Member Author

    New changeset 2fe815e by Serhiy Storchaka in branch 'master':
    bpo-38373: Change list overallocating strategy. (GH-18952)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022