Author Chad.Netzer
Recipients Chad.Netzer, brandtbucher, corona10, serhiy.storchaka, vstinner
Date 2021-04-04.04:44:40
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1617511481.74.0.465113047244.issue43574@roundup.psfhosted.org>
In-reply-to
Content
For bpo-43574, my initial plan was to change list_resize() so that it wouldn't
overallocate empty lists that were resized to be bigger, thus restoring the
overallocation behavior for list-literals to be like that of earlier Python
releases.

However, the list_resize() helper is also used by list appends and inserts, as
well as other list methods that change list sizes.  This proposed strategy of
not-overallocating any resize that starts with an empty list also affects
their behavior, and with the list-overallocation calculation that was
introduced with bpo-38373, this causes unexpected side-effects on the amount
of overallocation used for appending/inserting with an empty list.

Therefore, my updated proposal handles this by changing the overallocation
behavior in list_resize() only when empty lists are increased by an amount
greater than 1 during list_resize().  This basically retains the behavior of
not overallocating list-literals, while not changing the behavior for list
append/insert on an empty list.

To understand why this updated proposed change won't also cause overallocation
for length-1 literals, see below how length-1 and length-2 list-literals are
compiled using BUILD_LIST instead of LIST_EXTEND:

```
Python 3.9.2 (default, Mar 26 2021, 23:27:12)
>>> import dis
>>> def foo():
...   [1]
...
>>> dis.dis(foo)
  2           0 LOAD_CONST               1 (1)
              2 BUILD_LIST               1
              4 POP_TOP
              6 LOAD_CONST               0 (None)
              8 RETURN_VALUE
>>> def bar():
...   [1,2]
...
>>> dis.dis(bar)
  2           0 LOAD_CONST               1 (1)
              2 LOAD_CONST               2 (2)
              4 BUILD_LIST               2
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> def baz():
...   [1,2,3]
...
>>> dis.dis(baz)
  2           0 BUILD_LIST               0
              2 LOAD_CONST               1 ((1, 2, 3))
              4 LIST_EXTEND              1
              6 POP_TOP
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
```


Hence, the change to list_resize(), which is called by list_extend() but not
the BUILD_LIST opcode, won't impact list-literals of length 1 and 2.


And to show how the originally proposed change (no overallocation for
list_resize() of empty lists) inadvertently changed how list append/insert
overallocation worked, let's first take a look at how current Python (3.9+)
works:

```
>>> l = []
>>> l.__sizeof__()
40
>>> l.append("a")  # First append will overallocate to capacity 4
>>> l.__sizeof__()
72
>>> l.append("b")
>>> l.__sizeof__()
72
>>> l.append("c")
>>> l.__sizeof__()
72
>>> l.append("d")
>>> l.__sizeof__()
72
>>> l.append("e")
>>> l.__sizeof__()
104
>>> l2 = ["a"]  # Length 1 (and 2) literals don't overallocate
>>> l2.__sizeof__()
48
>>> # However, note that the first append will overallocate to capacity 8
>>> l2.append("b")
>>> l2.__sizeof__()
104
```

Note that the list-literal of length 1 isn't overallocated, and that
appending to it skips the capacity 4 list and goes straight to capacity 8.

However, with my originally proposed change, this overallocation behavior is
duplicated even for lists that start empty, because the first append then
effectively becomes a non-overallocated list of length and capacity 1, which
means that the second append overallocates to capacity 8.

Here was the overallocation behavior of my first proposal, when an empty list
was appended:
```
Python 3.10.0a6+ (heads/bpo43574-dont-overallocate-list-literals:7436223a71, Apr  3 2021, 16:32:22) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
>>> l = []
>>> l.__sizeof__()
40
>>> l.append("a")
>>> l.__sizeof__()  # No overallocation on first append
48
>>> l.append("b")
>>> l.__sizeof__()  # Second append jumps to capacity 8, skipping 4
104
```

This is unexpected and likely undesirable behavior, as lists being created
empty and then appended to should be expected to follow the same documented 4,
8, 16, 24, ...  growth pattern for bpo-38373.

Fixing this is fairly simple because of the special-casing for length 1 and 2
list-literals, just by checking for the use of append/insert on an empty list.
Because they both change the size of a length 0 list to length 1, and the
list_resize() for literals always kicks in for changing to lengths 3 or more,
this updates proposal will retain the current empty-list insert/append
overallocation behavior, while still allowing list-literals of length 1 or
more to not overallocate.

With this updated proposal, avoiding a change to the behavior of an empty list
append/insert, the overallocation for list-literals is still removed, and the
current overallocation behavior for empty lists being appended is preserved:
```
Python 3.10.0a6+ (heads/bpo43574-dont-overallocate-list-literals-v2:56b361221d, Apr  3 2021, 17:35) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
>>> l = []
>>> l.__sizeof__()
40
>>> l.append("a")
>>> l.__sizeof__()
72
>>> l.append("b")
>>> l.__sizeof__()
72
>>> l2 = ["a"]
>>> l2.__sizeof__()
48
>>> l2 = ["a", "b", "c"] # list_extend() literal setup case starts w/ 3 elements
>>> l2.__sizeof__()
64
```

So, with this variant on my original proposal, using the current compilation
behavior of setting up list-literals of length 3 or greater using
list_extend(), we regain the previous Python behavior of not overallocating
list-literals, and we also retain the current overallocation behavior for
empty lists that are appended to.

Just to document it, here are the functions in Objects/listobject.c that are
impacted by a change in list_resize(), and the proposed bpo-43574 change in
particular:

ins1()  # insert 1 element in list, including empty list.
app1()  # append 1 element to list.  Same impact as with ins1()
list_ass_slice()  # can assign a sequence to a list slice, expanding list
list_inplace_repeat()  # No impact, early exit for len == 0
list_extend()  # Now used for list-literals of len > 2
list_pop() # No impact, same behavior for list becoming len == 0
list_ass_subscript()  # No impact, list_resize() only called for deletion

The list_ass_slice() case is changed by this update, for an empty list, as
overallocation will not occur on the first list_extend() with more than 1
element (ie. just like list-literals, the overallocation will be delayed
until a future list_resize()).  Here's an example of the change, first for
current Python 3.9+ behavior:
```

Python 3.9.2 (default, Mar 26 2021, 23:27:12)
>>> l = []
>>> l.__sizeof__()
40
>>> l[:] = ["a"]
>>> l.__sizeof__()
72
>>> l = []
>>> l[:] = ["a", "b"]
>>> l.__sizeof__()
104
>>> l.append("c")
>>> l.__sizeof__()
104
```
and the behavior with my revised proposal for non-overallocation of
list-literals:
```
Python 3.10.0a6+ (heads/bpo43574-dont-overallocate-list-literals-v2:56b361221d, Apr  3 2021, 17:35) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
>>> l = []
>>> l.__sizeof__()
40
>>> l[:] = ["a"]
>>> l.__sizeof__()
72
>>> l = []
>>> l[:] = ["a", "b"]
>>> l.__sizeof__()
56
>>> l.append("c")
>>> l.__sizeof__()
104
```
History
Date User Action Args
2021-04-04 04:44:41Chad.Netzersetrecipients: + Chad.Netzer, vstinner, serhiy.storchaka, corona10, brandtbucher
2021-04-04 04:44:41Chad.Netzersetmessageid: <1617511481.74.0.465113047244.issue43574@roundup.psfhosted.org>
2021-04-04 04:44:41Chad.Netzerlinkissue43574 messages
2021-04-04 04:44:40Chad.Netzercreate