classification
Title: Improve list() pre-sizing for inputs with known lengths
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: miss-islington, pablogsal, rhettinger, serhiy.storchaka, tim.peters, vstinner
Priority: normal Keywords: patch

Created on 2018-04-05 21:46 by rhettinger, last changed 2019-02-16 20:47 by miss-islington. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 6493 closed pablogsal, 2018-04-17 00:28
PR 9846 merged pablogsal, 2018-10-13 17:01
PR 10200 merged pablogsal, 2018-10-28 20:48
PR 11220 merged sir-sigurd, 2018-12-20 08:05
PR 11899 merged rhettinger, 2019-02-16 20:42
Messages (12)
msg315006 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-05 21:46
The list() constructor isn't taking full advantage of known input lengths or length hints.  Ideally, it should pre-size and not over-allocate when the input size is known or can be reasonably estimated.

Python 3.8.0a0 (heads/master:091e95e900, Apr  5 2018, 09:48:33)
[Clang 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sys import getsizeof
>>> getsizeof([0] * 10)
144
>>> getsizeof(list([0] * 10))
200
>>> getsizeof(list(range(10)))
200
msg315412 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-04-17 20:32
Calling PyObject_LengthHint() adds an overhead. It is significant for short sequences. I work on a patch that reduces it. PR 6493 adds the second call of PyObject_LengthHint() and increases the overhead.

As for this issue, in-place repeat overallocates too.

>>> a = [0]; a *= 10; getsizeof(a)
200

I think it would be better to make it not preallocating.

And maybe it would be worth to avoid overallocating if newsize > allocated + allocated/8 or something like.
msg315420 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-04-17 21:07
Microbenchmarked with @vstinner's perf module:

python -m perf timeit "list([0]*10)" -o new.json

checkout master and build

python -m perf timeit "list([0]*10)" -o old.json
python -m perf compare_to new.json old.json
Mean +- std dev: [new] 241 ns +- 3 ns -> [old] 243 ns +- 18 ns: 1.01x slower (+1%)
Not significant!
msg315940 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-04-30 08:39
See also http://bugs.python.org/issue26814#msg264003 and bpo-26828.
msg315941 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-04-30 08:46
Should we shrink the list of the length hint was way too big? For example, if the length hint was 100 but the final list of only 10. Should we shrink in that case?

I'm asking because it seems like Pablo's PR doesn't shrink the list in that case.

I wasn't sure so I checked, del shrinks the list if needed:

>>> x=list(range(1000))
>>> import sys
>>> sys.getsizeof(x)
9112
>>> del x[1:]
>>> sys.getsizeof(x)
96
msg315973 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-04-30 22:22
@serhiy.storchaka @rhettinger @vstinner Should we better make the pre-allocation if the length of the iterable is known (so we call PyObject_Length and not PyObject_LengthHint)? This will keep the logic simpler (so not shrinking if PyObject_LengthHint gives more than the real length) and it will not be as expensive as calling PyObject_LengthHint.
msg328738 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-10-28 20:16
New changeset 372d705d958964289d762953d0a61622755f5386 by Pablo Galindo in branch 'master':
bpo-33234 Improve list() pre-sizing for inputs with known lengths (GH-9846)
https://github.com/python/cpython/commit/372d705d958964289d762953d0a61622755f5386
msg328741 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-10-28 20:32
> bpo-33234 Improve list() pre-sizing for inputs with known lengths (GH-9846)

Oh. Can you please document the optimization in the following doc as well?
https://docs.python.org/dev/whatsnew/3.8.html#optimizations
msg328748 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-10-28 20:49
>Oh. Can you please document the optimization in the following doc as well?

Sure! Opened https://github.com/python/cpython/pull/10200 to address that. :)
msg328762 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-10-28 22:03
New changeset c61e229d2a4c54ffb4153e1f0f48126ba33c9cbf by Pablo Galindo in branch 'master':
bpo-33234: Add exact allocation optimization to lists in What's New (GH-10200)
https://github.com/python/cpython/commit/c61e229d2a4c54ffb4153e1f0f48126ba33c9cbf
msg332737 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2018-12-29 22:31
New changeset 0e5f771f38138714415f665651de7e674fcebc38 by Pablo Galindo (Sergey Fedoseev) in branch 'master':
bpo-33234: Simplify list_preallocate_exact() (GH-11220)
https://github.com/python/cpython/commit/0e5f771f38138714415f665651de7e674fcebc38
msg335717 - (view) Author: miss-islington (miss-islington) Date: 2019-02-16 20:47
New changeset e182318e6a473e6e9341f1aab8e49f2b1035c674 by Miss Islington (bot) (Raymond Hettinger) in branch 'master':
bpo-33234: Add another attribution in Whatsnew (GH-11899)
https://github.com/python/cpython/commit/e182318e6a473e6e9341f1aab8e49f2b1035c674
History
Date User Action Args
2019-02-16 20:47:50miss-islingtonsetnosy: + miss-islington
messages: + msg335717
2019-02-16 20:42:26rhettingersetpull_requests: + pull_request11926
2018-12-29 22:33:35pablogsalsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2018-12-29 22:31:39pablogsalsetmessages: + msg332737
2018-12-20 08:05:14sir-sigurdsetpull_requests: + pull_request10490
2018-10-28 22:03:22pablogsalsetmessages: + msg328762
2018-10-28 20:49:56pablogsalsetmessages: + msg328748
2018-10-28 20:48:53pablogsalsetpull_requests: + pull_request9518
2018-10-28 20:32:24vstinnersetmessages: + msg328741
2018-10-28 20:16:30pablogsalsetmessages: + msg328738
2018-10-15 13:48:17rhettingersetnosy: + tim.peters
2018-10-13 17:01:40pablogsalsetpull_requests: + pull_request9218
2018-04-30 22:22:04pablogsalsetmessages: + msg315973
2018-04-30 08:46:25vstinnersetmessages: + msg315941
2018-04-30 08:39:36vstinnersetnosy: + vstinner
messages: + msg315940
2018-04-17 21:07:31pablogsalsetnosy: + pablogsal
messages: + msg315420
2018-04-17 20:32:06serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg315412
2018-04-17 00:28:44pablogsalsetkeywords: + patch
stage: patch review
pull_requests: + pull_request6190
2018-04-05 21:46:53rhettingercreate