classification
Title: use malloc() for better performance of some list operations
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: rhettinger, scoder, sir-sigurd, xiang.zhang, xtreak
Priority: normal Keywords: patch

Created on 2018-07-18 17:31 by sir-sigurd, last changed 2018-08-11 13:14 by xiang.zhang. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 8332 merged sir-sigurd, 2018-07-18 17:33
Messages (4)
msg321905 - (view) Author: Sergey Fedoseev (sir-sigurd) * Date: 2018-07-18 17:31
Currently list concatenation, slicing and repeating operations are using PyList_New() which allocates memory for the items by calloc(). malloc() could be used instead, since the allocated memory is overwritten by mentioned operations.

I made benchmarks with this script:

NAME=list-malloc-master.json

python -m perf timeit --name slice0 -s "l = [None]*1000000" "l[:0]" --duplicate=2048 --append $NAME
python -m perf timeit --name slice1 -s "l = [None]*1000000" "l[:1]" --duplicate=1024 --append $NAME
python -m perf timeit --name slice2 -s "l = [None]*1000000" "l[:2]" --duplicate=1024 --append $NAME
python -m perf timeit --name slice3 -s "l = [None]*1000000" "l[:3]" --duplicate=1024 --append $NAME
python -m perf timeit --name slice1000000 -s "l = [None]*1000000" "l[:1000000]" --append $NAME

python -m perf timeit --name cat0 -s "l = [None]*0" "l + l" --duplicate=1024 --append $NAME
python -m perf timeit --name cat1 -s "l = [None]*1" "l * 1" --duplicate=1024 --append $NAME
python -m perf timeit --name cat2 -s "l = [None]*2" "l * 1" --duplicate=1024 --append $NAME
python -m perf timeit --name cat3 -s "l = [None]*3" "l * 1" --duplicate=1024 --append $NAME
python -m perf timeit --name cat1000000 -s "l = [None]*1000000" "l * 1" --append $NAME

python -m perf timeit --name 1x0 -s "l = [None]" "l * 0" --duplicate=1024 --append $NAME
python -m perf timeit --name 1x1 -s "l = [None]" "l * 1" --duplicate=1024 --append $NAME
python -m perf timeit --name 1x2 -s "l = [None]" "l * 2" --duplicate=1024 --append $NAME
python -m perf timeit --name 1x3 -s "l = [None]" "l * 3" --duplicate=1024 --append $NAME
python -m perf timeit --name 1x1000000 -s "l = [None]" "l * 1000000" --append $NAME 


Here's comparison table:

+--------------+--------------------+------------------------------+
| Benchmark    | list-malloc-master | list-malloc                  |
+==============+====================+==============================+
| slice1       | 84.5 ns            | 59.6 ns: 1.42x faster (-30%) |
+--------------+--------------------+------------------------------+
| slice2       | 71.6 ns            | 61.8 ns: 1.16x faster (-14%) |
+--------------+--------------------+------------------------------+
| slice3       | 74.4 ns            | 63.6 ns: 1.17x faster (-15%) |
+--------------+--------------------+------------------------------+
| slice1000000 | 4.39 ms            | 4.08 ms: 1.08x faster (-7%)  |
+--------------+--------------------+------------------------------+
| cat0         | 23.9 ns            | 24.9 ns: 1.04x slower (+4%)  |
+--------------+--------------------+------------------------------+
| cat1         | 73.2 ns            | 51.9 ns: 1.41x faster (-29%) |
+--------------+--------------------+------------------------------+
| cat2         | 61.6 ns            | 53.1 ns: 1.16x faster (-14%) |
+--------------+--------------------+------------------------------+
| cat3         | 63.0 ns            | 54.3 ns: 1.16x faster (-14%) |
+--------------+--------------------+------------------------------+
| cat1000000   | 4.38 ms            | 4.08 ms: 1.07x faster (-7%)  |
+--------------+--------------------+------------------------------+
| 1x0          | 27.1 ns            | 27.7 ns: 1.02x slower (+2%)  |
+--------------+--------------------+------------------------------+
| 1x1          | 72.9 ns            | 51.9 ns: 1.41x faster (-29%) |
+--------------+--------------------+------------------------------+
| 1x2          | 60.9 ns            | 52.9 ns: 1.15x faster (-13%) |
+--------------+--------------------+------------------------------+
| 1x3          | 62.5 ns            | 54.8 ns: 1.14x faster (-12%) |
+--------------+--------------------+------------------------------+
| 1x1000000    | 2.67 ms            | 2.34 ms: 1.14x faster (-12%) |
+--------------+--------------------+------------------------------+

Not significant (1): slice0
msg321914 - (view) Author: Stefan Behnel (scoder) * Date: 2018-07-18 20:07
Nice! Patch looks good to me, minus the usual naming nit-pick.
msg321925 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-07-19 05:27
+1 This looks like a reasonable improvement.
msg323415 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2018-08-11 13:12
New changeset 2fc46979b8c802675ca7fd51c6f2108a305001c8 by Xiang Zhang (Sergey Fedoseev) in branch 'master':
bpo-34151: Improve performance of some list operations (GH-8332)
https://github.com/python/cpython/commit/2fc46979b8c802675ca7fd51c6f2108a305001c8
History
Date User Action Args
2018-08-11 13:14:26xiang.zhangsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2018-08-11 13:12:11xiang.zhangsetnosy: + xiang.zhang
messages: + msg323415
2018-07-19 06:19:13xtreaksetnosy: + xtreak
2018-07-19 05:27:15rhettingersetnosy: + rhettinger
messages: + msg321925
2018-07-18 20:07:04scodersetnosy: + scoder

messages: + msg321914
versions: + Python 3.8
2018-07-18 17:33:45sir-sigurdsetkeywords: + patch
stage: patch review
pull_requests: + pull_request7869
2018-07-18 17:31:49sir-sigurdcreate