classification
Title: micro-optimization of PyLong_FromSize_t()
Type: performance
Stage: commit review
Components: Interpreter Core
Versions: Python 3.9
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: Greg Price, gregory.p.smith, sir-sigurd, vstinner
Priority: normal
Keywords: patch

Created on 2019-08-09 13:27 by sir-sigurd, last changed 2019-10-01 11:29 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 15192 merged sir-sigurd, 2019-08-09 13:30
PR 16517 merged vstinner, 2019-10-01 11:09
Messages (4)
msg349285 - Author: Sergey Fedoseev (sir-sigurd) * Date: 2019-08-09 13:27
Currently PyLong_FromSize_t() calls PyLong_FromLong() for values < PyLong_BASE. This is suboptimal because PyLong_FromLong() has to handle the sign, which an unsigned value never needs. Removing the PyLong_FromLong() call and handling small ints directly in PyLong_FromSize_t() makes it faster:

$ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 18.7 ns +- 0.3 ns
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 16.7 ns +- 0.1 ns
Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 18.7 ns +- 0.3 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 16.7 ns +- 0.1 ns: 1.12x faster (-10%)

$ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2**10).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 26.2 ns +- 0.0 ns
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 25.0 ns +- 0.7 ns
Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 26.2 ns +- 0.0 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 25.0 ns +- 0.7 ns: 1.05x faster (-5%)

$ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2**30).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 25.6 ns +- 0.1 ns
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 25.6 ns +- 0.0 ns
Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 25.6 ns +- 0.1 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 25.6 ns +- 0.0 ns: 1.00x faster (-0%)
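
To illustrate why the unsigned path can be cheaper, here is a rough Python model of the idea. It is only a sketch, not the C code in longobject.c: the helper names (digits_from_unsigned, digits_from_signed, from_digits) are made up for this example, and the digit width is read from sys.int_info rather than hard-coded.

```python
import sys

# Width of one internal digit: 30 bits on most 64-bit builds, 15 otherwise.
SHIFT = sys.int_info.bits_per_digit
BASE = 1 << SHIFT
MASK = BASE - 1


def digits_from_unsigned(value):
    """Split a non-negative int into base-2**SHIFT digits, least significant first.

    Hypothetical helper: models the unsigned conversion, which never
    has to look at a sign.
    """
    if value < 0:
        raise ValueError("unsigned conversion needs a non-negative value")
    if value < BASE:
        # Fast path: the value fits in a single digit, no loop needed.
        return [value]
    digits = []
    while value:
        digits.append(value & MASK)
        value >>= SHIFT
    return digits


def digits_from_signed(value):
    """Model of the signed conversion: extra work to split sign and magnitude."""
    negative = value < 0
    magnitude = -value if negative else value
    return negative, digits_from_unsigned(magnitude)


def from_digits(digits, negative=False):
    """Rebuild the integer from its digits (used here only to check the model)."""
    result = sum(d << (SHIFT * i) for i, d in enumerate(digits))
    return -result if negative else result
```

In this model the signed helper does everything the unsigned one does plus the sign test and negation, which is the overhead the change avoids for values that cannot be negative.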


This change makes PyLong_FromSize_t() consistently faster than PyLong_FromSsize_t(), so it might make sense to replace PyLong_FromSsize_t() with PyLong_FromSize_t() in __length_hint__() implementations and other similar cases where the value can never be negative. For example:

$ python -m perf timeit -s "_len = iter(bytes(2)).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 19.4 ns +- 0.3 ns
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 17.3 ns +- 0.1 ns
Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 19.4 ns +- 0.3 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 17.3 ns +- 0.1 ns: 1.12x faster (-11%)

$ python -m perf timeit -s "_len = iter(bytes(2**10)).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 26.3 ns +- 0.1 ns
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 25.3 ns +- 0.2 ns
Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 26.3 ns +- 0.1 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 25.3 ns +- 0.2 ns: 1.04x faster (-4%)

$ python -m perf timeit -s "_len = iter(bytes(2**30)).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=10000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 27.6 ns +- 0.1 ns
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 26.0 ns +- 0.1 ns
Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 27.6 ns +- 0.1 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 26.0 ns +- 0.1 ns: 1.06x faster (-6%)
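
For anyone without pyperf installed, the measurements above can be roughly approximated with the standard library. This is a minimal sketch, assuming plain timeit is accurate enough for a first look; the function name time_length_hint and the chosen sizes are illustrative only, and the script has to be run once under each interpreter build to compare them.

```python
import timeit
from itertools import repeat


def time_length_hint(make_iter, number=200_000, repeats=5):
    """Return the best observed time per __length_hint__() call, in nanoseconds.

    make_iter is a zero-argument callable returning a fresh iterator.
    The bound method is looked up once, as in the -s setup above.
    """
    hint = make_iter().__length_hint__
    best = min(timeit.repeat(hint, number=number, repeat=repeats))
    return best / number * 1e9


if __name__ == "__main__":
    cases = {
        "repeat(None, 2)": lambda: repeat(None, 2),
        "repeat(None, 2**10)": lambda: repeat(None, 2**10),
        "repeat(None, 2**30)": lambda: repeat(None, 2**30),
        "iter(bytes(2))": lambda: iter(bytes(2)),
        "iter(bytes(2**10))": lambda: iter(bytes(2**10)),
    }
    for label, make_iter in cases.items():
        print(f"{label}: {time_length_hint(make_iter):.1f} ns per call")
```

Unlike pyperf, this does not spawn multiple worker processes or report a standard deviation, so differences of a nanosecond or two should be treated with caution.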
msg350446 - Author: Sergey Fedoseev (sir-sigurd) * Date: 2019-08-25 09:58
The previous benchmark results were obtained with a non-LTO build.

Here are the results for an LTO build:

$ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 0).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=1000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 14.9 ns +- 0.2 ns
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 13.1 ns +- 0.5 ns
Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 14.9 ns +- 0.2 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 13.1 ns +- 0.5 ns: 1.13x faster (-12%)

$ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2**10).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=1000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 22.1 ns +- 0.1 ns
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 20.9 ns +- 0.4 ns
Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 22.1 ns +- 0.1 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 20.9 ns +- 0.4 ns: 1.05x faster (-5%)

$ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2**30).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=1000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 23.3 ns +- 0.0 ns
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 21.6 ns +- 0.1 ns
Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 23.3 ns +- 0.0 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 21.6 ns +- 0.1 ns: 1.08x faster (-8%)

$ python -m perf timeit -s "from itertools import repeat; _len = repeat(None, 2**60).__length_hint__" "_len()" --compare-to=../cpython-master/venv/bin/python --duplicate=1000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 24.4 ns +- 0.1 ns
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 22.7 ns +- 0.1 ns
Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 24.4 ns +- 0.1 ns -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 22.7 ns +- 0.1 ns: 1.08x faster (-7%)
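
Since the numbers depend on whether the interpreter was built with LTO, it can help to check each build before comparing. The following is a heuristic sketch only: the function name built_with_lto is made up here, and it only inspects the recorded configure arguments, which are typically unavailable on Windows builds.

```python
import sysconfig


def built_with_lto():
    """Guess whether this interpreter was configured with --with-lto.

    Heuristic: looks only at the recorded ./configure arguments, so it
    returns False when they are not available (for example on Windows).
    """
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return "--with-lto" in config_args


if __name__ == "__main__":
    print("LTO build" if built_with_lto() else "no --with-lto in configure args")
```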
msg352184 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-09-12 14:41
New changeset c6734ee7c55add5fdc2c821729ed5f67e237a096 by Gregory P. Smith (Sergey Fedoseev) in branch 'master':
bpo-37802: Slightly improve perfomance of PyLong_FromUnsigned*() (GH-15192)
https://github.com/python/cpython/commit/c6734ee7c55add5fdc2c821729ed5f67e237a096
msg353675 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-10-01 11:29
New changeset 6314abcc08f5d0f3d3a915dc9455ea223fa65517 by Victor Stinner in branch 'master':
bpo-37802: Fix a compiler warning in longobject.c (GH-16517)
https://github.com/python/cpython/commit/6314abcc08f5d0f3d3a915dc9455ea223fa65517
History
Date User Action Args
2019-10-01 11:29:56 vstinner set nosy: + vstinner; messages: + msg353675
2019-10-01 11:09:51 vstinner set pull_requests: + pull_request16107
2019-09-12 14:42:39 gregory.p.smith set status: open -> closed; resolution: fixed; stage: patch review -> commit review
2019-09-12 14:41:17 gregory.p.smith set nosy: + gregory.p.smith; messages: + msg352184
2019-08-25 09:58:32 sir-sigurd set messages: + msg350446
2019-08-10 21:34:14 Greg Price set nosy: + Greg Price
2019-08-09 13:30:05 sir-sigurd set keywords: + patch; stage: patch review; pull_requests: + pull_request14924
2019-08-09 13:27:40 sir-sigurd create