Author vstinner
Recipients methane, serhiy.storchaka, vstinner
Date 2018-10-24.11:16:34
Content
> Is performance overhead negligible?

Thank you for asking the most important question :-)

I ran this microbenchmark:

make distclean
./configure --enable-lto
make
./python -m venv env
env/bin/python -m pip install perf
sudo env/bin/python -m perf system tune
env/bin/python -m perf timeit -o FILE.json -v '[]'


My first attempt:

$ env/bin/python -m perf compare_to ref.json patch.json 
Mean +- std dev: [ref] 20.6 ns +- 0.1 ns -> [patch] 22.4 ns +- 0.1 ns: 1.09x slower (+9%)

The added _PyTraceMalloc_NewReference() call, which does nothing here (check the tracing flag, return), costs about 1.8 ns (22.4 ns - 20.6 ns). That's not negligible in such a microbenchmark, and I would prefer to avoid it whenever possible since _Py_NewReference() is the root of the free list optimization.
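
To make the arithmetic explicit, here is a small sketch that recomputes the relative overhead from the rounded means printed by perf. The helper name relative_change() is hypothetical, and perf itself works on the unrounded samples, so the result can differ slightly from what it prints.

```c
#include <assert.h>

/* Hypothetical helper, not part of perf or CPython.
   Returns the relative change of `patched` with respect to `reference`:
   0.09 means 9% slower, -0.01 means 1% faster. */
static double relative_change(double reference, double patched)
{
    assert(reference > 0.0);  /* a mean timing must be positive */
    return (patched - reference) / reference;
}
```

With the values above, relative_change(20.6, 22.4) is roughly 0.087, consistent with the "+9%" that perf reports after rounding.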


New attempt: expose tracemalloc_config so the tracing flag can be checked inline, and add a _Py_unlikely() macro (it tells the compiler that tracing is false most of the time):

Mean +- std dev: [ref] 20.6 ns +- 0.1 ns -> [unlikely] 20.4 ns +- 0.3 ns: 1.01x faster (-1%)

Good! The overhead is now negligible!
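
For readers who have not seen this kind of branch hint, here is a minimal sketch of the idea, not the code from my PR. It assumes GCC or Clang, where __builtin_expect() exists. All names (MY_UNLIKELY, trace_config, trace_new_reference, new_reference) are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical branch-hint macro: on GCC/Clang it maps to
   __builtin_expect(), on other compilers it is a plain no-op. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_UNLIKELY(cond) __builtin_expect(!!(cond), 0)
#else
#  define MY_UNLIKELY(cond) (cond)
#endif

/* Hypothetical config struct; the real tracemalloc_config layout
   is not shown in this message. */
struct trace_config_t {
    int tracing;
};
static struct trace_config_t trace_config = {0};

/* Counts how often the slow path runs, for demonstration only. */
static int slow_path_calls = 0;

/* Slow path: only worth calling when tracing is enabled. */
static void trace_new_reference(void *obj)
{
    (void)obj;
    slow_path_calls++;
}

/* Fast path: the flag is tested inline, so when tracing is off
   no function call is made at all. */
static inline void new_reference(void *obj)
{
    assert(obj != NULL);
    if (MY_UNLIKELY(trace_config.tracing)) {
        trace_new_reference(obj);
    }
}
```

The important part is that the flag test is inlined at the call site. The hint only influences how the compiler lays out the two branches, it never changes the result of the test.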


But... is the hardcore low-level _Py_unlikely() optimization really needed?...

$ env/bin/python -m perf compare_to ref.json if_tracing.json 
Benchmark hidden because not significant (1): timeit

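=> No: with a plain "if (tracing)" check and no macro, the difference from the reference is not significant. The macro is useless, so I removed it!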



New benchmark to double-check on my laptop.

git checkout master
make clean; make
rm -rf env; ./python -m venv env; env/bin/python -m pip install perf
sudo env/bin/python -m perf system tune
env/bin/python -m perf timeit -o ref.json -v '[]' --rigorous

git checkout tracemalloc_newref
make clean; make
rm -rf env; ./python -m venv env; env/bin/python -m pip install perf
env/bin/python -m perf timeit -o patch.json -v '[]' --rigorous

$ env/bin/python -m perf compare_to ref.json patch.json 
Mean +- std dev: [ref] 20.8 ns +- 0.7 ns -> [patch] 20.5 ns +- 0.3 ns: 1.01x faster (-1%)


The std dev is a little bit high: I didn't use CPU isolation, and Hexchat + Firefox were running in the background. *But* the means are very close, so it seems that my PR has no significant overhead.