Author inada.naoki
Recipients Mark.Shannon, inada.naoki, josh.r, louielu, r.david.murray, rhettinger, serhiy.storchaka, terry.reedy, tim.peters, vstinner, xiang.zhang
Date 2019-03-15.06:13:51
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1552630432.13.0.512509208629.issue30040@roundup.psfhosted.org>
In-reply-to
Content
> I do not like how much code is needed for such minor optimization.

PR 12307 is +46 lines.  Benefit is about 10ns when first insertion.

$ cpython/python.opt -m perf timeit --compare-to master/python.opt --python-names master:empty-dict2 --duplicate 100 'd={};d["a"]=1'
master: ..................... 62.6 ns +- 1.1 ns
empty-dict2: ..................... 52.5 ns +- 1.0 ns

Mean +- std dev: [master] 62.6 ns +- 1.1 ns -> [empty-dict2] 52.5 ns +- 1.0 ns: 1.19x faster (-16%)


While "1.19x faster" is significant, I agree that this optimization is "minor".
_PyDict_NewPresized() is used for important cases.  So this optimization doesn't work in most case.


But I came up with an idea.  _PyDict_NewPresized(n) skips preallocation when n fit in minsize table.
https://github.com/python/cpython/pull/12307/commits/93906f1568b08c59e0afddfc98070827c65cd0f9

With this idea, "first insertion optimization" works.  Additionally, it may help branch prediction
for `while (newsize < minsize) { newsize <<= 1 }` loop.

Total diff is still +46 lines.  But it affects more cases.


$ alias compare='cpython/python.opt -m perf timeit --compare-to master/python.opt --python-names master:empty-dict2'

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4)'  # No table
Mean +- std dev: [master] 65.4 ns +- 2.3 ns -> [empty-dict2] 64.5 ns +- 1.5 ns: 1.01x faster (-1%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1)'  # minsize table is allocated
Mean +- std dev: [master] 152 ns +- 3 ns -> [empty-dict2] 144 ns +- 4 ns: 1.05x faster (-5%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2)'
Mean +- std dev: [master] 211 ns +- 3 ns -> [empty-dict2] 186 ns +- 3 ns: 1.13x faster (-12%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2,c=3)'
Mean +- std dev: [master] 248 ns +- 3 ns -> [empty-dict2] 223 ns +- 3 ns: 1.11x faster (-10%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2,c=3,d=4,e=5)'
Mean +- std dev: [master] 327 ns +- 6 ns -> [empty-dict2] 301 ns +- 6 ns: 1.09x faster (-8%)

$ compare -s 'def f(x, **kw): pass' --duplicate 10 -- 'f(4, a=1,b=2,c=3,d=4,e=5,f=6)'  # minsize*2 table is allocated
Mean +- std dev: [master] 431 ns +- 8 ns -> [empty-dict2] 406 ns +- 8 ns: 1.06x faster (-6%)


Now I think PR 12307 is not so minor.  Of course, same idea can be applied to PR 12308.
History
Date User Action Args
2019-03-15 06:13:52inada.naokisetrecipients: + inada.naoki, tim.peters, rhettinger, terry.reedy, vstinner, r.david.murray, Mark.Shannon, serhiy.storchaka, josh.r, xiang.zhang, louielu
2019-03-15 06:13:52inada.naokisetmessageid: <1552630432.13.0.512509208629.issue30040@roundup.psfhosted.org>
2019-03-15 06:13:52inada.naokilinkissue30040 messages
2019-03-15 06:13:51inada.naokicreate