Classification
Title: Optimize BUILD_CONST_KEY_MAP for distinct keys
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.10
Process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: methane Nosy List: Mark.Shannon, methane
Priority: normal Keywords: patch

Created on 2020-10-23 08:46 by methane, last changed 2020-11-02 12:04 by methane. This issue is now closed.

Pull Requests
URL       Status  Linked
PR 22911  closed  methane, 2020-10-23 09:05
Messages (6)
msg379415 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-23 08:46
BUILD_CONST_KEY_MAP can be optimized based on the #41835 optimization.

1. The compiler checks that the keys tuple is distinct (a sketch of such a check follows below).
2. Add a "distinct" flag to the BUILD_CONST_KEY_MAP oparg.

To be considered:

* Should we use a new opcode instead of a flag in the oparg?
* Is this technique safe? A wrong co_consts tuple could create an invalid dict instance.
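
A minimal sketch of the distinctness check the compiler would need, written in Python for illustration only (the real check would live in compile.c; the helper name is hypothetical):

```
def keys_are_distinct(keys):
    """Return True only if every key is hashable and no two keys compare equal."""
    seen = set()
    for key in keys:
        try:
            if key in seen:
                return False
            seen.add(key)
        except TypeError:
            # Unhashable key: distinctness cannot be proven, so don't set the flag.
            return False
    return True

assert keys_are_distinct(('future', 'return'))  # flag could be set
assert not keys_are_distinct((1, 1.0))          # 1 == 1.0, so it could not
```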
msg379430 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-23 11:46
$ ./python -m pyperf timeit --compare-to ~/pyenv/versions/3.10-dev/bin/python '{}'
/home/inada-n/pyenv/versions/3.10-dev/bin/python: ..................... 23.5 ns +- 0.2 ns
/home/inada-n/work/python/cpython/python: ..................... 22.4 ns +- 0.1 ns

Mean +- std dev: [/home/inada-n/pyenv/versions/3.10-dev/bin/python] 23.5 ns +- 0.2 ns -> [/home/inada-n/work/python/cpython/python] 22.4 ns +- 0.1 ns: 1.05x faster (-5%)

$ ./python -m pyperf timeit --compare-to ~/pyenv/versions/3.10-dev/bin/python '{1:1,2:2}'
/home/inada-n/pyenv/versions/3.10-dev/bin/python: ..................... 83.5 ns +- 0.3 ns
/home/inada-n/work/python/cpython/python: ..................... 96.7 ns +- 0.2 ns

Mean +- std dev: [/home/inada-n/pyenv/versions/3.10-dev/bin/python] 83.5 ns +- 0.3 ns -> [/home/inada-n/work/python/cpython/python] 96.7 ns +- 0.2 ns: 1.16x slower (+16%)

$ ./python -m pyperf timeit --compare-to ~/pyenv/versions/3.10-dev/bin/python '{1:1,2:2,3:3}'
/home/inada-n/pyenv/versions/3.10-dev/bin/python: ..................... 108 ns +- 0 ns
/home/inada-n/work/python/cpython/python: ..................... 110 ns +- 1 ns

Mean +- std dev: [/home/inada-n/pyenv/versions/3.10-dev/bin/python] 108 ns +- 0 ns -> [/home/inada-n/work/python/cpython/python] 110 ns +- 1 ns: 1.01x slower (+1%)

$ ./python -m pyperf timeit --compare-to ~/pyenv/versions/3.10-dev/bin/python '{1:1,2:2,3:3,4:4}'
/home/inada-n/pyenv/versions/3.10-dev/bin/python: ..................... 134 ns +- 1 ns
/home/inada-n/work/python/cpython/python: ..................... 122 ns +- 1 ns

Mean +- std dev: [/home/inada-n/pyenv/versions/3.10-dev/bin/python] 134 ns +- 1 ns -> [/home/inada-n/work/python/cpython/python] 122 ns +- 1 ns: 1.10x faster (-9%)

$ ./python -m pyperf timeit --compare-to ~/pyenv/versions/3.10-dev/bin/python '{1:1,2:2,3:3,4:4,5:5}'
/home/inada-n/pyenv/versions/3.10-dev/bin/python: ..................... 168 ns +- 1 ns
/home/inada-n/work/python/cpython/python: ..................... 112 ns +- 1 ns

Mean +- std dev: [/home/inada-n/pyenv/versions/3.10-dev/bin/python] 168 ns +- 1 ns -> [/home/inada-n/work/python/cpython/python] 112 ns +- 1 ns: 1.50x faster (-33%)

$ ./python -m pyperf timeit --compare-to ~/pyenv/versions/3.10-dev/bin/python '{1:1,2:2,3:3,4:4,5:5,6:6}'
/home/inada-n/pyenv/versions/3.10-dev/bin/python: ..................... 223 ns +- 1 ns
/home/inada-n/work/python/cpython/python: ..................... 150 ns +- 4 ns

Mean +- std dev: [/home/inada-n/pyenv/versions/3.10-dev/bin/python] 223 ns +- 1 ns -> [/home/inada-n/work/python/cpython/python] 150 ns +- 4 ns: 1.48x faster (-33%)

$ ./python -m pyperf timeit --compare-to ~/pyenv/versions/3.10-dev/bin/python '{1:1,2:2,3:3,4:4,5:5,6:6,7:7,8:8}'
/home/inada-n/pyenv/versions/3.10-dev/bin/python: ..................... 273 ns +- 1 ns
/home/inada-n/work/python/cpython/python: ..................... 177 ns +- 1 ns

Mean +- std dev: [/home/inada-n/pyenv/versions/3.10-dev/bin/python] 273 ns +- 1 ns -> [/home/inada-n/work/python/cpython/python] 177 ns +- 1 ns: 1.54x faster (-35%)
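
For context (this dis snippet is an illustration added here, not part of the original report): on CPython 3.10 the empty dict display should compile to BUILD_MAP, while the constant-key displays above should compile to BUILD_CONST_KEY_MAP, which is the path the patch changes.

```
import dis

dis.dis(compile("{}", "<bench>", "eval"))              # expected: BUILD_MAP 0
dis.dis(compile("{1:1,2:2,3:3}", "<bench>", "eval"))   # expected: BUILD_CONST_KEY_MAP 3
```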
msg379431 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-10-23 12:20
Is this worthwhile?

Statically, BUILD_CONST_KEY_MAP represents 0.14% of all bytecode instructions in the standard library, but only 0.03% of the bytecode instructions in functions, which suggests that dynamically only about 1 in 3000 executed instructions is a BUILD_CONST_KEY_MAP. The implication is that making BUILD_CONST_KEY_MAP more complex is just as likely to slow things down as speed them up.

Do you have any evidence that this will make a measurable difference?
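
(For reference, a rough way to reproduce a static opcode count over the stdlib is sketched below in Python; this is an illustration added here, not the tool behind the figures above, and the share it prints will vary with the Python version.)

```
import collections
import dis
import pathlib
import sysconfig

counts = collections.Counter()
stdlib = pathlib.Path(sysconfig.get_paths()["stdlib"])
for path in stdlib.rglob("*.py"):
    try:
        code = compile(path.read_text(encoding="utf-8", errors="ignore"), str(path), "exec")
    except (SyntaxError, ValueError):
        continue  # skip files that are not valid top-level modules
    stack = [code]
    while stack:  # walk nested code objects (functions, classes, comprehensions)
        co = stack.pop()
        for instr in dis.get_instructions(co):
            counts[instr.opname] += 1
        stack.extend(const for const in co.co_consts if hasattr(const, "co_code"))

total = sum(counts.values())
share = counts["BUILD_CONST_KEY_MAP"] / total
print(f"BUILD_CONST_KEY_MAP: {counts['BUILD_CONST_KEY_MAP']} of {total} instructions ({share:.2%})")
```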
msg379434 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-10-23 13:53
It is difficult to estimate. Real-world applications have a different code style than the stdlib, and pyperformance does not cover all real-world applications either.
Some applications may use dict displays in hot loops, but it is difficult to say exactly how much.

One example from pyperformance: tornado uses inner functions and function annotations heavily.
Here BUILD_CONST_KEY_MAP is used to build the annotations dict, rather than a dict display.
https://github.com/tornadoweb/tornado/blob/5eb7bb8efb4e1be8a2cec1744c857a8dadd43af9/tornado/gen.py

```
591          32 LOAD_CONST               1 ('Future')
             34 LOAD_CONST               2 ('None')
             36 LOAD_CONST               3 (('future', 'return'))
             38 BUILD_CONST_KEY_MAP      2
             40 LOAD_CLOSURE             3 (quiet_exceptions)
             42 BUILD_TUPLE              1
             44 LOAD_CONST               4 (<code object error_callback at 0x7fa7c64195b0, file "/home/inada-n/local/python-dev/lib/python3.10/site-packages/tornado/gen.py", line 591>)
             46 LOAD_CONST               5 ('with_timeout.<locals>.error_callback')
             48 MAKE_FUNCTION           12 (annotations, closure)
             50 STORE_DEREF              0 (error_callback)
```
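
(Added illustration, not from the original message: the same pattern can be reproduced with a small annotated function, since on CPython 3.10 the compiler builds the __annotations__ dict with BUILD_CONST_KEY_MAP at function definition time.)

```
import dis

src = '''
def error_callback(future: "Future") -> None:
    pass
'''
dis.dis(compile(src, "<example>", "exec"))
# The output should include a line like:
#     BUILD_CONST_KEY_MAP      2
```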

I have not posted a pyperformance result because this pull request is based on #41835 and I have not measured each change separately.
But #41835 + this change is 16% faster than master on bm_tornado_http.
msg379437 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2020-10-23 14:08
It sounds like the root cause is that annotations are repeatedly being computed. We should address the underlying issue, IMO.
msg380215 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-11-02 12:04
OK. The microbenchmarks don't justify adding complexity to the eval loop and the compiler.
History
Date                 User          Action  Args
2020-11-02 12:04:38  methane       set     status: open -> closed
                                           resolution: rejected
                                           messages: + msg380215
                                           stage: patch review -> resolved
2020-10-23 14:08:29  Mark.Shannon  set     messages: + msg379437
2020-10-23 13:53:46  methane       set     messages: + msg379434
2020-10-23 12:20:41  Mark.Shannon  set     nosy: + Mark.Shannon
                                           messages: + msg379431
2020-10-23 11:46:06  methane       set     messages: + msg379430
2020-10-23 09:05:06  methane       set     keywords: + patch
                                           stage: patch review
                                           pull_requests: + pull_request21842
2020-10-23 08:46:51  methane       create