Issue42366
Created on 2020-11-16 08:14 by malin, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Files | ||
---|---|---|
File name | Uploaded | Description |
Ob3.diff | malin, 2020-11-16 08:14 | |
pgo_ob3.diff | malin, 2020-11-18 06:11 | |
Messages (6) | |||
---|---|---|---|
msg381076 - (view) | Author: Ma Lin (malin) * | Date: 2020-11-16 08:14 | |
MSVC 2019 has a new option, `/Ob3`, which specifies more aggressive inlining than `/Ob2`: https://docs.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-160

If this option is used with MSVC 2017, the compiler emits a warning:

cl : Command line warning D9002 : ignoring unknown option '/Ob3'

Applying `Ob3.diff` gives this improvement (Python 3.9 branch, no PGO, build.bat -p X64):

+-------------------------+----------+------------------------------+
| Benchmark               | baseline | ob3                          |
+=========================+==========+==============================+
| 2to3                    | 563 ms   | 552 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| chameleon               | 16.5 ms  | 16.1 ms: 1.03x faster (-3%)  |
+-------------------------+----------+------------------------------+
| chaos                   | 200 ms   | 197 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| crypto_pyaes            | 186 ms   | 184 ms: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+
| deltablue               | 13.0 ms  | 12.6 ms: 1.03x faster (-3%)  |
+-------------------------+----------+------------------------------+
| dulwich_log             | 94.5 ms  | 93.9 ms: 1.01x faster (-1%)  |
+-------------------------+----------+------------------------------+
| fannkuch                | 806 ms   | 761 ms: 1.06x faster (-6%)   |
+-------------------------+----------+------------------------------+
| float                   | 211 ms   | 199 ms: 1.06x faster (-6%)   |
+-------------------------+----------+------------------------------+
| genshi_text             | 48.3 ms  | 47.7 ms: 1.01x faster (-1%)  |
+-------------------------+----------+------------------------------+
| go                      | 446 ms   | 437 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| hexiom                  | 16.6 ms  | 15.9 ms: 1.04x faster (-4%)  |
+-------------------------+----------+------------------------------+
| json_dumps              | 19.9 ms  | 19.3 ms: 1.03x faster (-3%)  |
+-------------------------+----------+------------------------------+
| json_loads              | 45.5 us  | 43.9 us: 1.04x faster (-3%)  |
+-------------------------+----------+------------------------------+
| logging_format          | 21.4 us  | 20.7 us: 1.03x faster (-3%)  |
+-------------------------+----------+------------------------------+
| logging_silent          | 343 ns   | 319 ns: 1.07x faster (-7%)   |
+-------------------------+----------+------------------------------+
| mako                    | 29.0 ms  | 27.6 ms: 1.05x faster (-5%)  |
+-------------------------+----------+------------------------------+
| meteor_contest          | 168 ms   | 162 ms: 1.04x faster (-3%)   |
+-------------------------+----------+------------------------------+
| nbody                   | 256 ms   | 244 ms: 1.05x faster (-5%)   |
+-------------------------+----------+------------------------------+
| nqueens                 | 168 ms   | 162 ms: 1.04x faster (-4%)   |
+-------------------------+----------+------------------------------+
| pathlib                 | 175 ms   | 168 ms: 1.04x faster (-4%)   |
+-------------------------+----------+------------------------------+
| pickle                  | 17.9 us  | 17.3 us: 1.04x faster (-4%)  |
+-------------------------+----------+------------------------------+
| pickle_dict             | 41.0 us  | 33.2 us: 1.24x faster (-19%) |
+-------------------------+----------+------------------------------+
| pickle_list             | 6.73 us  | 5.89 us: 1.14x faster (-12%) |
+-------------------------+----------+------------------------------+
| pickle_pure_python      | 829 us   | 793 us: 1.05x faster (-4%)   |
+-------------------------+----------+------------------------------+
| pidigits                | 243 ms   | 243 ms: 1.00x faster (-0%)   |
+-------------------------+----------+------------------------------+
| pyflate                 | 1.21 sec | 1.18 sec: 1.03x faster (-2%) |
+-------------------------+----------+------------------------------+
| raytrace                | 947 ms   | 915 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| regex_compile           | 291 ms   | 284 ms: 1.03x faster (-2%)   |
+-------------------------+----------+------------------------------+
| regex_dna               | 217 ms   | 222 ms: 1.02x slower (+2%)   |
+-------------------------+----------+------------------------------+
| regex_effbot            | 3.97 ms  | 4.13 ms: 1.04x slower (+4%)  |
+-------------------------+----------+------------------------------+
| regex_v8                | 35.2 ms  | 34.6 ms: 1.02x faster (-2%)  |
+-------------------------+----------+------------------------------+
| richards                | 134 ms   | 131 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| scimark_fft             | 616 ms   | 599 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| scimark_lu              | 248 ms   | 241 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| scimark_monte_carlo     | 187 ms   | 179 ms: 1.04x faster (-4%)   |
+-------------------------+----------+------------------------------+
| scimark_sor             | 361 ms   | 343 ms: 1.05x faster (-5%)   |
+-------------------------+----------+------------------------------+
| scimark_sparse_mat_mult | 7.71 ms  | 7.04 ms: 1.10x faster (-9%)  |
+-------------------------+----------+------------------------------+
| spectral_norm           | 249 ms   | 245 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| sqlalchemy_declarative  | 237 ms   | 246 ms: 1.04x slower (+4%)   |
+-------------------------+----------+------------------------------+
| sqlalchemy_imperative   | 40.6 ms  | 41.2 ms: 1.02x slower (+2%)  |
+-------------------------+----------+------------------------------+
| sqlite_synth            | 4.64 us  | 5.47 us: 1.18x slower (+18%) |
+-------------------------+----------+------------------------------+
| sympy_expand            | 738 ms   | 718 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| sympy_integrate         | 35.6 ms  | 34.7 ms: 1.03x faster (-3%)  |
+-------------------------+----------+------------------------------+
| sympy_sum               | 298 ms   | 295 ms: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+
| sympy_str               | 484 ms   | 471 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| telco                   | 11.3 ms  | 9.76 ms: 1.16x faster (-14%) |
+-------------------------+----------+------------------------------+
| tornado_http            | 256 ms   | 254 ms: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+
| unpack_sequence         | 94.3 ns  | 90.5 ns: 1.04x faster (-4%)  |
+-------------------------+----------+------------------------------+
| unpickle                | 23.6 us  | 22.6 us: 1.05x faster (-5%)  |
+-------------------------+----------+------------------------------+
| unpickle_list           | 6.63 us  | 6.17 us: 1.07x faster (-7%)  |
+-------------------------+----------+------------------------------+
| unpickle_pure_python    | 589 us   | 560 us: 1.05x faster (-5%)   |
+-------------------------+----------+------------------------------+
| xml_etree_parse         | 213 ms   | 209 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| xml_etree_iterparse     | 155 ms   | 149 ms: 1.04x faster (-4%)   |
+-------------------------+----------+------------------------------+
| xml_etree_generate      | 149 ms   | 145 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| xml_etree_process       | 117 ms   | 115 ms: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+

Not significant (5): django_template; genshi_xml; logging_simple; python_startup; python_startup_no_site |
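For readers unfamiliar with the result column, the following is a minimal sketch of how an entry such as `1.02x faster (-2%)` relates to the two timings in a row. The function name `describe_change` is hypothetical and this is not the benchmark tool's own code; it only assumes both timings share a unit and that lower is better.

```python
def describe_change(baseline: float, changed: float) -> str:
    """Format a timing change in the style of the result column.

    Both timings must use the same unit; lower is better.
    """
    # Relative change of the new timing with respect to the baseline.
    percent = (changed - baseline) / baseline * 100.0
    if changed <= baseline:
        ratio = baseline / changed
        word = "faster"
    else:
        ratio = changed / baseline
        word = "slower"
    return f"{ratio:.2f}x {word} ({percent:+.0f}%)"


# Rounded values copied from the table (2to3 and sqlite_synth rows).
print(describe_change(563, 552))
print(describe_change(4.64, 5.47))
```

The table was presumably produced from unrounded means, so recomputing from the rounded values shown can differ in the last digit for some rows (pickle_dict, for example).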
|||
msg381077 - (view) | Author: Christian Heimes (christian.heimes) * | Date: 2020-11-16 08:17 | |
Could you please try again with PGO? All our official builds use PGO. |
|||
msg381078 - (view) | Author: Ma Lin (malin) * | Date: 2020-11-16 08:28 | |
> Could you please try again with PGO?
Please wait. BTW, this option was suggested in another project. In that project, even with `/Ob3` enabled, it is still slower than the GCC 9 build. If you are interested, see: https://github.com/facebook/zstd/issues/2314 |
|||
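A build script that has to support both toolsets could choose the inlining flag from the compiler version rather than passing `/Ob3` unconditionally, which avoids the D9002 warning reported in the first message. The sketch below is only an illustration and is not what `Ob3.diff` does; the function name `inline_flag` is hypothetical, and the cutoff assumes that Visual Studio 2019 compilers report `_MSC_VER` values of 1920 and above.

```python
def inline_flag(msc_ver: int) -> str:
    """Pick an MSVC inlining flag for a given _MSC_VER value (assumed input)."""
    # Compilers older than Visual Studio 2019 do not know /Ob3 and would
    # ignore it with warning D9002, so fall back to /Ob2 for them.
    return "/Ob3" if msc_ver >= 1920 else "/Ob2"


print(inline_flag(1916))  # a Visual Studio 2017 compiler version
print(inline_flag(1928))  # a Visual Studio 2019 compiler version
```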
msg381085 - (view) | Author: Ma Lin (malin) * | Date: 2020-11-16 10:26 | |
In the PGO build, the improvement is small. (3.9 branch, with PGO, build.bat -p X64 --pgo)

+-------------------------+--------------+------------------------------+
| Benchmark               | baseline-pgo | ob3-pgo                      |
+=========================+==============+==============================+
| 2to3                    | 464 ms       | 462 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| chameleon               | 14.0 ms      | 13.5 ms: 1.03x faster (-3%)  |
+-------------------------+--------------+------------------------------+
| crypto_pyaes            | 142 ms       | 143 ms: 1.00x slower (+0%)   |
+-------------------------+--------------+------------------------------+
| django_template         | 65.0 ms      | 65.4 ms: 1.01x slower (+1%)  |
+-------------------------+--------------+------------------------------+
| fannkuch                | 665 ms       | 650 ms: 1.02x faster (-2%)   |
+-------------------------+--------------+------------------------------+
| float                   | 166 ms       | 164 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| genshi_text             | 41.4 ms      | 41.0 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| genshi_xml              | 88.1 ms      | 87.0 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| go                      | 315 ms       | 311 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| hexiom                  | 12.7 ms      | 12.6 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| json_dumps              | 16.7 ms      | 16.6 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| json_loads              | 33.5 us      | 32.1 us: 1.04x faster (-4%)  |
+-------------------------+--------------+------------------------------+
| logging_simple          | 13.6 us      | 13.3 us: 1.02x faster (-2%)  |
+-------------------------+--------------+------------------------------+
| mako                    | 22.7 ms      | 22.8 ms: 1.01x slower (+1%)  |
+-------------------------+--------------+------------------------------+
| meteor_contest          | 136 ms       | 138 ms: 1.01x slower (+1%)   |
+-------------------------+--------------+------------------------------+
| nbody                   | 189 ms       | 186 ms: 1.02x faster (-2%)   |
+-------------------------+--------------+------------------------------+
| nqueens                 | 135 ms       | 135 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| pathlib                 | 157 ms       | 154 ms: 1.02x faster (-2%)   |
+-------------------------+--------------+------------------------------+
| pickle                  | 16.8 us      | 16.4 us: 1.02x faster (-2%)  |
+-------------------------+--------------+------------------------------+
| pickle_dict             | 41.3 us      | 40.4 us: 1.02x faster (-2%)  |
+-------------------------+--------------+------------------------------+
| pickle_list             | 6.34 us      | 6.42 us: 1.01x slower (+1%)  |
+-------------------------+--------------+------------------------------+
| pickle_pure_python      | 588 us       | 584 us: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| pidigits                | 242 ms       | 242 ms: 1.00x faster (-0%)   |
+-------------------------+--------------+------------------------------+
| pyflate                 | 905 ms       | 898 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| python_startup          | 28.0 ms      | 27.9 ms: 1.00x faster (-0%)  |
+-------------------------+--------------+------------------------------+
| regex_compile           | 220 ms       | 218 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| regex_v8                | 33.1 ms      | 32.9 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| richards                | 88.9 ms      | 88.3 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| scimark_fft             | 494 ms       | 486 ms: 1.02x faster (-2%)   |
+-------------------------+--------------+------------------------------+
| scimark_lu              | 210 ms       | 207 ms: 1.02x faster (-2%)   |
+-------------------------+--------------+------------------------------+
| scimark_monte_carlo     | 141 ms       | 137 ms: 1.03x faster (-3%)   |
+-------------------------+--------------+------------------------------+
| scimark_sor             | 263 ms       | 255 ms: 1.03x faster (-3%)   |
+-------------------------+--------------+------------------------------+
| scimark_sparse_mat_mult | 6.48 ms      | 6.10 ms: 1.06x faster (-6%)  |
+-------------------------+--------------+------------------------------+
| spectral_norm           | 200 ms       | 184 ms: 1.09x faster (-8%)   |
+-------------------------+--------------+------------------------------+
| sqlalchemy_imperative   | 39.4 ms      | 37.8 ms: 1.04x faster (-4%)  |
+-------------------------+--------------+------------------------------+
| sqlite_synth            | 4.24 us      | 4.31 us: 1.02x slower (+2%)  |
+-------------------------+--------------+------------------------------+
| sympy_sum               | 266 ms       | 270 ms: 1.01x slower (+1%)   |
+-------------------------+--------------+------------------------------+
| sympy_str               | 416 ms       | 418 ms: 1.00x slower (+0%)   |
+-------------------------+--------------+------------------------------+
| telco                   | 8.12 ms      | 8.28 ms: 1.02x slower (+2%)  |
+-------------------------+--------------+------------------------------+
| unpack_sequence         | 92.3 ns      | 80.8 ns: 1.14x faster (-13%) |
+-------------------------+--------------+------------------------------+
| unpickle                | 17.9 us      | 18.3 us: 1.02x slower (+2%)  |
+-------------------------+--------------+------------------------------+
| unpickle_list           | 6.43 us      | 6.57 us: 1.02x slower (+2%)  |
+-------------------------+--------------+------------------------------+
| unpickle_pure_python    | 419 us       | 414 us: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| xml_etree_parse         | 184 ms       | 183 ms: 1.00x faster (-0%)   |
+-------------------------+--------------+------------------------------+
| xml_etree_iterparse     | 135 ms       | 134 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| xml_etree_generate      | 130 ms       | 129 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| xml_etree_process       | 101 ms       | 99.4 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+

Not significant (13): chaos; deltablue; dulwich_log; logging_format; logging_silent; python_startup_no_site; raytrace; regex_dna; regex_effbot; sqlalchemy_declarative; sympy_expand; sympy_integrate; tornado_http |
|||
msg381315 - (view) | Author: Ma Lin (malin) * | Date: 2020-11-18 06:11 | |
The last benchmark was wrong: the `/Ob3` option was not enabled. With `pgo_ob3.diff` applied, the PGO build gets slower, so I am closing this issue.

+-------------------------+------------+------------------------------+
| Benchmark               | py39_pgo_a | py39_pgo_b                   |
+=========================+============+==============================+
| 2to3                    | 461 ms     | 465 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| chameleon               | 13.4 ms    | 13.7 ms: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| chaos                   | 138 ms     | 141 ms: 1.02x slower (+2%)   |
+-------------------------+------------+------------------------------+
| crypto_pyaes            | 141 ms     | 143 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| deltablue               | 9.01 ms    | 9.20 ms: 1.02x slower (+2%)  |
+-------------------------+------------+------------------------------+
| django_template         | 64.7 ms    | 65.4 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| dulwich_log             | 78.2 ms    | 78.8 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| fannkuch                | 640 ms     | 668 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| float                   | 165 ms     | 163 ms: 1.01x faster (-1%)   |
+-------------------------+------------+------------------------------+
| genshi_text             | 40.7 ms    | 41.5 ms: 1.02x slower (+2%)  |
+-------------------------+------------+------------------------------+
| genshi_xml              | 87.2 ms    | 88.4 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| go                      | 309 ms     | 314 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| hexiom                  | 12.3 ms    | 12.7 ms: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| json_dumps              | 16.7 ms    | 16.8 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| json_loads              | 32.1 us    | 32.5 us: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| logging_format          | 14.6 us    | 15.0 us: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| logging_silent          | 247 ns     | 257 ns: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| logging_simple          | 13.2 us    | 13.6 us: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| mako                    | 22.1 ms    | 22.8 ms: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| meteor_contest          | 135 ms     | 137 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| nbody                   | 184 ms     | 191 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| nqueens                 | 132 ms     | 137 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| pathlib                 | 156 ms     | 162 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| pickle                  | 16.3 us    | 15.4 us: 1.05x faster (-5%)  |
+-------------------------+------------+------------------------------+
| pickle_dict             | 39.7 us    | 40.0 us: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| pickle_list             | 5.93 us    | 6.15 us: 1.04x slower (+4%)  |
+-------------------------+------------+------------------------------+
| pickle_pure_python      | 581 us     | 587 us: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| pidigits                | 243 ms     | 242 ms: 1.00x faster (-0%)   |
+-------------------------+------------+------------------------------+
| pyflate                 | 885 ms     | 908 ms: 1.03x slower (+3%)   |
+-------------------------+------------+------------------------------+
| python_startup          | 27.8 ms    | 28.0 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| python_startup_no_site  | 22.0 ms    | 22.1 ms: 1.00x slower (+0%)  |
+-------------------------+------------+------------------------------+
| raytrace                | 630 ms     | 632 ms: 1.00x slower (+0%)   |
+-------------------------+------------+------------------------------+
| regex_compile           | 215 ms     | 220 ms: 1.03x slower (+3%)   |
+-------------------------+------------+------------------------------+
| regex_dna               | 223 ms     | 225 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| regex_v8                | 32.5 ms    | 33.4 ms: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| richards                | 87.6 ms    | 88.5 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| scimark_fft             | 484 ms     | 501 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| scimark_lu              | 205 ms     | 210 ms: 1.02x slower (+2%)   |
+-------------------------+------------+------------------------------+
| scimark_monte_carlo     | 137 ms     | 140 ms: 1.02x slower (+2%)   |
+-------------------------+------------+------------------------------+
| scimark_sor             | 251 ms     | 261 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| scimark_sparse_mat_mult | 6.07 ms    | 6.27 ms: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| spectral_norm           | 185 ms     | 190 ms: 1.02x slower (+2%)   |
+-------------------------+------------+------------------------------+
| sqlalchemy_imperative   | 38.8 ms    | 37.9 ms: 1.03x faster (-3%)  |
+-------------------------+------------+------------------------------+
| sqlite_synth            | 4.28 us    | 4.20 us: 1.02x faster (-2%)  |
+-------------------------+------------+------------------------------+
| sympy_integrate         | 30.4 ms    | 30.7 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| sympy_sum               | 270 ms     | 269 ms: 1.00x faster (-0%)   |
+-------------------------+------------+------------------------------+
| sympy_str               | 416 ms     | 419 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| telco                   | 8.05 ms    | 8.09 ms: 1.00x slower (+0%)  |
+-------------------------+------------+------------------------------+
| unpack_sequence         | 79.8 ns    | 94.4 ns: 1.18x slower (+18%) |
+-------------------------+------------+------------------------------+
| unpickle                | 18.2 us    | 17.9 us: 1.02x faster (-2%)  |
+-------------------------+------------+------------------------------+
| unpickle_list           | 6.57 us    | 6.39 us: 1.03x faster (-3%)  |
+-------------------------+------------+------------------------------+
| unpickle_pure_python    | 407 us     | 418 us: 1.03x slower (+3%)   |
+-------------------------+------------+------------------------------+
| xml_etree_iterparse     | 134 ms     | 135 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| xml_etree_generate      | 126 ms     | 128 ms: 1.02x slower (+2%)   |
+-------------------------+------------+------------------------------+
| xml_etree_process       | 98.7 ms    | 99.4 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+

Not significant (5): regex_effbot; sqlalchemy_declarative; sympy_expand; tornado_http; xml_etree_parse |
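One way to condense a table like this into a single number is the geometric mean of the per-benchmark ratios. The sketch below uses only five rows copied from the table, picked for illustration, so its result is not a summary of the full run. It also assumes that `py39_pgo_b` is the build with `pgo_ob3.diff` applied, which the message implies but does not state; `ROWS` and `mean_slowdown` are hypothetical names.

```python
from statistics import geometric_mean

# (benchmark, py39_pgo_a, py39_pgo_b): a few rows copied from the table.
# Each pair uses the same unit, so the ratio b / a is unitless.
ROWS = [
    ("2to3", 461, 465),
    ("fannkuch", 640, 668),
    ("nbody", 184, 191),
    ("pickle", 16.3, 15.4),
    ("unpack_sequence", 79.8, 94.4),
]


def mean_slowdown(rows):
    """Geometric mean of b / a; a value above 1.0 means b is slower on average."""
    return geometric_mean([b / a for _, a, b in rows])


overall = mean_slowdown(ROWS)
print(f"geometric mean of py39_pgo_b / py39_pgo_a: {overall:.3f}")
```

A geometric mean is used rather than an arithmetic one because the values are ratios, so a 2x slowdown and a 2x speedup cancel out.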
|||
msg381316 - (view) | Author: Christian Heimes (christian.heimes) * | Date: 2020-11-18 06:50 | |
Thank you for your thorough testing. It's useful to know that the option does not speed up PGO builds of Python. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:38 | admin | set | github: 86532 |
2020-11-18 06:50:34 | christian.heimes | set | messages: + msg381316 |
2020-11-18 06:11:22 | malin | set | status: open -> closed files: + pgo_ob3.diff messages: + msg381315 resolution: rejected stage: resolved |
2020-11-16 10:26:22 | malin | set | messages: + msg381085 |
2020-11-16 08:28:30 | malin | set | messages: + msg381078 |
2020-11-16 08:17:22 | christian.heimes | set | nosy: + christian.heimes messages: + msg381077 |
2020-11-16 08:14:55 | malin | create |