classification
Title: Use MSVC2019 and /Ob3 option to compile Windows builds
Type: performance Stage: resolved
Components: Windows Versions: Python 3.10
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, malin, paul.moore, steve.dower, tim.golden, zach.ware
Priority: normal Keywords: patch

Created on 2020-11-16 08:14 by malin, last changed 2020-11-18 06:50 by christian.heimes. This issue is now closed.

Files
File name Uploaded Description
Ob3.diff malin, 2020-11-16 08:14
pgo_ob3.diff malin, 2020-11-18 06:11
Messages (6)
msg381076 - (view) Author: Ma Lin (malin) * Date: 2020-11-16 08:14
MSVC2019 has a new option, `/Ob3`, which specifies more aggressive inlining than `/Ob2`:
https://docs.microsoft.com/en-us/cpp/build/reference/ob-inline-function-expansion?view=msvc-160

If this option is used with MSVC2017, the compiler emits a warning:
cl : Command line warning D9002 : ignoring unknown option '/Ob3'
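
As a rough sketch of the version dependence described above, a build script could choose the inlining flag from the compiler version. The helper name `inline_flag` is hypothetical and is not taken from `Ob3.diff` or from CPython's build files; the threshold 1920 is the `_MSC_VER` value of the first MSVC2019 release.

```python
# Hypothetical helper, not taken from Ob3.diff or from PCbuild.
MSVC2019_MSC_VER = 1920  # _MSC_VER of Visual Studio 2019 version 16.0


def inline_flag(msc_ver: int) -> str:
    """Return /Ob3 for MSVC2019 or newer, otherwise fall back to /Ob2."""
    if msc_ver >= MSVC2019_MSC_VER:
        return "/Ob3"
    # Older compilers ignore /Ob3 and emit warning D9002.
    return "/Ob2"


print(inline_flag(1916))  # a late MSVC2017 toolset
print(inline_flag(1929))  # a late MSVC2019 toolset
```

Falling back to `/Ob2` keeps the older toolset free of the D9002 warning instead of relying on the option being ignored.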

Just applying `Ob3.diff` gives this improvement:
(Python 3.9 branch, No PGO, build.bat -p X64)

+-------------------------+----------+------------------------------+
| Benchmark               | baseline | ob3                          |
+=========================+==========+==============================+
| 2to3                    | 563 ms   | 552 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| chameleon               | 16.5 ms  | 16.1 ms: 1.03x faster (-3%)  |
+-------------------------+----------+------------------------------+
| chaos                   | 200 ms   | 197 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| crypto_pyaes            | 186 ms   | 184 ms: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+
| deltablue               | 13.0 ms  | 12.6 ms: 1.03x faster (-3%)  |
+-------------------------+----------+------------------------------+
| dulwich_log             | 94.5 ms  | 93.9 ms: 1.01x faster (-1%)  |
+-------------------------+----------+------------------------------+
| fannkuch                | 806 ms   | 761 ms: 1.06x faster (-6%)   |
+-------------------------+----------+------------------------------+
| float                   | 211 ms   | 199 ms: 1.06x faster (-6%)   |
+-------------------------+----------+------------------------------+
| genshi_text             | 48.3 ms  | 47.7 ms: 1.01x faster (-1%)  |
+-------------------------+----------+------------------------------+
| go                      | 446 ms   | 437 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| hexiom                  | 16.6 ms  | 15.9 ms: 1.04x faster (-4%)  |
+-------------------------+----------+------------------------------+
| json_dumps              | 19.9 ms  | 19.3 ms: 1.03x faster (-3%)  |
+-------------------------+----------+------------------------------+
| json_loads              | 45.5 us  | 43.9 us: 1.04x faster (-3%)  |
+-------------------------+----------+------------------------------+
| logging_format          | 21.4 us  | 20.7 us: 1.03x faster (-3%)  |
+-------------------------+----------+------------------------------+
| logging_silent          | 343 ns   | 319 ns: 1.07x faster (-7%)   |
+-------------------------+----------+------------------------------+
| mako                    | 29.0 ms  | 27.6 ms: 1.05x faster (-5%)  |
+-------------------------+----------+------------------------------+
| meteor_contest          | 168 ms   | 162 ms: 1.04x faster (-3%)   |
+-------------------------+----------+------------------------------+
| nbody                   | 256 ms   | 244 ms: 1.05x faster (-5%)   |
+-------------------------+----------+------------------------------+
| nqueens                 | 168 ms   | 162 ms: 1.04x faster (-4%)   |
+-------------------------+----------+------------------------------+
| pathlib                 | 175 ms   | 168 ms: 1.04x faster (-4%)   |
+-------------------------+----------+------------------------------+
| pickle                  | 17.9 us  | 17.3 us: 1.04x faster (-4%)  |
+-------------------------+----------+------------------------------+
| pickle_dict             | 41.0 us  | 33.2 us: 1.24x faster (-19%) |
+-------------------------+----------+------------------------------+
| pickle_list             | 6.73 us  | 5.89 us: 1.14x faster (-12%) |
+-------------------------+----------+------------------------------+
| pickle_pure_python      | 829 us   | 793 us: 1.05x faster (-4%)   |
+-------------------------+----------+------------------------------+
| pidigits                | 243 ms   | 243 ms: 1.00x faster (-0%)   |
+-------------------------+----------+------------------------------+
| pyflate                 | 1.21 sec | 1.18 sec: 1.03x faster (-2%) |
+-------------------------+----------+------------------------------+
| raytrace                | 947 ms   | 915 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| regex_compile           | 291 ms   | 284 ms: 1.03x faster (-2%)   |
+-------------------------+----------+------------------------------+
| regex_dna               | 217 ms   | 222 ms: 1.02x slower (+2%)   |
+-------------------------+----------+------------------------------+
| regex_effbot            | 3.97 ms  | 4.13 ms: 1.04x slower (+4%)  |
+-------------------------+----------+------------------------------+
| regex_v8                | 35.2 ms  | 34.6 ms: 1.02x faster (-2%)  |
+-------------------------+----------+------------------------------+
| richards                | 134 ms   | 131 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| scimark_fft             | 616 ms   | 599 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| scimark_lu              | 248 ms   | 241 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| scimark_monte_carlo     | 187 ms   | 179 ms: 1.04x faster (-4%)   |
+-------------------------+----------+------------------------------+
| scimark_sor             | 361 ms   | 343 ms: 1.05x faster (-5%)   |
+-------------------------+----------+------------------------------+
| scimark_sparse_mat_mult | 7.71 ms  | 7.04 ms: 1.10x faster (-9%)  |
+-------------------------+----------+------------------------------+
| spectral_norm           | 249 ms   | 245 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| sqlalchemy_declarative  | 237 ms   | 246 ms: 1.04x slower (+4%)   |
+-------------------------+----------+------------------------------+
| sqlalchemy_imperative   | 40.6 ms  | 41.2 ms: 1.02x slower (+2%)  |
+-------------------------+----------+------------------------------+
| sqlite_synth            | 4.64 us  | 5.47 us: 1.18x slower (+18%) |
+-------------------------+----------+------------------------------+
| sympy_expand            | 738 ms   | 718 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| sympy_integrate         | 35.6 ms  | 34.7 ms: 1.03x faster (-3%)  |
+-------------------------+----------+------------------------------+
| sympy_sum               | 298 ms   | 295 ms: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+
| sympy_str               | 484 ms   | 471 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| telco                   | 11.3 ms  | 9.76 ms: 1.16x faster (-14%) |
+-------------------------+----------+------------------------------+
| tornado_http            | 256 ms   | 254 ms: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+
| unpack_sequence         | 94.3 ns  | 90.5 ns: 1.04x faster (-4%)  |
+-------------------------+----------+------------------------------+
| unpickle                | 23.6 us  | 22.6 us: 1.05x faster (-5%)  |
+-------------------------+----------+------------------------------+
| unpickle_list           | 6.63 us  | 6.17 us: 1.07x faster (-7%)  |
+-------------------------+----------+------------------------------+
| unpickle_pure_python    | 589 us   | 560 us: 1.05x faster (-5%)   |
+-------------------------+----------+------------------------------+
| xml_etree_parse         | 213 ms   | 209 ms: 1.02x faster (-2%)   |
+-------------------------+----------+------------------------------+
| xml_etree_iterparse     | 155 ms   | 149 ms: 1.04x faster (-4%)   |
+-------------------------+----------+------------------------------+
| xml_etree_generate      | 149 ms   | 145 ms: 1.03x faster (-3%)   |
+-------------------------+----------+------------------------------+
| xml_etree_process       | 117 ms   | 115 ms: 1.01x faster (-1%)   |
+-------------------------+----------+------------------------------+

Not significant (5): django_template; genshi_xml; logging_simple; python_startup; python_startup_no_site
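
For readers checking the table, the last column can be approximately recomputed from the two timings. This is a sketch of the arithmetic only, not the benchmark tool's own code; `describe` is a made-up name. Because the table shows rounded timings, recomputed values can differ from the printed ones in the last digit.

```python
def describe(baseline: float, changed: float) -> str:
    """Format a timing comparison like the last column of the table.

    Both timings must use the same unit. The ratio is always >= 1 and
    the percentage is the relative change in run time versus the baseline.
    """
    percent = (changed - baseline) / baseline * 100
    if changed <= baseline:
        return f"{baseline / changed:.2f}x faster ({percent:+.0f}%)"
    return f"{changed / baseline:.2f}x slower ({percent:+.0f}%)"


# Timings copied from the telco and sqlite_synth rows above.
print(describe(11.3, 9.76))
print(describe(4.64, 5.47))
```

Note that the ratio and the percentage are not the same quantity: a 1.16x speedup corresponds to about 14% less run time, not 16%.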
msg381077 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-11-16 08:17
Could you please try again with PGO? All our official builds use PGO.
msg381078 - (view) Author: Ma Lin (malin) * Date: 2020-11-16 08:28
> Could you please try again with PGO?

Please wait.

BTW, this option was suggested in another project.
In that project, even with `/Ob3` enabled, the build is still slower than the GCC 9 build.
If you are interested, see: https://github.com/facebook/zstd/issues/2314
msg381085 - (view) Author: Ma Lin (malin) * Date: 2020-11-16 10:26
In a PGO build, the improvement is small.

(3.9 branch, with PGO, build.bat -p X64 --pgo)

+-------------------------+--------------+------------------------------+
| Benchmark               | baseline-pgo | ob3-pgo                      |
+=========================+==============+==============================+
| 2to3                    | 464 ms       | 462 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| chameleon               | 14.0 ms      | 13.5 ms: 1.03x faster (-3%)  |
+-------------------------+--------------+------------------------------+
| crypto_pyaes            | 142 ms       | 143 ms: 1.00x slower (+0%)   |
+-------------------------+--------------+------------------------------+
| django_template         | 65.0 ms      | 65.4 ms: 1.01x slower (+1%)  |
+-------------------------+--------------+------------------------------+
| fannkuch                | 665 ms       | 650 ms: 1.02x faster (-2%)   |
+-------------------------+--------------+------------------------------+
| float                   | 166 ms       | 164 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| genshi_text             | 41.4 ms      | 41.0 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| genshi_xml              | 88.1 ms      | 87.0 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| go                      | 315 ms       | 311 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| hexiom                  | 12.7 ms      | 12.6 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| json_dumps              | 16.7 ms      | 16.6 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| json_loads              | 33.5 us      | 32.1 us: 1.04x faster (-4%)  |
+-------------------------+--------------+------------------------------+
| logging_simple          | 13.6 us      | 13.3 us: 1.02x faster (-2%)  |
+-------------------------+--------------+------------------------------+
| mako                    | 22.7 ms      | 22.8 ms: 1.01x slower (+1%)  |
+-------------------------+--------------+------------------------------+
| meteor_contest          | 136 ms       | 138 ms: 1.01x slower (+1%)   |
+-------------------------+--------------+------------------------------+
| nbody                   | 189 ms       | 186 ms: 1.02x faster (-2%)   |
+-------------------------+--------------+------------------------------+
| nqueens                 | 135 ms       | 135 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| pathlib                 | 157 ms       | 154 ms: 1.02x faster (-2%)   |
+-------------------------+--------------+------------------------------+
| pickle                  | 16.8 us      | 16.4 us: 1.02x faster (-2%)  |
+-------------------------+--------------+------------------------------+
| pickle_dict             | 41.3 us      | 40.4 us: 1.02x faster (-2%)  |
+-------------------------+--------------+------------------------------+
| pickle_list             | 6.34 us      | 6.42 us: 1.01x slower (+1%)  |
+-------------------------+--------------+------------------------------+
| pickle_pure_python      | 588 us       | 584 us: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| pidigits                | 242 ms       | 242 ms: 1.00x faster (-0%)   |
+-------------------------+--------------+------------------------------+
| pyflate                 | 905 ms       | 898 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| python_startup          | 28.0 ms      | 27.9 ms: 1.00x faster (-0%)  |
+-------------------------+--------------+------------------------------+
| regex_compile           | 220 ms       | 218 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| regex_v8                | 33.1 ms      | 32.9 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| richards                | 88.9 ms      | 88.3 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+
| scimark_fft             | 494 ms       | 486 ms: 1.02x faster (-2%)   |
+-------------------------+--------------+------------------------------+
| scimark_lu              | 210 ms       | 207 ms: 1.02x faster (-2%)   |
+-------------------------+--------------+------------------------------+
| scimark_monte_carlo     | 141 ms       | 137 ms: 1.03x faster (-3%)   |
+-------------------------+--------------+------------------------------+
| scimark_sor             | 263 ms       | 255 ms: 1.03x faster (-3%)   |
+-------------------------+--------------+------------------------------+
| scimark_sparse_mat_mult | 6.48 ms      | 6.10 ms: 1.06x faster (-6%)  |
+-------------------------+--------------+------------------------------+
| spectral_norm           | 200 ms       | 184 ms: 1.09x faster (-8%)   |
+-------------------------+--------------+------------------------------+
| sqlalchemy_imperative   | 39.4 ms      | 37.8 ms: 1.04x faster (-4%)  |
+-------------------------+--------------+------------------------------+
| sqlite_synth            | 4.24 us      | 4.31 us: 1.02x slower (+2%)  |
+-------------------------+--------------+------------------------------+
| sympy_sum               | 266 ms       | 270 ms: 1.01x slower (+1%)   |
+-------------------------+--------------+------------------------------+
| sympy_str               | 416 ms       | 418 ms: 1.00x slower (+0%)   |
+-------------------------+--------------+------------------------------+
| telco                   | 8.12 ms      | 8.28 ms: 1.02x slower (+2%)  |
+-------------------------+--------------+------------------------------+
| unpack_sequence         | 92.3 ns      | 80.8 ns: 1.14x faster (-13%) |
+-------------------------+--------------+------------------------------+
| unpickle                | 17.9 us      | 18.3 us: 1.02x slower (+2%)  |
+-------------------------+--------------+------------------------------+
| unpickle_list           | 6.43 us      | 6.57 us: 1.02x slower (+2%)  |
+-------------------------+--------------+------------------------------+
| unpickle_pure_python    | 419 us       | 414 us: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| xml_etree_parse         | 184 ms       | 183 ms: 1.00x faster (-0%)   |
+-------------------------+--------------+------------------------------+
| xml_etree_iterparse     | 135 ms       | 134 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| xml_etree_generate      | 130 ms       | 129 ms: 1.01x faster (-1%)   |
+-------------------------+--------------+------------------------------+
| xml_etree_process       | 101 ms       | 99.4 ms: 1.01x faster (-1%)  |
+-------------------------+--------------+------------------------------+

Not significant (13): chaos; deltablue; dulwich_log; logging_format; logging_silent; python_startup_no_site; raytrace; regex_dna; regex_effbot; sqlalchemy_declarative; sympy_expand; sympy_integrate; tornado_http
msg381315 - (view) Author: Ma Lin (malin) * Date: 2020-11-18 06:11
The last benchmark was wrong: the `/Ob3` option was not enabled.

After applying `pgo_ob3.diff`, the PGO build gets slower, so I am closing this issue.

+-------------------------+------------+------------------------------+
| Benchmark               | py39_pgo_a | py39_pgo_b                   |
+=========================+============+==============================+
| 2to3                    | 461 ms     | 465 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| chameleon               | 13.4 ms    | 13.7 ms: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| chaos                   | 138 ms     | 141 ms: 1.02x slower (+2%)   |
+-------------------------+------------+------------------------------+
| crypto_pyaes            | 141 ms     | 143 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| deltablue               | 9.01 ms    | 9.20 ms: 1.02x slower (+2%)  |
+-------------------------+------------+------------------------------+
| django_template         | 64.7 ms    | 65.4 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| dulwich_log             | 78.2 ms    | 78.8 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| fannkuch                | 640 ms     | 668 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| float                   | 165 ms     | 163 ms: 1.01x faster (-1%)   |
+-------------------------+------------+------------------------------+
| genshi_text             | 40.7 ms    | 41.5 ms: 1.02x slower (+2%)  |
+-------------------------+------------+------------------------------+
| genshi_xml              | 87.2 ms    | 88.4 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| go                      | 309 ms     | 314 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| hexiom                  | 12.3 ms    | 12.7 ms: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| json_dumps              | 16.7 ms    | 16.8 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| json_loads              | 32.1 us    | 32.5 us: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| logging_format          | 14.6 us    | 15.0 us: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| logging_silent          | 247 ns     | 257 ns: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| logging_simple          | 13.2 us    | 13.6 us: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| mako                    | 22.1 ms    | 22.8 ms: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| meteor_contest          | 135 ms     | 137 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| nbody                   | 184 ms     | 191 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| nqueens                 | 132 ms     | 137 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| pathlib                 | 156 ms     | 162 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| pickle                  | 16.3 us    | 15.4 us: 1.05x faster (-5%)  |
+-------------------------+------------+------------------------------+
| pickle_dict             | 39.7 us    | 40.0 us: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| pickle_list             | 5.93 us    | 6.15 us: 1.04x slower (+4%)  |
+-------------------------+------------+------------------------------+
| pickle_pure_python      | 581 us     | 587 us: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| pidigits                | 243 ms     | 242 ms: 1.00x faster (-0%)   |
+-------------------------+------------+------------------------------+
| pyflate                 | 885 ms     | 908 ms: 1.03x slower (+3%)   |
+-------------------------+------------+------------------------------+
| python_startup          | 27.8 ms    | 28.0 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| python_startup_no_site  | 22.0 ms    | 22.1 ms: 1.00x slower (+0%)  |
+-------------------------+------------+------------------------------+
| raytrace                | 630 ms     | 632 ms: 1.00x slower (+0%)   |
+-------------------------+------------+------------------------------+
| regex_compile           | 215 ms     | 220 ms: 1.03x slower (+3%)   |
+-------------------------+------------+------------------------------+
| regex_dna               | 223 ms     | 225 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| regex_v8                | 32.5 ms    | 33.4 ms: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| richards                | 87.6 ms    | 88.5 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| scimark_fft             | 484 ms     | 501 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| scimark_lu              | 205 ms     | 210 ms: 1.02x slower (+2%)   |
+-------------------------+------------+------------------------------+
| scimark_monte_carlo     | 137 ms     | 140 ms: 1.02x slower (+2%)   |
+-------------------------+------------+------------------------------+
| scimark_sor             | 251 ms     | 261 ms: 1.04x slower (+4%)   |
+-------------------------+------------+------------------------------+
| scimark_sparse_mat_mult | 6.07 ms    | 6.27 ms: 1.03x slower (+3%)  |
+-------------------------+------------+------------------------------+
| spectral_norm           | 185 ms     | 190 ms: 1.02x slower (+2%)   |
+-------------------------+------------+------------------------------+
| sqlalchemy_imperative   | 38.8 ms    | 37.9 ms: 1.03x faster (-3%)  |
+-------------------------+------------+------------------------------+
| sqlite_synth            | 4.28 us    | 4.20 us: 1.02x faster (-2%)  |
+-------------------------+------------+------------------------------+
| sympy_integrate         | 30.4 ms    | 30.7 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+
| sympy_sum               | 270 ms     | 269 ms: 1.00x faster (-0%)   |
+-------------------------+------------+------------------------------+
| sympy_str               | 416 ms     | 419 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| telco                   | 8.05 ms    | 8.09 ms: 1.00x slower (+0%)  |
+-------------------------+------------+------------------------------+
| unpack_sequence         | 79.8 ns    | 94.4 ns: 1.18x slower (+18%) |
+-------------------------+------------+------------------------------+
| unpickle                | 18.2 us    | 17.9 us: 1.02x faster (-2%)  |
+-------------------------+------------+------------------------------+
| unpickle_list           | 6.57 us    | 6.39 us: 1.03x faster (-3%)  |
+-------------------------+------------+------------------------------+
| unpickle_pure_python    | 407 us     | 418 us: 1.03x slower (+3%)   |
+-------------------------+------------+------------------------------+
| xml_etree_iterparse     | 134 ms     | 135 ms: 1.01x slower (+1%)   |
+-------------------------+------------+------------------------------+
| xml_etree_generate      | 126 ms     | 128 ms: 1.02x slower (+2%)   |
+-------------------------+------------+------------------------------+
| xml_etree_process       | 98.7 ms    | 99.4 ms: 1.01x slower (+1%)  |
+-------------------------+------------+------------------------------+

Not significant (5): regex_effbot; sqlalchemy_declarative; sympy_expand; tornado_http; xml_etree_parse
msg381316 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-11-18 06:50
Thank you for your thorough testing. It's useful to know that the option does not speed up PGO builds of Python.
History
Date User Action Args
2020-11-18 06:50:34 christian.heimes set messages: + msg381316
2020-11-18 06:11:22 malin set status: open -> closed
    files: + pgo_ob3.diff
    messages: + msg381315
    resolution: rejected
    stage: resolved
2020-11-16 10:26:22 malin set messages: + msg381085
2020-11-16 08:28:30 malin set messages: + msg381078
2020-11-16 08:17:22 christian.heimes set nosy: + christian.heimes
    messages: + msg381077
2020-11-16 08:14:55 malin create