Issue46551
Created on 2022-01-27 15:09 by corona10, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (3) | |||
---|---|---|---|
msg411888 - (view) | Author: Dong-hee Na (corona10) * | Date: 2022-01-27 15:09 | |
Compiling CPython with the PGO option is good for CPython performance, but compile time is very painful since PGO profiling is executed with a single thread.

When I tested with run -m test --pgo -j8, it doesn't affect to optimized result with fast build time. So I would like to provide an option for the number of workers for the PGO build. Also, with this feature, we can include more PGO tests more aggressively.

@vstinner, do you have any suggestions for this option?

- a: ./configure --enable-optimizations --pgo-workers=8
- b: ./configure --enable-optimizations --with-concurrent-pgo
- c: ./configure --enable-optimizations (By detecting system cpu count)

The following metrics are a reference for decision making :)

## Build Time

AS-IS: real 4m42.799s
TO-BE (this case -j8): real 2m10.405s

## No performance regression

I didn't check how reliable the environment is, but there appears to be no regression.

+------------------------+---------+-----------------------+
| Benchmark              | base    | workers               |
+========================+=========+=======================+
| 2to3                   | 409 ms  | 412 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| chaos                  | 115 ms  | 114 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| deltablue              | 6.66 ms | 6.59 ms: 1.01x faster |
+------------------------+---------+-----------------------+
| fannkuch               | 605 ms  | 611 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| float                  | 138 ms  | 129 ms: 1.07x faster  |
+------------------------+---------+-----------------------+
| go                     | 220 ms  | 215 ms: 1.02x faster  |
+------------------------+---------+-----------------------+
| hexiom                 | 10.3 ms | 10.1 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| json_dumps             | 19.6 ms | 19.2 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| json_loads             | 40.6 us | 39.7 us: 1.02x faster |
+------------------------+---------+-----------------------+
| logging_silent         | 180 ns  | 173 ns: 1.04x faster  |
+------------------------+---------+-----------------------+
| logging_simple         | 8.89 us | 8.81 us: 1.01x faster |
+------------------------+---------+-----------------------+
| nqueens                | 134 ms  | 136 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| pathlib                | 24.6 ms | 24.2 ms: 1.01x faster |
+------------------------+---------+-----------------------+
| pickle                 | 16.1 us | 15.9 us: 1.01x faster |
+------------------------+---------+-----------------------+
| pickle_dict            | 41.4 us | 38.1 us: 1.09x faster |
+------------------------+---------+-----------------------+
| pickle_list            | 6.27 us | 5.09 us: 1.23x faster |
+------------------------+---------+-----------------------+
| pickle_pure_python     | 499 us  | 492 us: 1.01x faster  |
+------------------------+---------+-----------------------+
| pidigits               | 285 ms  | 290 ms: 1.02x slower  |
+------------------------+---------+-----------------------+
| python_startup         | 12.1 ms | 12.2 ms: 1.01x slower |
+------------------------+---------+-----------------------+
| python_startup_no_site | 8.91 ms | 8.89 ms: 1.00x faster |
+------------------------+---------+-----------------------+
| raytrace               | 510 ms  | 500 ms: 1.02x faster  |
+------------------------+---------+-----------------------+
| regex_compile          | 211 ms  | 210 ms: 1.00x faster  |
+------------------------+---------+-----------------------+
| regex_effbot           | 4.99 ms | 4.88 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| regex_v8               | 37.3 ms | 36.3 ms: 1.03x faster |
+------------------------+---------+-----------------------+
| richards               | 73.6 ms | 72.2 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| scimark_fft            | 542 ms  | 552 ms: 1.02x slower  |
+------------------------+---------+-----------------------+
| scimark_lu             | 189 ms  | 184 ms: 1.03x faster  |
+------------------------+---------+-----------------------+
| scimark_monte_carlo    | 106 ms  | 106 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| scimark_sor            | 199 ms  | 196 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| spectral_norm          | 177 ms  | 176 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| unpack_sequence        | 64.9 ns | 63.7 ns: 1.02x faster |
+------------------------+---------+-----------------------+
| unpickle               | 21.5 us | 21.6 us: 1.00x slower |
+------------------------+---------+-----------------------+
| unpickle_list          | 7.69 us | 7.55 us: 1.02x faster |
+------------------------+---------+-----------------------+
| unpickle_pure_python   | 402 us  | 394 us: 1.02x faster  |
+------------------------+---------+-----------------------+
| xml_etree_parse        | 218 ms  | 217 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| xml_etree_iterparse    | 156 ms  | 156 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| xml_etree_generate     | 132 ms  | 131 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| xml_etree_process      | 92.8 ms | 91.5 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| Geometric mean         | (ref)   | 1.02x faster          |
+------------------------+---------+-----------------------+

Benchmark hidden because not significant (8): logging_format, meteor_contest, nbody, pyflate, regex_dna, scimark_sparse_mat_mult, sqlite_synth, telco |
|||
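For readers who want the build-time difference as a single ratio, the two `real` timings reported in msg411888 can be converted to seconds and compared. This is only a small sketch: the helper name `parse_real_time` is made up for illustration and is not part of CPython or its build system.

```python
import re


def parse_real_time(text: str) -> float:
    """Convert a `time`-style duration such as '4m42.799s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the message above (AS-IS vs. TO-BE with -j8).
baseline = parse_real_time("4m42.799s")
parallel = parse_real_time("2m10.405s")
speedup = baseline / parallel

print(f"{baseline:.3f}s -> {parallel:.3f}s ({speedup:.2f}x shorter build)")
```

With the reported numbers the -j8 build takes a bit less than half the time of the single-process build. This says nothing about whether the collected profile is equivalent, which is the question raised in the next message.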
msg411892 - (view) | Author: STINNER Victor (vstinner) * | Date: 2022-01-27 15:51 | |
> When I tested with run -m test --pgo -j8, it doesn't affect to optimized result with fast build time.

You only test libregrtest.main and libregrtest.runtest_mp modules which don't execute code. Does it mean that running tests is useless to train the PGO? Or does PGO magically aggregate results when multiple processes are run? |
|||
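The concern in msg411892 rests on the split between a dispatching parent process and the worker processes that actually run the tests. The sketch below illustrates only that split in general terms; it is not how libregrtest is implemented, and `run_in_worker` is a hypothetical helper. Whether profile data written by several worker processes is merged correctly depends on the compiler's instrumentation runtime, which this thread does not settle.

```python
import os
import subprocess
import sys


def run_in_worker(snippet: str) -> tuple[int, str]:
    """Run `snippet` in a fresh interpreter; return (worker pid, its output)."""
    # The child prints its own PID first, then executes the snippet.
    code = "import os; print(os.getpid()); " + snippet
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    pid_line, _, rest = result.stdout.partition("\n")
    return int(pid_line), rest.strip()


# The parent only dispatches; the computation happens in another process.
worker_pid, output = run_in_worker("print(sum(range(10)))")
print(f"parent pid={os.getpid()}, worker pid={worker_pid}, output={output}")
```

In this arrangement the parent executes almost none of the workload itself, so any per-process measurement taken in the parent reflects only the dispatching code.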
msg411898 - (view) | Author: Dong-hee Na (corona10) * | Date: 2022-01-27 16:59 | |
> You only test libregrtest.main and libregrtest.runtest_mp modules which don't execute code. Does it mean that running tests is useless to train the PGO?

Sorry, I didn't check any effects other than performance regression. There was already a related discussion, which decided not to do this: https://bugs.python.org/issue24915#msg251128

'I think the --pgo flag needs only work in single process mode, since multi-process would probably not write out the profiling data properly.'

I'm closing the issue as won't do. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:55 | admin | set | github: 90709 |
2022-01-27 16:59:18 | corona10 | set | status: open -> closed messages: + msg411898 assignee: corona10 -> resolution: rejected stage: resolved |
2022-01-27 15:51:52 | vstinner | set | messages: + msg411892 |
2022-01-27 15:09:28 | corona10 | create |