
classification
Title: Provide number of workers option for fast PGO build time
Type: enhancement Stage: resolved
Components: Build Versions: Python 3.11
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: corona10, gvanrossum, vstinner
Priority: normal Keywords:

Created on 2022-01-27 15:09 by corona10, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg411888 - Author: Dong-hee Na (corona10) (Python committer) Date: 2022-01-27 15:09
Compiling CPython with the PGO option is good for CPython performance, but the build time is painful because the PGO profiling step runs in a single thread.

When I tested by running -m test --pgo -j8, the build was faster and the optimized result was not affected.

So I would like to provide an option for the number of workers used in the PGO build. With this feature, we could also include more tests in the PGO run.

@vstinner, do you have any suggestions for this option?
- a: ./configure --enable-optimizations --pgo-workers=8
- b: ./configure --enable-optimizations --with-concurrent-pgo
- c: ./configure --enable-optimizations (by detecting the system CPU count)
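As a rough illustration of how options (a) and (c) could share one code path, here is a minimal Python sketch. The function name resolve_pgo_workers and its fallback behavior are hypothetical and not part of CPython's build system; only os.cpu_count() is a real API.

```python
import os
from typing import Optional


def resolve_pgo_workers(requested: Optional[int] = None) -> int:
    """Pick a worker count for the PGO profiling run (hypothetical helper)."""
    if requested is not None:
        # Option (a): the user passed an explicit count.
        if requested < 1:
            raise ValueError("worker count must be >= 1")
        return requested
    # Option (c): detect the CPU count. os.cpu_count() may return None
    # when the count cannot be determined, so fall back to one worker.
    return os.cpu_count() or 1
```

Under these assumptions, resolve_pgo_workers(8) returns 8, and resolve_pgo_workers() returns the detected CPU count or 1.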

The following metrics are provided as a reference for making the decision :)

## Build Time
AS-IS:
real    4m42.799s

TO-BE (with -j8):
real    2m10.405s
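
For reference, the two wall-clock times above work out to roughly a 2.17x speedup. A small Python sketch of that arithmetic follows; parse_real is an illustrative helper name, and it only handles the "NmS.SSSs" format shown here.

```python
import re


def parse_real(text: str) -> float:
    """Convert a `time`-style value such as '4m42.799s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


as_is = parse_real("4m42.799s")  # single-process profiling run
to_be = parse_real("2m10.405s")  # profiling run with -j8
speedup = as_is / to_be
saved = 1 - to_be / as_is
print(f"{speedup:.2f}x faster, {saved:.0%} less wall-clock time")
# prints: 2.17x faster, 54% less wall-clock time
```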

## No performance regression
I did not check how reliable the benchmarking environment is, but there appears to be no regression.
+------------------------+---------+-----------------------+
| Benchmark              | base    | workers               |
+========================+=========+=======================+
| 2to3                   | 409 ms  | 412 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| chaos                  | 115 ms  | 114 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| deltablue              | 6.66 ms | 6.59 ms: 1.01x faster |
+------------------------+---------+-----------------------+
| fannkuch               | 605 ms  | 611 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| float                  | 138 ms  | 129 ms: 1.07x faster  |
+------------------------+---------+-----------------------+
| go                     | 220 ms  | 215 ms: 1.02x faster  |
+------------------------+---------+-----------------------+
| hexiom                 | 10.3 ms | 10.1 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| json_dumps             | 19.6 ms | 19.2 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| json_loads             | 40.6 us | 39.7 us: 1.02x faster |
+------------------------+---------+-----------------------+
| logging_silent         | 180 ns  | 173 ns: 1.04x faster  |
+------------------------+---------+-----------------------+
| logging_simple         | 8.89 us | 8.81 us: 1.01x faster |
+------------------------+---------+-----------------------+
| nqueens                | 134 ms  | 136 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| pathlib                | 24.6 ms | 24.2 ms: 1.01x faster |
+------------------------+---------+-----------------------+
| pickle                 | 16.1 us | 15.9 us: 1.01x faster |
+------------------------+---------+-----------------------+
| pickle_dict            | 41.4 us | 38.1 us: 1.09x faster |
+------------------------+---------+-----------------------+
| pickle_list            | 6.27 us | 5.09 us: 1.23x faster |
+------------------------+---------+-----------------------+
| pickle_pure_python     | 499 us  | 492 us: 1.01x faster  |
+------------------------+---------+-----------------------+
| pidigits               | 285 ms  | 290 ms: 1.02x slower  |
+------------------------+---------+-----------------------+
| python_startup         | 12.1 ms | 12.2 ms: 1.01x slower |
+------------------------+---------+-----------------------+
| python_startup_no_site | 8.91 ms | 8.89 ms: 1.00x faster |
+------------------------+---------+-----------------------+
| raytrace               | 510 ms  | 500 ms: 1.02x faster  |
+------------------------+---------+-----------------------+
| regex_compile          | 211 ms  | 210 ms: 1.00x faster  |
+------------------------+---------+-----------------------+
| regex_effbot           | 4.99 ms | 4.88 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| regex_v8               | 37.3 ms | 36.3 ms: 1.03x faster |
+------------------------+---------+-----------------------+
| richards               | 73.6 ms | 72.2 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| scimark_fft            | 542 ms  | 552 ms: 1.02x slower  |
+------------------------+---------+-----------------------+
| scimark_lu             | 189 ms  | 184 ms: 1.03x faster  |
+------------------------+---------+-----------------------+
| scimark_monte_carlo    | 106 ms  | 106 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| scimark_sor            | 199 ms  | 196 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| spectral_norm          | 177 ms  | 176 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| unpack_sequence        | 64.9 ns | 63.7 ns: 1.02x faster |
+------------------------+---------+-----------------------+
| unpickle               | 21.5 us | 21.6 us: 1.00x slower |
+------------------------+---------+-----------------------+
| unpickle_list          | 7.69 us | 7.55 us: 1.02x faster |
+------------------------+---------+-----------------------+
| unpickle_pure_python   | 402 us  | 394 us: 1.02x faster  |
+------------------------+---------+-----------------------+
| xml_etree_parse        | 218 ms  | 217 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| xml_etree_iterparse    | 156 ms  | 156 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| xml_etree_generate     | 132 ms  | 131 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| xml_etree_process      | 92.8 ms | 91.5 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| Geometric mean         | (ref)   | 1.02x faster          |
+------------------------+---------+-----------------------+

Benchmark hidden because not significant (8): logging_format, meteor_contest, nbody, pyflate, regex_dna, scimark_sparse_mat_mult, sqlite_synth, telco
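
The "faster"/"slower" factors in the table appear to be ratios of the two mean timings, and the last row is a geometric mean of such ratios. The Python sketch below recomputes a few rows from the rounded values shown above. It uses only a three-row subset, so its geometric mean is illustrative and is not expected to match the 1.02x reported over all benchmarks.

```python
from statistics import geometric_mean

# (base, workers) mean timings copied from three table rows.
# Units cancel out in the ratio, so ms and us rows can be mixed.
rows = {
    "float": (138.0, 129.0),      # ms
    "pickle_list": (6.27, 5.09),  # us
    "pidigits": (285.0, 290.0),   # ms
}

# A ratio above 1 means the "workers" build was faster than "base".
ratios = {name: base / new for name, (base, new) in rows.items()}

for name, ratio in ratios.items():
    label = "faster" if ratio >= 1 else "slower"
    factor = ratio if ratio >= 1 else 1 / ratio
    print(f"{name}: {factor:.2f}x {label}")

# Geometric mean over this subset only (not the full benchmark suite).
subset_mean = geometric_mean(list(ratios.values()))
```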
msg411892 - Author: STINNER Victor (vstinner) (Python committer) Date: 2022-01-27 15:51
> When I tested by running -m test --pgo -j8, the build was faster and the optimized result was not affected.

You only test libregrtest.main and libregrtest.runtest_mp modules which don't execute code. Does it mean that running tests is useless to train the PGO?

Or does PGO magically aggregate results when multiple processes are run?
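
To illustrate the concern: with -jN, regrtest runs tests in worker subprocesses, so the code under test executes in different processes from the one that launched the run. The toy Python sketch below (not libregrtest code; run_in_worker is an illustrative name) shows only that separation. Whether a given compiler's instrumentation merges profile data written by several processes is a separate question that this sketch does not answer.

```python
import os
import subprocess
import sys


def run_in_worker(code: str) -> str:
    """Run a snippet in a fresh interpreter process and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


parent_pid = os.getpid()
worker_pid = int(run_in_worker("import os; print(os.getpid())"))

# The snippet ran in a different process from the coordinating parent,
# so any per-process state (such as profile counters) lives there too.
print(parent_pid != worker_pid)
```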
msg411898 - Author: Dong-hee Na (corona10) (Python committer) Date: 2022-01-27 16:59
> You only test libregrtest.main and libregrtest.runtest_mp modules which don't execute code. Does it mean that running tests is useless to train the PGO?

Sorry, I didn't check any effects other than performance regression.
There was already a related discussion, where it was decided not to do this:
https://bugs.python.org/issue24915#msg251128
'I think the --pgo flag needs only work in single process mode, since
multi-process would probably not write out the profiling data properly.'

I am closing the issue as won't do.
History
Date User Action Args
2022-04-11 14:59:55 admin set github: 90709
2022-01-27 16:59:18 corona10 set status: open -> closed
    messages: + msg411898
    assignee: corona10 ->
    resolution: rejected
    stage: resolved
2022-01-27 15:51:52 vstinner set messages: + msg411892
2022-01-27 15:09:28 corona10 create