Issue 46551: Provide number of workers option for fast PGO build time

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/90709

classification

Title:	Provide number of workers option for fast PGO build time
Type:	enhancement	Stage:	resolved
Components:	Build	Versions:	Python 3.11

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:		Nosy List:	corona10, gvanrossum, vstinner
Priority:	normal	Keywords:

Created on 2022-01-27 15:09 by corona10, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (3)
msg411888 - (view)	Author: Dong-hee Na (corona10) *	Date: 2022-01-27 15:09
Compiling CPython with the PGO option is good for CPython performance, but the compile time is very painful because PGO profiling is executed with a single thread.
When I tested by running -m test --pgo -j8, the build was faster and the optimized result was not affected.
So I would like to provide an option for the number of workers for the PGO build. With this feature, we could also include more PGO tests more aggressively.

@vstinner, do you have any suggestions for this option?

- a: ./configure --enable-optimizations --pgo-workers=8
- b: ./configure --enable-optimizations --with-concurrent-pgo
- c: ./configure --enable-optimizations (by detecting the system CPU count)

The following metrics are a reference for decision making :)

## Build Time
AS-IS: real 4m42.799s
TO-BE (this case -j8): real 2m10.405s

## No performance regression
I didn't check how reliable the environment is, but there appears to be no regression.

+------------------------+---------+-----------------------+
| Benchmark              | base    | workers               |
+========================+=========+=======================+
| 2to3                   | 409 ms  | 412 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| chaos                  | 115 ms  | 114 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| deltablue              | 6.66 ms | 6.59 ms: 1.01x faster |
+------------------------+---------+-----------------------+
| fannkuch               | 605 ms  | 611 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| float                  | 138 ms  | 129 ms: 1.07x faster  |
+------------------------+---------+-----------------------+
| go                     | 220 ms  | 215 ms: 1.02x faster  |
+------------------------+---------+-----------------------+
| hexiom                 | 10.3 ms | 10.1 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| json_dumps             | 19.6 ms | 19.2 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| json_loads             | 40.6 us | 39.7 us: 1.02x faster |
+------------------------+---------+-----------------------+
| logging_silent         | 180 ns  | 173 ns: 1.04x faster  |
+------------------------+---------+-----------------------+
| logging_simple         | 8.89 us | 8.81 us: 1.01x faster |
+------------------------+---------+-----------------------+
| nqueens                | 134 ms  | 136 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| pathlib                | 24.6 ms | 24.2 ms: 1.01x faster |
+------------------------+---------+-----------------------+
| pickle                 | 16.1 us | 15.9 us: 1.01x faster |
+------------------------+---------+-----------------------+
| pickle_dict            | 41.4 us | 38.1 us: 1.09x faster |
+------------------------+---------+-----------------------+
| pickle_list            | 6.27 us | 5.09 us: 1.23x faster |
+------------------------+---------+-----------------------+
| pickle_pure_python     | 499 us  | 492 us: 1.01x faster  |
+------------------------+---------+-----------------------+
| pidigits               | 285 ms  | 290 ms: 1.02x slower  |
+------------------------+---------+-----------------------+
| python_startup         | 12.1 ms | 12.2 ms: 1.01x slower |
+------------------------+---------+-----------------------+
| python_startup_no_site | 8.91 ms | 8.89 ms: 1.00x faster |
+------------------------+---------+-----------------------+
| raytrace               | 510 ms  | 500 ms: 1.02x faster  |
+------------------------+---------+-----------------------+
| regex_compile          | 211 ms  | 210 ms: 1.00x faster  |
+------------------------+---------+-----------------------+
| regex_effbot           | 4.99 ms | 4.88 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| regex_v8               | 37.3 ms | 36.3 ms: 1.03x faster |
+------------------------+---------+-----------------------+
| richards               | 73.6 ms | 72.2 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| scimark_fft            | 542 ms  | 552 ms: 1.02x slower  |
+------------------------+---------+-----------------------+
| scimark_lu             | 189 ms  | 184 ms: 1.03x faster  |
+------------------------+---------+-----------------------+
| scimark_monte_carlo    | 106 ms  | 106 ms: 1.01x slower  |
+------------------------+---------+-----------------------+
| scimark_sor            | 199 ms  | 196 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| spectral_norm          | 177 ms  | 176 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| unpack_sequence        | 64.9 ns | 63.7 ns: 1.02x faster |
+------------------------+---------+-----------------------+
| unpickle               | 21.5 us | 21.6 us: 1.00x slower |
+------------------------+---------+-----------------------+
| unpickle_list          | 7.69 us | 7.55 us: 1.02x faster |
+------------------------+---------+-----------------------+
| unpickle_pure_python   | 402 us  | 394 us: 1.02x faster  |
+------------------------+---------+-----------------------+
| xml_etree_parse        | 218 ms  | 217 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| xml_etree_iterparse    | 156 ms  | 156 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| xml_etree_generate     | 132 ms  | 131 ms: 1.01x faster  |
+------------------------+---------+-----------------------+
| xml_etree_process      | 92.8 ms | 91.5 ms: 1.02x faster |
+------------------------+---------+-----------------------+
| Geometric mean         | (ref)   | 1.02x faster          |
+------------------------+---------+-----------------------+

Benchmark hidden because not significant (8): logging_format, meteor_contest, nbody, pyflate, regex_dna, scimark_sparse_mat_mult, sqlite_synth, telco
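Option (c) in the message above could be sketched roughly as follows. This is only an illustration of the proposal, not code from CPython's build system: the helper name pgo_task_command and its workers parameter are hypothetical, and the only flags used are the -m test --pgo -jN ones quoted in the message. The sketch builds the command line without running it.

```python
import os
import sys


def pgo_task_command(workers=None):
    """Return the argv for a parallel profiling run (hypothetical helper).

    Assumption: the profiling task is invoked as `python -m test --pgo -jN`,
    as in the message above. Nothing is executed here.
    """
    if workers is None:
        # Option (c): detect the CPU count. os.cpu_count() may return None,
        # so fall back to a single worker in that case.
        workers = os.cpu_count() or 1
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [sys.executable, "-m", "test", "--pgo", f"-j{workers}"]


if __name__ == "__main__":
    # Show the command that options (a) or (b) would run with 8 workers.
    print(" ".join(pgo_task_command(8)))
```

Whether such a run produces usable profile data is a separate question from how the command is assembled; the sketch only shows the shape of the proposed option.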
msg411892 - (view)	Author: STINNER Victor (vstinner) *	Date: 2022-01-27 15:51
> When I tested with run -m test --pgo -j8, it doesn't affect to optimized result with fast build time.

You only test the libregrtest.main and libregrtest.runtest_mp modules, which don't execute code. Does it mean that running tests is useless to train the PGO? Or does PGO magically aggregate results when multiple processes are run?
msg411898 - (view)	Author: Dong-hee Na (corona10) *	Date: 2022-01-27 16:59
> You only test libregrtest.main and libregrtest.runtest_mp modules which
> don't execute code. Does it mean that running tests is useless to train
> the PGO?

Sorry, I didn't check all the effects other than performance regression. There was already a related discussion, which decided not to do this: https://bugs.python.org/issue24915#msg251128

'I think the --pgo flag needs only work in single process mode, since multi-process would probably not write out the profiling data properly.'

I am closing the issue as won't do.

History
Date	User	Action	Args
2022-04-11 14:59:55	admin	set	github: 90709
2022-01-27 16:59:18	corona10	set	status: open -> closed messages: + msg411898 assignee: corona10 -> resolution: rejected stage: resolved
2022-01-27 15:51:52	vstinner	set	messages: + msg411892
2022-01-27 15:09:28	corona10	create