classification
Title: Add support for building cpython with clang thin lto
Type: enhancement Stage: resolved
Components: Build Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: FFY00, corona10, gregory.p.smith, holmanb, lukasz.langa, ned.deily
Priority: normal Keywords: patch

Created on 2021-06-07 22:38 by holmanb, last changed 2021-09-08 17:29 by lukasz.langa. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 26585 closed holmanb, 2021-06-07 22:47
PR 27231 merged corona10, 2021-07-18 16:11
PR 28229 merged corona10, 2021-09-08 05:14
Messages (9)
msg395293 - (view) Author: Brett Holman (holmanb) * Date: 2021-06-07 22:38
The existing --with-lto argument could be extended to pass through a value to select non-default lto compiler options:

CC=clang ./configure --with-lto=thin

This would leave the default behavior unchanged while letting those who want thin LTO opt in.

For what it's worth, the tests (make test) pass using clang 11.1.0 and thinlto.
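
As a rough illustration of the proposal above, the mapping from a `--with-lto` value to a Clang flag could look like the sketch below. The function name `lto_flag`, the `yes` key for a bare `--with-lto`, and the handling of unknown values are assumptions made for illustration; this is not the actual configure logic. `-flto` and `-flto=thin` are the Clang flags for full and thin LTO.

```python
def lto_flag(value: str = "yes") -> str:
    """Return a Clang LTO flag for a hypothetical --with-lto=<value> option."""
    flags = {
        "yes": "-flto",        # bare --with-lto: keep the existing default
        "full": "-flto",       # explicit full LTO
        "thin": "-flto=thin",  # opt-in ThinLTO
    }
    try:
        return flags[value]
    except KeyError:
        # Assumption: reject anything else instead of silently ignoring it.
        raise ValueError(f"unknown LTO value: {value!r}") from None


if __name__ == "__main__":
    for v in ("yes", "full", "thin"):
        print(f"--with-lto={v} -> {lto_flag(v)}")
```

Keeping `yes` mapped to the same flag as before is what lets the default behavior stay unchanged.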
msg397755 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-07-18 16:40
I am now setting up a test environment to compare thin LTO and full LTO.
msg397765 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-07-18 23:18
FYI, Thin LTO reduces the wall-clock build time.


Full LTO (./configure --with-lto=full CC=clang)
real	2m33.740s
user	8m25.695s
sys	0m13.124s

Thin LTO (./configure --with-lto=thin CC=clang)
real	1m51.867s
user	12m53.694s
sys	0m12.786s
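
To put the two `time` reports side by side, the small sketch below converts the `real` and `user` values quoted above into seconds and computes their ratios. The helper name `to_seconds` is an assumption for illustration, and the input figures are copied from the timings in this message.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style value such as '2m33.740s' to seconds."""
    m = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if m is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Figures copied from the full LTO and thin LTO build timings above.
full = {"real": "2m33.740s", "user": "8m25.695s"}
thin = {"real": "1m51.867s", "user": "12m53.694s"}

# Wall-clock speedup of thin over full, and relative user CPU time.
wall_speedup = to_seconds(full["real"]) / to_seconds(thin["real"])
cpu_ratio = to_seconds(thin["user"]) / to_seconds(full["user"])

if __name__ == "__main__":
    print(f"wall-clock: thin LTO build is {wall_speedup:.2f}x faster")
    print(f"user CPU:   thin LTO build uses {cpu_ratio:.2f}x as much")
```

The thin LTO build finishes sooner in wall-clock terms while consuming more total user CPU time, which is consistent with more of the work running in parallel.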
msg397766 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-07-18 23:47
The tests were run in the following environment.
There is no significant performance change overall.

MS Azure: D8s v3
CentOS Linux release 8.2.2004 (Core)


[corona10@PythonLinux cpython]$ ./python -m pyperformance compare full.json thin.json
full.json
=========

Performance version: 1.0.2
Report on Linux-4.18.0-193.28.1.el8_2.x86_64-x86_64-with-glibc2.28
Number of logical CPUs: 8
Start date: 2021-07-18 23:18:04.644067
End date: 2021-07-18 23:44:20.951457

thin.json
=========

Performance version: 1.0.2
Report on Linux-4.18.0-193.28.1.el8_2.x86_64-x86_64-with-glibc2.28
Number of logical CPUs: 8
Start date: 2021-07-18 22:46:00.717563
End date: 2021-07-18 23:12:19.376766

### 2to3 ###
Mean +- std dev: 570 ms +- 17 ms -> 568 ms +- 20 ms: 1.00x faster
Not significant

### chameleon ###
Mean +- std dev: 16.9 ms +- 0.6 ms -> 16.9 ms +- 0.9 ms: 1.00x slower
Not significant

### chaos ###
Mean +- std dev: 182 ms +- 7 ms -> 179 ms +- 7 ms: 1.02x faster
Not significant

### crypto_pyaes ###
Mean +- std dev: 198 ms +- 6 ms -> 192 ms +- 6 ms: 1.03x faster
Significant (t=5.26)

### deltablue ###
Mean +- std dev: 13.4 ms +- 0.5 ms -> 13.5 ms +- 0.5 ms: 1.01x slower
Not significant

### django_template ###
Mean +- std dev: 94.0 ms +- 3.2 ms -> 91.8 ms +- 3.7 ms: 1.02x faster
Significant (t=3.53)

### dulwich_log ###
Mean +- std dev: 178 ms +- 6 ms -> 176 ms +- 8 ms: 1.02x faster
Not significant

### fannkuch ###
Mean +- std dev: 764 ms +- 17 ms -> 755 ms +- 15 ms: 1.01x faster
Not significant

### float ###
Mean +- std dev: 194 ms +- 8 ms -> 187 ms +- 6 ms: 1.03x faster
Significant (t=4.95)

### go ###
Mean +- std dev: 388 ms +- 14 ms -> 387 ms +- 14 ms: 1.00x faster
Not significant

### hexiom ###
Mean +- std dev: 17.0 ms +- 0.7 ms -> 17.5 ms +- 0.8 ms: 1.03x slower
Significant (t=-3.40)

### json_dumps ###
Mean +- std dev: 22.5 ms +- 0.9 ms -> 22.3 ms +- 0.7 ms: 1.01x faster
Not significant

### json_loads ###
Mean +- std dev: 45.8 us +- 2.3 us -> 46.5 us +- 1.8 us: 1.02x slower
Not significant

### logging_format ###
Mean +- std dev: 19.1 us +- 0.9 us -> 18.7 us +- 0.7 us: 1.02x faster
Not significant

### logging_silent ###
Mean +- std dev: 336 ns +- 17 ns -> 334 ns +- 18 ns: 1.00x faster
Not significant

### logging_simple ###
Mean +- std dev: 17.1 us +- 0.8 us -> 16.7 us +- 0.8 us: 1.03x faster
Significant (t=3.12)

### mako ###
Mean +- std dev: 27.6 ms +- 1.6 ms -> 26.6 ms +- 0.9 ms: 1.04x faster
Significant (t=4.11)

### meteor_contest ###
Mean +- std dev: 172 ms +- 5 ms -> 169 ms +- 5 ms: 1.01x faster
Not significant

### nbody ###
Mean +- std dev: 232 ms +- 8 ms -> 224 ms +- 8 ms: 1.04x faster
Significant (t=6.03)

### nqueens ###
Mean +- std dev: 167 ms +- 7 ms -> 166 ms +- 7 ms: 1.00x faster
Not significant

### pathlib ###
Mean +- std dev: 38.2 ms +- 1.7 ms -> 37.4 ms +- 1.9 ms: 1.02x faster
Significant (t=2.41)

### pickle ###
Mean +- std dev: 19.4 us +- 0.8 us -> 19.5 us +- 0.8 us: 1.00x slower
Not significant

### pickle_dict ###
Mean +- std dev: 43.5 us +- 1.9 us -> 43.0 us +- 1.9 us: 1.01x faster
Not significant

### pickle_list ###
Mean +- std dev: 6.81 us +- 0.26 us -> 6.81 us +- 0.27 us: 1.00x slower
Not significant

### pickle_pure_python ###
Mean +- std dev: 840 us +- 28 us -> 825 us +- 28 us: 1.02x faster
Not significant

### pidigits ###
Mean +- std dev: 294 ms +- 9 ms -> 294 ms +- 9 ms: 1.00x slower
Not significant

### pyflate ###
Mean +- std dev: 1.17 sec +- 0.02 sec -> 1.16 sec +- 0.03 sec: 1.01x faster
Not significant

### python_startup ###
Mean +- std dev: 15.3 ms +- 0.6 ms -> 15.3 ms +- 0.6 ms: 1.00x faster
Not significant

### python_startup_no_site ###
Mean +- std dev: 10.3 ms +- 0.3 ms -> 10.2 ms +- 0.4 ms: 1.01x faster
Not significant

### raytrace ###
Mean +- std dev: 911 ms +- 19 ms -> 911 ms +- 21 ms: 1.00x faster
Not significant

### regex_compile ###
Mean +- std dev: 314 ms +- 12 ms -> 310 ms +- 10 ms: 1.01x faster
Not significant

### regex_dna ###
Mean +- std dev: 317 ms +- 10 ms -> 299 ms +- 9 ms: 1.06x faster
Significant (t=9.99)

### regex_effbot ###
Mean +- std dev: 6.20 ms +- 0.27 ms -> 5.80 ms +- 0.25 ms: 1.07x faster
Significant (t=8.49)

### regex_v8 ###
Mean +- std dev: 43.0 ms +- 1.3 ms -> 39.9 ms +- 1.9 ms: 1.08x faster
Significant (t=10.22)

### richards ###
Mean +- std dev: 158 ms +- 8 ms -> 157 ms +- 8 ms: 1.01x faster
Not significant

### scimark_fft ###
Mean +- std dev: 727 ms +- 18 ms -> 716 ms +- 18 ms: 1.02x faster
Not significant

### scimark_lu ###
Mean +- std dev: 309 ms +- 11 ms -> 304 ms +- 10 ms: 1.01x faster
Not significant

### scimark_monte_carlo ###
Mean +- std dev: 180 ms +- 6 ms -> 181 ms +- 8 ms: 1.00x slower
Not significant

### scimark_sor ###
Mean +- std dev: 355 ms +- 9 ms -> 352 ms +- 11 ms: 1.01x faster
Not significant

### scimark_sparse_mat_mult ###
Mean +- std dev: 9.51 ms +- 0.32 ms -> 9.19 ms +- 0.34 ms: 1.03x faster
Significant (t=5.27)

### spectral_norm ###
Mean +- std dev: 277 ms +- 10 ms -> 272 ms +- 9 ms: 1.02x faster
Not significant

### sqlalchemy_declarative ###
Mean +- std dev: 273 ms +- 10 ms -> 273 ms +- 10 ms: 1.00x faster
Not significant

### sqlalchemy_imperative ###
Mean +- std dev: 46.2 ms +- 2.3 ms -> 45.6 ms +- 2.1 ms: 1.01x faster
Not significant

### sqlite_synth ###
Mean +- std dev: 4.39 us +- 0.29 us -> 4.37 us +- 0.23 us: 1.00x faster
Not significant

### sympy_expand ###
Mean +- std dev: 996 ms +- 22 ms -> 993 ms +- 31 ms: 1.00x faster
Not significant

### sympy_integrate ###
Mean +- std dev: 42.6 ms +- 1.9 ms -> 43.5 ms +- 2.0 ms: 1.02x slower
Significant (t=-2.47)

### sympy_str ###
Mean +- std dev: 607 ms +- 18 ms -> 599 ms +- 15 ms: 1.01x faster
Not significant

### sympy_sum ###
Mean +- std dev: 350 ms +- 12 ms -> 344 ms +- 11 ms: 1.02x faster
Not significant

### telco ###
Mean +- std dev: 11.3 ms +- 0.6 ms -> 11.2 ms +- 0.6 ms: 1.01x faster
Not significant

### tornado_http ###
Mean +- std dev: 294 ms +- 11 ms -> 296 ms +- 12 ms: 1.01x slower
Not significant

### unpack_sequence ###
Mean +- std dev: 78.4 ns +- 7.1 ns -> 75.7 ns +- 2.5 ns: 1.04x faster
Significant (t=2.86)

### unpickle ###
Mean +- std dev: 26.1 us +- 1.1 us -> 27.3 us +- 1.3 us: 1.05x slower
Significant (t=-5.57)

### unpickle_list ###
Mean +- std dev: 6.65 us +- 0.21 us -> 6.68 us +- 0.31 us: 1.00x slower
Not significant

### unpickle_pure_python ###
Mean +- std dev: 567 us +- 21 us -> 572 us +- 21 us: 1.01x slower
Not significant

### xml_etree_generate ###
Mean +- std dev: 165 ms +- 8 ms -> 166 ms +- 8 ms: 1.00x slower
Not significant

### xml_etree_iterparse ###
Mean +- std dev: 187 ms +- 6 ms -> 187 ms +- 8 ms: 1.00x faster
Not significant

### xml_etree_parse ###
Mean +- std dev: 274 ms +- 10 ms -> 274 ms +- 11 ms: 1.00x slower
Not significant

### xml_etree_process ###
Mean +- std dev: 142 ms +- 6 ms -> 139 ms +- 6 ms: 1.02x faster
Not significant
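
A report of this length is easier to scan when only the entries marked `Significant` are listed. The sketch below is one way to pull them out of the text format shown above; the regular expression and the function name `significant_changes` are assumptions for illustration, not part of pyperformance, and `SAMPLE` holds three entries copied from the report.

```python
import re

# Three entries copied from the comparison report above.
SAMPLE = """\
### regex_v8 ###
Mean +- std dev: 43.0 ms +- 1.3 ms -> 39.9 ms +- 1.9 ms: 1.08x faster
Significant (t=10.22)

### unpickle ###
Mean +- std dev: 26.1 us +- 1.1 us -> 27.3 us +- 1.3 us: 1.05x slower
Significant (t=-5.57)

### go ###
Mean +- std dev: 388 ms +- 14 ms -> 387 ms +- 14 ms: 1.00x faster
Not significant
"""

# One entry: a "### name ###" header, the "Mean ..." line, then the verdict.
ENTRY = re.compile(
    r"^### (?P<name>\S+) ###\n"
    r"Mean .*: (?P<ratio>[\d.]+)x (?P<direction>faster|slower)\n"
    r"(?P<verdict>Significant|Not significant)",
    re.MULTILINE,
)


def significant_changes(report: str):
    """Return (name, ratio, direction) for entries marked 'Significant'."""
    return [
        (m["name"], float(m["ratio"]), m["direction"])
        for m in ENTRY.finditer(report)
        if m["verdict"] == "Significant"
    ]


if __name__ == "__main__":
    for name, ratio, direction in significant_changes(SAMPLE):
        print(f"{name}: {ratio:.2f}x {direction}")
```

Here "faster" and "slower" describe thin.json relative to full.json, following the argument order of the `compare` command above.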
msg397767 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-07-18 23:53
clang version 11.0.0
msg397785 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-07-19 10:53
New changeset b2cf2513f9184c850a69fab718532b4f7c6a003d by Dong-hee Na in branch 'main':
bpo-44340: Add support for building with clang full/thin lto (GH-27231)
https://github.com/python/cpython/commit/b2cf2513f9184c850a69fab718532b4f7c6a003d
msg397786 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-07-19 10:55
CPython 3.11 now supports Thin LTO.
Thank you for the report and contribution, Brett!

Thank you also to Pablo and Gregory for the reviews!
msg397855 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-07-20 06:51
@ned.deily

Can we use the thin-lto option for the next macOS Python distribution?
In my local environment, it passes all tests :)

https://github.com/python/cpython/blob/366fcbac18e3adc41e3901580dbedb6a91e41a10/Mac/BuildScript/build-installer.py#L1199

FYI, Gentoo already recommends using thin LTO instead of full LTO.
https://wiki.gentoo.org/wiki/Clang#Link-time_optimizations_with_Clang
msg401415 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-09-08 17:29
New changeset 84ca5fcd31541929f0031e974a434b95d8e78aab by Dong-hee Na in branch 'main':
bpo-44340: Update whatsnews for ThinLTO (GH-28229)
https://github.com/python/cpython/commit/84ca5fcd31541929f0031e974a434b95d8e78aab
History
Date User Action Args
2021-09-08 17:29:40 lukasz.langa set nosy: + lukasz.langa
messages: + msg401415
2021-09-08 05:14:08 corona10 set pull_requests: + pull_request26649
2021-07-20 06:51:58 corona10 set nosy: + ned.deily
messages: + msg397855
2021-07-19 10:55:14 corona10 set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-07-19 10:55:07 corona10 set messages: + msg397786
2021-07-19 10:53:01 corona10 set messages: + msg397785
2021-07-18 23:53:02 corona10 set messages: + msg397767
2021-07-18 23:47:12 corona10 set messages: + msg397766
2021-07-18 23:18:53 corona10 set messages: + msg397765
2021-07-18 17:10:50 corona10 set versions: + Python 3.11
2021-07-18 16:40:36 corona10 set messages: + msg397755
2021-07-18 16:11:03 corona10 set pull_requests: + pull_request25779
2021-06-11 19:01:34 FFY00 set nosy: + FFY00
2021-06-10 00:00:14 corona10 set nosy: + corona10
2021-06-08 23:20:28 ned.deily set nosy: + gregory.p.smith
components: + Build, - Interpreter Core
2021-06-07 22:47:48 holmanb set keywords: + patch
stage: patch review
pull_requests: + pull_request25169
2021-06-07 22:38:11 holmanb create