Add support for building cpython with clang thin lto #88506

Closed
holmanb mannequin opened this issue Jun 7, 2021 · 9 comments
Labels
3.11 only security fixes build The build process and cross-build type-feature A feature request or enhancement

Comments

@holmanb
Mannequin

holmanb mannequin commented Jun 7, 2021

BPO 44340
Nosy @gpshead, @ned-deily, @ambv, @corona10, @FFY00, @holmanb
PRs
  • bpo-44340: Add support for building with clang thin lto via --with-lto=thin #26585
  • bpo-44340: Add support for building with clang full/thin lto #27231
  • bpo-44340: Update whatsnews for ThinLTO #28229

    Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-07-19.10:55:14.358>
    created_at = <Date 2021-06-07.22:38:11.007>
    labels = ['type-feature', 'build', '3.11']
    title = 'Add support for building cpython with clang thin lto'
    updated_at = <Date 2021-09-08.17:29:40.230>
    user = 'https://github.com/holmanb'

    bugs.python.org fields:

    activity = <Date 2021-09-08.17:29:40.230>
    actor = 'lukasz.langa'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-07-19.10:55:14.358>
    closer = 'corona10'
    components = ['Build']
    creation = <Date 2021-06-07.22:38:11.007>
    creator = 'holmanb'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 44340
    keywords = ['patch']
    message_count = 9.0
    messages = ['395293', '397755', '397765', '397766', '397767', '397785', '397786', '397855', '401415']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'ned.deily', 'lukasz.langa', 'corona10', 'FFY00', 'holmanb']
    pr_nums = ['26585', '27231', '28229']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue44340'
    versions = ['Python 3.11']

    @holmanb
    Mannequin Author

    holmanb mannequin commented Jun 7, 2021

    The existing --with-lto option could be extended to accept a value that selects non-default LTO compiler options:

    CC=clang ./configure --with-lto=thin

    This would leave the default behavior unchanged while letting those who want ThinLTO opt in.

    For what it's worth, the test suite (make test) passes with clang 11.1.0 and ThinLTO.
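A minimal sketch of how such a value could be translated into clang flags, assuming `yes` keeps today's behavior (full LTO) and `no` disables LTO. `clang_lto_flags` is a hypothetical helper written for illustration, not CPython's actual configure logic; `-flto=thin` and `-flto` (equivalent to `-flto=full`) are clang's own options.

```python
# Hypothetical sketch: map a --with-lto value to clang LTO flags.
# The "yes" -> full LTO mapping is an assumption for illustration.


def clang_lto_flags(policy: str) -> list[str]:
    """Return clang compiler/linker flags for a --with-lto policy value."""
    if policy == "no":
        return []                 # LTO disabled
    if policy in ("yes", "full"):
        return ["-flto"]          # full (monolithic) LTO; same as -flto=full
    if policy == "thin":
        return ["-flto=thin"]     # ThinLTO: summary-based, parallel backend
    raise ValueError(f"unsupported --with-lto value: {policy!r}")


print(clang_lto_flags("thin"))
```

Keeping `yes` mapped to full LTO is what lets the default stay unchanged while `thin` is a pure opt-in.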

    @holmanb holmanb mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels Jun 7, 2021
    @ned-deily ned-deily added build The build process and cross-build and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Jun 8, 2021
    @corona10
    Member

    I am now setting up an experimental environment to compare ThinLTO against full LTO.

    @corona10 corona10 added 3.11 only security fixes labels Jul 18, 2021
    @corona10
    Member

    FYI, ThinLTO shortens the wall-clock build time (real), although it uses more total CPU time (user):

    Full LTO (./configure --with-lto=full CC=clang)
    real 2m33.740s
    user 8m25.695s
    sys 0m13.124s

    Thin LTO (./configure --with-lto=thin CC=clang)
    real 1m51.867s
    user 12m53.694s
    sys 0m12.786s
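The two measurements can be compared with a little arithmetic: ThinLTO cuts wall-clock time by about 1.37x while using roughly 1.53x the user CPU time, which is consistent with its backend work being spread across cores. This sketch only reproduces that arithmetic from the `time` output above; `to_seconds` is a hypothetical helper.

```python
import re


def to_seconds(t: str) -> float:
    """Convert a bash `time` value such as '2m33.740s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognized time format: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Values copied from the measurements above.
full = {"real": "2m33.740s", "user": "8m25.695s"}
thin = {"real": "1m51.867s", "user": "12m53.694s"}

wall_speedup = to_seconds(full["real"]) / to_seconds(thin["real"])
cpu_ratio = to_seconds(thin["user"]) / to_seconds(full["user"])
print(f"wall-clock speedup: {wall_speedup:.2f}x")            # 1.37x
print(f"user CPU time ratio (thin/full): {cpu_ratio:.2f}x")  # 1.53x
```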

    @corona10
    Member

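    The benchmarks were run in the following environment.
    Overall there is no significant performance change: most benchmarks show no significant difference, and the significant ones are within about 8% in either direction.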

    MS Azure: D8s v3
    CentOS Linux release 8.2.2004 (Core)

    [corona10@PythonLinux cpython]$ ./python -m pyperformance compare full.json thin.json
    full.json
    =========

    Performance version: 1.0.2
    Report on Linux-4.18.0-193.28.1.el8_2.x86_64-x86_64-with-glibc2.28
    Number of logical CPUs: 8
    Start date: 2021-07-18 23:18:04.644067
    End date: 2021-07-18 23:44:20.951457

    thin.json
    =========

    Performance version: 1.0.2
    Report on Linux-4.18.0-193.28.1.el8_2.x86_64-x86_64-with-glibc2.28
    Number of logical CPUs: 8
    Start date: 2021-07-18 22:46:00.717563
    End date: 2021-07-18 23:12:19.376766

    ### 2to3 ###
    Mean +- std dev: 570 ms +- 17 ms -> 568 ms +- 20 ms: 1.00x faster
    Not significant

    ### chameleon ###
    Mean +- std dev: 16.9 ms +- 0.6 ms -> 16.9 ms +- 0.9 ms: 1.00x slower
    Not significant

    ### chaos ###
    Mean +- std dev: 182 ms +- 7 ms -> 179 ms +- 7 ms: 1.02x faster
    Not significant

    ### crypto_pyaes ###
    Mean +- std dev: 198 ms +- 6 ms -> 192 ms +- 6 ms: 1.03x faster
    Significant (t=5.26)

    ### deltablue ###
    Mean +- std dev: 13.4 ms +- 0.5 ms -> 13.5 ms +- 0.5 ms: 1.01x slower
    Not significant

    ### django_template ###
    Mean +- std dev: 94.0 ms +- 3.2 ms -> 91.8 ms +- 3.7 ms: 1.02x faster
    Significant (t=3.53)

    ### dulwich_log ###
    Mean +- std dev: 178 ms +- 6 ms -> 176 ms +- 8 ms: 1.02x faster
    Not significant

    ### fannkuch ###
    Mean +- std dev: 764 ms +- 17 ms -> 755 ms +- 15 ms: 1.01x faster
    Not significant

    ### float ###
    Mean +- std dev: 194 ms +- 8 ms -> 187 ms +- 6 ms: 1.03x faster
    Significant (t=4.95)

    ### go ###
    Mean +- std dev: 388 ms +- 14 ms -> 387 ms +- 14 ms: 1.00x faster
    Not significant

    ### hexiom ###
    Mean +- std dev: 17.0 ms +- 0.7 ms -> 17.5 ms +- 0.8 ms: 1.03x slower
    Significant (t=-3.40)

    ### json_dumps ###
    Mean +- std dev: 22.5 ms +- 0.9 ms -> 22.3 ms +- 0.7 ms: 1.01x faster
    Not significant

    ### json_loads ###
    Mean +- std dev: 45.8 us +- 2.3 us -> 46.5 us +- 1.8 us: 1.02x slower
    Not significant

    ### logging_format ###
    Mean +- std dev: 19.1 us +- 0.9 us -> 18.7 us +- 0.7 us: 1.02x faster
    Not significant

    ### logging_silent ###
    Mean +- std dev: 336 ns +- 17 ns -> 334 ns +- 18 ns: 1.00x faster
    Not significant

    ### logging_simple ###
    Mean +- std dev: 17.1 us +- 0.8 us -> 16.7 us +- 0.8 us: 1.03x faster
    Significant (t=3.12)

    ### mako ###
    Mean +- std dev: 27.6 ms +- 1.6 ms -> 26.6 ms +- 0.9 ms: 1.04x faster
    Significant (t=4.11)

    ### meteor_contest ###
    Mean +- std dev: 172 ms +- 5 ms -> 169 ms +- 5 ms: 1.01x faster
    Not significant

    ### nbody ###
    Mean +- std dev: 232 ms +- 8 ms -> 224 ms +- 8 ms: 1.04x faster
    Significant (t=6.03)

    ### nqueens ###
    Mean +- std dev: 167 ms +- 7 ms -> 166 ms +- 7 ms: 1.00x faster
    Not significant

    ### pathlib ###
    Mean +- std dev: 38.2 ms +- 1.7 ms -> 37.4 ms +- 1.9 ms: 1.02x faster
    Significant (t=2.41)

    ### pickle ###
    Mean +- std dev: 19.4 us +- 0.8 us -> 19.5 us +- 0.8 us: 1.00x slower
    Not significant

    ### pickle_dict ###
    Mean +- std dev: 43.5 us +- 1.9 us -> 43.0 us +- 1.9 us: 1.01x faster
    Not significant

    ### pickle_list ###
    Mean +- std dev: 6.81 us +- 0.26 us -> 6.81 us +- 0.27 us: 1.00x slower
    Not significant

    ### pickle_pure_python ###
    Mean +- std dev: 840 us +- 28 us -> 825 us +- 28 us: 1.02x faster
    Not significant

    ### pidigits ###
    Mean +- std dev: 294 ms +- 9 ms -> 294 ms +- 9 ms: 1.00x slower
    Not significant

    ### pyflate ###
    Mean +- std dev: 1.17 sec +- 0.02 sec -> 1.16 sec +- 0.03 sec: 1.01x faster
    Not significant

    ### python_startup ###
    Mean +- std dev: 15.3 ms +- 0.6 ms -> 15.3 ms +- 0.6 ms: 1.00x faster
    Not significant

    ### python_startup_no_site ###
    Mean +- std dev: 10.3 ms +- 0.3 ms -> 10.2 ms +- 0.4 ms: 1.01x faster
    Not significant

    ### raytrace ###
    Mean +- std dev: 911 ms +- 19 ms -> 911 ms +- 21 ms: 1.00x faster
    Not significant

    ### regex_compile ###
    Mean +- std dev: 314 ms +- 12 ms -> 310 ms +- 10 ms: 1.01x faster
    Not significant

    ### regex_dna ###
    Mean +- std dev: 317 ms +- 10 ms -> 299 ms +- 9 ms: 1.06x faster
    Significant (t=9.99)

    ### regex_effbot ###
    Mean +- std dev: 6.20 ms +- 0.27 ms -> 5.80 ms +- 0.25 ms: 1.07x faster
    Significant (t=8.49)

    ### regex_v8 ###
    Mean +- std dev: 43.0 ms +- 1.3 ms -> 39.9 ms +- 1.9 ms: 1.08x faster
    Significant (t=10.22)

    ### richards ###
    Mean +- std dev: 158 ms +- 8 ms -> 157 ms +- 8 ms: 1.01x faster
    Not significant

    ### scimark_fft ###
    Mean +- std dev: 727 ms +- 18 ms -> 716 ms +- 18 ms: 1.02x faster
    Not significant

    ### scimark_lu ###
    Mean +- std dev: 309 ms +- 11 ms -> 304 ms +- 10 ms: 1.01x faster
    Not significant

    ### scimark_monte_carlo ###
    Mean +- std dev: 180 ms +- 6 ms -> 181 ms +- 8 ms: 1.00x slower
    Not significant

    ### scimark_sor ###
    Mean +- std dev: 355 ms +- 9 ms -> 352 ms +- 11 ms: 1.01x faster
    Not significant

    ### scimark_sparse_mat_mult ###
    Mean +- std dev: 9.51 ms +- 0.32 ms -> 9.19 ms +- 0.34 ms: 1.03x faster
    Significant (t=5.27)

    ### spectral_norm ###
    Mean +- std dev: 277 ms +- 10 ms -> 272 ms +- 9 ms: 1.02x faster
    Not significant

    ### sqlalchemy_declarative ###
    Mean +- std dev: 273 ms +- 10 ms -> 273 ms +- 10 ms: 1.00x faster
    Not significant

    ### sqlalchemy_imperative ###
    Mean +- std dev: 46.2 ms +- 2.3 ms -> 45.6 ms +- 2.1 ms: 1.01x faster
    Not significant

    ### sqlite_synth ###
    Mean +- std dev: 4.39 us +- 0.29 us -> 4.37 us +- 0.23 us: 1.00x faster
    Not significant

    ### sympy_expand ###
    Mean +- std dev: 996 ms +- 22 ms -> 993 ms +- 31 ms: 1.00x faster
    Not significant

    ### sympy_integrate ###
    Mean +- std dev: 42.6 ms +- 1.9 ms -> 43.5 ms +- 2.0 ms: 1.02x slower
    Significant (t=-2.47)

    ### sympy_str ###
    Mean +- std dev: 607 ms +- 18 ms -> 599 ms +- 15 ms: 1.01x faster
    Not significant

    ### sympy_sum ###
    Mean +- std dev: 350 ms +- 12 ms -> 344 ms +- 11 ms: 1.02x faster
    Not significant

    ### telco ###
    Mean +- std dev: 11.3 ms +- 0.6 ms -> 11.2 ms +- 0.6 ms: 1.01x faster
    Not significant

    ### tornado_http ###
    Mean +- std dev: 294 ms +- 11 ms -> 296 ms +- 12 ms: 1.01x slower
    Not significant

    ### unpack_sequence ###
    Mean +- std dev: 78.4 ns +- 7.1 ns -> 75.7 ns +- 2.5 ns: 1.04x faster
    Significant (t=2.86)

    ### unpickle ###
    Mean +- std dev: 26.1 us +- 1.1 us -> 27.3 us +- 1.3 us: 1.05x slower
    Significant (t=-5.57)

    ### unpickle_list ###
    Mean +- std dev: 6.65 us +- 0.21 us -> 6.68 us +- 0.31 us: 1.00x slower
    Not significant

    ### unpickle_pure_python ###
    Mean +- std dev: 567 us +- 21 us -> 572 us +- 21 us: 1.01x slower
    Not significant

    ### xml_etree_generate ###
    Mean +- std dev: 165 ms +- 8 ms -> 166 ms +- 8 ms: 1.00x slower
    Not significant

    ### xml_etree_iterparse ###
    Mean +- std dev: 187 ms +- 6 ms -> 187 ms +- 8 ms: 1.00x faster
    Not significant

    ### xml_etree_parse ###
    Mean +- std dev: 274 ms +- 10 ms -> 274 ms +- 11 ms: 1.00x slower
    Not significant

    ### xml_etree_process ###
    Mean +- std dev: 142 ms +- 6 ms -> 139 ms +- 6 ms: 1.02x faster
    Not significant
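To pull the significant results out of a comparison like the one above, a small parser is enough. This is a hypothetical helper, not part of pyperformance, and its regular expression assumes the exact text layout shown above (no leading indentation); `REPORT` holds three blocks copied from that output.

```python
import re

# Three result blocks copied from the comparison above (full.json -> thin.json).
REPORT = """\
### 2to3 ###
Mean +- std dev: 570 ms +- 17 ms -> 568 ms +- 20 ms: 1.00x faster
Not significant

### regex_v8 ###
Mean +- std dev: 43.0 ms +- 1.3 ms -> 39.9 ms +- 1.9 ms: 1.08x faster
Significant (t=10.22)

### unpickle ###
Mean +- std dev: 26.1 us +- 1.1 us -> 27.3 us +- 1.3 us: 1.05x slower
Significant (t=-5.57)
"""

# One benchmark block: header, "Mean +- std dev" line, verdict line.
BLOCK = re.compile(
    r"### (?P<name>\S+) ###\n"
    r"Mean \+- std dev: .*: (?P<factor>[\d.]+)x (?P<direction>faster|slower)\n"
    r"(?P<verdict>Significant|Not significant)"
)


def significant_changes(report: str) -> list[tuple[str, float, str]]:
    """Return (benchmark, factor, 'faster'/'slower') for significant results only."""
    changes = []
    for m in BLOCK.finditer(report):
        if m.group("verdict") == "Significant":
            changes.append((m.group("name"), float(m.group("factor")), m.group("direction")))
    return changes


print(significant_changes(REPORT))
```

Across the full comparison above, 15 benchmarks are marked significant: 12 faster with ThinLTO and 3 slower (hexiom, sympy_integrate, unpickle).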

    @corona10
    Member

    The compiler used was clang version 11.0.0.

    @corona10
    Member

    New changeset b2cf251 by Dong-hee Na in branch 'main':
    bpo-44340: Add support for building with clang full/thin lto (GH-27231)

    @corona10
    Member

    CPython 3.11 now supports ThinLTO.
    Thank you for the report and contribution, Brett!

    And also thank you Pablo and Gregory for the reviews!

    @corona10
    Member

    @ned.deily

    Can we use the ThinLTO option for the next macOS Python distribution?
    It passes all tests in my local environment :)

    (' ', "--enable-optimizations --with-lto")[compilerCanOptimize()],

    FYI, Gentoo already recommends ThinLTO over full LTO:
    https://wiki.gentoo.org/wiki/Clang#Link-time_optimizations_with_Clang
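The build-script line quoted above picks configure arguments by indexing a tuple with a boolean. A minimal sketch of the proposed change, written with an explicit conditional instead, might look like this. `lto_configure_args` and the `use_thin_lto` switch are hypothetical, and `compilerCanOptimize()` is stubbed here rather than reproduced from the real script.

```python
# Hypothetical sketch: opt the macOS build into ThinLTO.


def compilerCanOptimize() -> bool:
    return True  # stub for illustration only; the real check lives in the build script


def lto_configure_args(use_thin_lto: bool) -> str:
    """Return the optimization-related ./configure arguments."""
    if not compilerCanOptimize():
        return " "  # mirrors the fallback value in the quoted line
    lto = "--with-lto=thin" if use_thin_lto else "--with-lto"
    return f"--enable-optimizations {lto}"


print(lto_configure_args(use_thin_lto=True))
```

An explicit conditional reads more clearly than `(a, b)[flag]` once there are more than two outcomes to choose from.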

    @ambv
    Contributor

    ambv commented Sep 8, 2021

    New changeset 84ca5fc by Dong-hee Na in branch 'main':
    bpo-44340: Update whatsnews for ThinLTO (GH-28229)
