Issue 25702: Link Time Optimizations support for GCC and CLANG


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/69888

classification

Title:	Link Time Optimizations support for GCC and CLANG
Type:	performance	Stage:	resolved
Components:	Build	Versions:	Python 3.6, Python 3.5, Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:	26787 26788	Superseder:
Assigned To:		Nosy List:	David Filiatrault, alecsandru.patrascu, gregory.p.smith, lemburg, methane, pitrou, python-dev, r.david.murray, scoder, skrah, steve.dower, vstinner, zach.ware
Priority:	normal	Keywords:	patch

Created on 2015-11-23 08:59 by alecsandru.patrascu, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description
lto-cpython2-v01.patch	alecsandru.patrascu, 2015-11-23 09:00
lto-cpython3-v01.patch	alecsandru.patrascu, 2015-11-23 09:00
lto-cpython2-v02.patch	alecsandru.patrascu, 2015-11-23 14:44
lto-cpython3-v02.patch	alecsandru.patrascu, 2015-11-23 14:44
lto-cpython2-v03.patch	alecsandru.patrascu, 2016-01-04 15:19
lto-cpython3-v03.patch	alecsandru.patrascu, 2016-01-04 15:19
lto-cpython2-v04.patch	alecsandru.patrascu, 2016-01-20 09:27
lto-cpython3-v04.patch	alecsandru.patrascu, 2016-01-20 09:28

Messages (45)
msg255140	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2015-11-23 08:59
Title: Link Time Optimizations support for GCC and CLANG

Hi All,

This is Alecsandru from the Server Scripting Languages Optimization team at Intel Corporation.

I would like to submit a patch that adds support for Link Time Optimization (LTO) when using GCC and CLANG to compile CPython2 and CPython3.

LTO is a compiler-assisted optimization technique performed at link time. Combined with Profile Guided Optimization (PGO), which is enabled by running "make profile-opt", LTO gave a speedup of up to 11% over PGO alone on the Grand Unified Python Benchmark (GUPB), with a few regressions. Compared with a default build, PGO+LTO gave performance gains as high as 26%. In addition, we see a 2% boost in throughput in our OpenStack Swift setup compared with PGO alone.

Our GUPB performance evaluation was conducted on Intel Skylake/Broadwell systems running CentOS/Ubuntu, with CLANG/LLVM and GCC 4.x/5.x. Our OpenStack Swift measurements were done on various systems with Xeon and Avoton processors.

Steps:
======
1. Get the CPython source code:
   hg clone https://hg.python.org/cpython cpython
   cd cpython
   hg update 2.7 (for CPython2)
2. Build the binary:
   a) Default:
      ./configure
      make
   b) PGO:
      ./configure
      make profile-opt
   c) PGO+LTO: copy the attached patch files, apply the one for your branch, then build:
      hg import --no-commit lto-cpython3-v01.patch (for CPython3)
      hg import --no-commit lto-cpython2-v01.patch (for CPython2)
      ./configure
      make profile-opt

Hardware and OS Configuration
=============================
Hardware: Intel Xeon (Broadwell-DE), 8 cores
BIOS settings: Intel Turbo Boost Technology: false; Hyper-Threading: false
OS: Ubuntu 14.04.3 LTS Server
OS configuration: Address Space Layout Randomization (ASLR) disabled to reduce run-to-run variation, by running
   echo 0 > /proc/sys/kernel/randomize_va_space
CPU frequency: fixed at 2.6GHz
GCC version: 4.9.2
Benchmark: Grand Unified Python Benchmark from https://hg.python.org/benchmarks/

Measurements and Results
========================
A. Repository:
GUPB Benchmark:
   hg id : 2979f5ce6a0c tip
   hg --debug id -i : 2979f5ce6a0cee994d5485401945d8457bb0afac
CPython3:
   hg id : 21a28f6de358
   hg id -r 'ancestors(.) and tag()': 374f501f4567 (3.5) v3.5.0
   hg --debug id -i : 21a28f6de3582833652c958b8fd6ae8448b61c7c
CPython2:
   hg id : a37ea1d56e98 (2.7)
   hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
   hg --debug id -i : a37ea1d56e98eb158750d3e495a5cf524e8c3980

B. Results:
CPython2 and CPython3 sample results, measured on a Broadwell platform, are shown in Tables 1 and 2. The first column (Benchmark) gives the benchmark name, the second (%D) the speedup compared with the default build, and the third (%PGO) the speedup compared with a PGO-only build; higher is better.

Table 1. CPython2 results:
Benchmark             %D  %PGO
--------------------------------
raytrace              18     3
chaos                 16     5
django_v2             16     6
mako                  16     6
pathlib               15     3
simple_logging        15     1
slowpickle            15     5
django                14     4
go                    14     4
richards              13    -1
float                 12     4
slowunpickle          12     4
etree_process         11     3
fastunpickle          11     6
formatted_logging     11     3
nqueens               11     1
regex_compile         11     3
etree_iterparse       10     4
mako_v2               10     3
telco                 10     5
pybench                9     1
hexiom2                9     1
html5lib_warmup        9     3
meteor_contest         9     4
pickle_list            9     5
2to3                   8     2
bzr_startup            8     2
chameleon              8     0
etree_generate         8     2
regex_v8               8     3
silent_logging         8     1
fannkuch               7     1
html5lib               7     3
json_load              7    -5
tornado_http           7     3
call_method_slots      6     3
json_dump_v2           6    -4
spambayes              6     2
unpickle_list          6     0
etree_parse            5     3
fastpickle             5     4
rietveld               5     1
call_method            4    -1
normal_startup         4     2
startup_nosite         4     2
slowspitfire           3     0
ssbench                4     2
call_method_unknown    1    -6
json_dump              1    -4
nbody                  1     1
pidigits               1   -10
pickle_dict            0    -1
regex_effbot           0    -2
spectral_norm          0    -3
call_simple           -3    -3
unpack_sequence       -6    -2

Table 2. CPython3 results:
Benchmark             %D  %PGO
--------------------------------
formatted_logging     26    11
raytrace              24     8
simple_logging        24     6
richards              22     3
chaos                 21     7
go                    21    11
hexiom2               21     8
nbody                 21     9
etree_generate        19     5
etree_process         19     5
call_method_slots     18     3
fastunpickle          18     0
pathlib               18     5
regex_compile         18     8
float                 17     8
nqueens               17     7
call_method           16     3
etree_iterparse       16     9
json_dump             16    -4
json_load             16     5
silent_logging        15     8
2to3                  14     5
fannkuch              14     8
call_simple           12     0
meteor_contest        12     7
call_method_unknown   11    -1
spectral_norm         11     4
json_dump_v2          10     3
telco                 10     5
fastpickle             9    -4
etree_parse            8     1
normal_startup         8     3
startup_nosite         7     3
unpack_sequence        7     3
regex_v8               6     4
unpickle_list          5     3
pickle_list            1   -10
pidigits               1   -11
regex_effbot          -2     2
pickle_dict           -3   -10

Thank you,
Alecsandru
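The three builds compared in the tables (default, PGO, PGO+LTO) can be driven from a small script. This is a minimal sketch, assuming a CPython checkout in `srcdir`; the names `BUILD_VARIANTS`, `build_commands()` and `build()` are hypothetical. The LTO variant assumes the `--with-lto` option from the v02 and later patches; with the v01 patch applied, plain `./configure` followed by `make profile-opt` already enabled LTO.

```python
import subprocess

# Hypothetical table of the three builds compared above:
# variant name -> (configure command, make command).
# Assumption: the LTO variant uses the --with-lto option introduced in the
# v02 patches; with the v01 patch, "./configure" + "make profile-opt"
# already enabled LTO.
BUILD_VARIANTS = {
    "default": (["./configure"], ["make"]),
    "pgo": (["./configure"], ["make", "profile-opt"]),
    "pgo+lto": (["./configure", "--with-lto"], ["make", "profile-opt"]),
}


def build_commands(variant):
    """Return the commands to run, in order, for one build variant."""
    configure_cmd, make_cmd = BUILD_VARIANTS[variant]
    return [configure_cmd, make_cmd]


def build(variant, srcdir="."):
    """Run configure and make for `variant` inside a CPython checkout.

    It is safest to use a separate clean checkout per variant (or to run
    "make distclean" in between) so objects from different builds never mix.
    """
    for cmd in build_commands(variant):
        # check=True stops at the first failing step
        subprocess.run(cmd, cwd=srcdir, check=True)
```

For example, `build("pgo+lto", "/path/to/cpython")` (a hypothetical path) runs the configure and profile-opt steps for the LTO build. Keeping the variants in one table makes it easy to add further flag combinations, which is the kind of experimentation a configure option is meant to enable.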
msg255148	Author: Antoine Pitrou (pitrou) *	Date: 2015-11-23 11:36
LTO only exists on recent versions of gcc, so the configure script should probably do some version checking. Also we can't enable it by default as 1) it makes compile times much longer 2) there are some bugs in some gcc versions (see e.g. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=753134).
msg255150	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2015-11-23 12:18
LTO has existed in GCC since version 4.5, but it is true that only recent versions (>=4.8) perform it reliably. It is not enabled by default in this patch; it is only used when building with PGO support. Running just "make" will not activate the LTO flags. Do you see it as a configure option (using, for example, an explicit --with-lto flag) rather than using it automatically?
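The version check Antoine asks for could gate LTO on the GCC release. A minimal sketch in Python, assuming the 4.8 threshold mentioned above; a real check would live in the configure script (autoconf/shell), Clang is not handled, and the helper names are hypothetical. `gcc -dumpversion` is a real GCC option, but some GCC builds print only the major version (e.g. `9`), so the parser accepts that form too.

```python
import re
import shutil
import subprocess

# Threshold from the discussion: LTO exists since GCC 4.5, but it is
# considered reliable only from 4.8 onward.
MIN_RELIABLE_GCC_LTO = (4, 8)


def parse_gcc_version(text):
    """Turn '4.9.2' or '9' (as printed by gcc -dumpversion) into (major, minor)."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized GCC version: {text!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def lto_recommended(version_text):
    """True if this GCC version is at or above the reliability threshold."""
    return parse_gcc_version(version_text) >= MIN_RELIABLE_GCC_LTO


def installed_gcc_version(compiler="gcc"):
    """Return the raw -dumpversion output, or None if the compiler is missing."""
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run([compiler, "-dumpversion"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Comparing `(major, minor)` tuples avoids the classic string-comparison trap where "10" sorts before "4.8".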
msg255165	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2015-11-23 14:44
Meanwhile, I've added patches (v02) that enable LTO only when "./configure --with-lto" is used.
msg255167	Author: Antoine Pitrou (pitrou) *	Date: 2015-11-23 14:46
On 23/11/2015 13:18, Alecsandru Patrascu wrote:
> Do you see it as a configure option (using, for example, an
> explicit --with-lto flag) rather than using it automatically?

That would be nice. This way people can easily test different combinations of flags.
msg255168	Author: Antoine Pitrou (pitrou) *	Date: 2015-11-23 14:47
On 23/11/2015 15:44, Alecsandru Patrascu wrote:
> Meanwhile I've added the patches (v02) for LTO enabled only if the "./configure --with-lto" command is issued.

Cool, thanks!
msg257042	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2015-12-26 17:46
I'm adding Brett, Gregory, Stefan and Victor to the nosy list because this issue might interest them as well.
msg257463	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2016-01-04 15:19
Hello, I've added an updated set of patches (v03) for the current CPython2 and CPython3 codebases. I also made some small changes to reduce the number of places where the flags are set.
msg258627	Author: Zachary Ware (zach.ware) *	Date: 2016-01-19 22:22
I'm a bit concerned that the flags are being added unconditionally to CFLAGS and LDFLAGS (when configured --with-lto), which means extensions are forced to use LTO as well. I think it would be better to use CFLAGS_NODIST and to add LDFLAGS_NODIST. Unfortunately, 2.7 doesn't even have CFLAGS_NODIST; I suspect it may be time to backport that.
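This concern can be checked on an installed interpreter: distutils compiles extension modules with the CFLAGS recorded in sysconfig and links them with the recorded LDSHARED command (into which the Makefile typically folds LDFLAGS), whereas CFLAGS_NODIST is meant for building the interpreter only. A minimal sketch; the function names are hypothetical, and a variable that a given Python version does not define (LDFLAGS_NODIST, for instance) simply reads as None and reports no flags.

```python
import sysconfig

# Variables whose flags reach extension-module builds
# (CFLAGS directly, LDFLAGS typically via the LDSHARED link command).
EXTENSION_VARS = ("CFLAGS", "LDSHARED", "LDFLAGS")
# Variables meant for the interpreter build only (may be undefined -> None).
NODIST_VARS = ("CFLAGS_NODIST", "LDFLAGS_NODIST")


def lto_flags(var):
    """Return the -flto* options found in one sysconfig variable."""
    value = sysconfig.get_config_var(var) or ""
    return [flag for flag in value.split() if flag.startswith("-flto")]


def lto_flag_report():
    """Map each variable name to the LTO flags it carries."""
    return {var: lto_flags(var) for var in EXTENSION_VARS + NODIST_VARS}


if __name__ == "__main__":
    for var, flags in lto_flag_report().items():
        print(f"{var:15} {' '.join(flags) or '-'}")
```

On a build where LTO flags show up under CFLAGS or LDSHARED, third-party extensions compiled against that interpreter inherit them; flags that appear only under the *_NODIST variables stay private to the interpreter build.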
msg258632	Author: Antoine Pitrou (pitrou) *	Date: 2016-01-19 23:03
> Unfortunately, 2.7 doesn't even have CFLAGS_NODIST; I suspect it may be time to backport that.

I don't think now is a good time to introduce instability in the 2.7 branch.
msg258642	Author: Zachary Ware (zach.ware) *	Date: 2016-01-20 05:28
Unless I'm just missing something, I don't see how introducing CFLAGS_NODIST and LDFLAGS_NODIST to 2.7 would introduce instability. It should be a fairly non-invasive change, restricted to configure and the Makefile; both vars should usually be empty and thus builds should be entirely unaffected unless options like --with-lto are chosen. On a separate note about the patch: as mentioned in msg251305, it's probably better to restrict adding the LTO flags to just the profile-opt targets, even with the --with-lto check.
msg258651	Author: Antoine Pitrou (pitrou) *	Date: 2016-01-20 08:37
Any non-bugfix introduction can introduce instability. That has always been our position with respect to adding features to bugfix branches. I don't see how adding a LTO option this late in the 2.7 release cycle can be considered important enough to break that rule. Let me add that downstream distributors already customize compilation options (Ubuntu's Python is compiled with both PGO and LTO enabled, AFAIR), so this change may only really affect the tiny subset of non-Windows users that compile Python themselves. But well, perhaps Python development has become boring to the point of deliberately introducing uncertainty and risk to make things a bit more fun? ;-)
msg258653	Author: STINNER Victor (vstinner) *	Date: 2016-01-20 08:41
I suggest only modifying the default branch and working with downstream distributors (like Linux vendors) so that they compile Python with the best compiler options. I'm talking about the default compilation mode. Maybe we can add a configure option to 2.7 and 3.5, disabled by default, to use the best options. Sorry, I didn't read the whole discussion.
msg258655	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2016-01-20 09:27
Thank you for your feedback. I've updated the patches (v04) so that the LTO flags are used only when building with PGO. CFLAGS/LDFLAGS remain untouched, which Antoine and Victor suggested is better.
msg258658	Author: Marc-Andre Lemburg (lemburg) *	Date: 2016-01-20 10:24
On 20.01.2016 09:37, Antoine Pitrou wrote:
> Let me add that downstream distributors already customize compilation options (Ubuntu's Python is compiled with both PGO and LTO enabled, AFAIR), so this change may only really affect the tiny subset of non-Windows users that compile Python themselves.

Are the Windows installers on python.org compiled with PGO and LTO enabled? If not, then the patch would also affect the not-so-tiny fraction of Python users on Windows ;-)

BTW: It may make sense to start collecting the various performance-related optional patches to Python 2.7 on a wiki page for interested parties to use.
msg258660	Author: Antoine Pitrou (pitrou) *	Date: 2016-01-20 10:38
> If not, then the patch would also affect the not-so-tiny fraction of Python users on Windows ;-)

I don't see how enabling LTO for gcc and clang could ever affect our Windows users ;-)
msg258663	Author: Marc-Andre Lemburg (lemburg) *	Date: 2016-01-20 11:03
On 20.01.2016 11:38, Antoine Pitrou wrote:
>> If not, then the patch would also affect the not-so-tiny fraction
>> of Python users on Windows ;-)
>
> I don't see how enabling LTO for gcc and clang could ever affect our Windows users ;-)

You have a point there, but perhaps we could start offering an ICC-compiled version for Windows ;-)
msg258682	Author: R. David Murray (r.david.murray) *	Date: 2016-01-20 13:57
My understanding is that we (starting with Guido) have made a blanket exception for 2.7 for useful performance and build-system-only related patches. That doesn't mean anything can go in (the usual rules about "is this worth it/backward compatible/won't break things" still apply) but it is a lower bar than is true for other maintenance only releases. Perhaps my understanding is in error, though. I believe Intel is committed to supporting this, so if there do turn out to be any maintenance issues they can handle them. (Which IIUC is Nick's argument: if someone wants to support 2.7 with stuff we are willing to let in, we should let them as long as they credibly commit to supporting it.) I'm currently part of that Intel support, though, so someone else should rule on this.
msg258697	Author: Brett Cannon (brett.cannon) *	Date: 2016-01-20 17:38
To help answer MAL's question about Windows: I know the python.org installers are not built with PGO, but I don't know about LTO.
msg258703	Author: Steve Dower (steve.dower) *	Date: 2016-01-20 18:02
MSVC has had Link-Time Code Generation for many releases, and it should have been used for all 2.7 releases (definitely used in 3.5+) to optimize references between object files. I assume this is equivalent to LTO. We currently don't use PGO in the official Windows builds, but it is a supported build configuration.
msg259019	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2016-01-27 12:58
As Steve mentioned, the Microsoft compiler uses LTO (which it calls Link-Time Code Generation), and the corresponding flags are used when compiling CPython on Windows. Hence our proposal to enable it for GCC and CLANG as well.
msg261150	Author: Inada Naoki (methane) *	Date: 2016-03-03 08:46
Can we use LTO without PGO? PGO makes the build take several times longer.
msg261154	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2016-03-03 09:48
Yes, you can use LTO without PGO, but the proposed way is more efficient and makes more sense for CPython builds.
msg261155	Author: Inada Naoki (methane) *	Date: 2016-03-03 10:15
Sorry for my poor English. What I meant was: does `./configure --with-lto && make` use LTO?
msg261158	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2016-03-03 11:09
I understand your question now. LTO is not enabled when running just `make`, only with `make profile-opt`.
msg261181	Author: Inada Naoki (methane) *	Date: 2016-03-04 01:32
I've tried LTO without PGO on Debian Jessie:

$ LTOFLAGS='-flto -fuse-linker-plugin -ffat-lto-objects -flto-partition=none'
$ CFLAGS=$LTOFLAGS LDFLAGS=$LTOFLAGS ./configure --prefix=...
$ make -j32

Results (compared with a build using neither LTO nor PGO):

Test                                minimum run-time        average run-time
                                this   other    diff    this   other    diff
-------------------------------------------------------------------------------
BuiltinFunctionCalls:           47ms    50ms   -6.6%    48ms    51ms   -6.0%
BuiltinMethodLookup:            29ms    29ms   -1.3%    29ms    29ms   -0.1%
CompareFloats:                  32ms    33ms   -2.8%    34ms    34ms   -0.5%
CompareFloatsIntegers:          67ms    70ms   -3.9%    69ms    71ms   -3.1%
CompareIntegers:                48ms    46ms   +5.1%    49ms    47ms   +5.8%
CompareInternedStrings:         30ms    31ms   -1.9%    31ms    31ms   -1.6%
CompareLongs:                   28ms    26ms   +8.0%    29ms    27ms   +8.5%
CompareStrings:                 26ms    26ms   -0.9%    27ms    26ms   +1.5%
ComplexPythonFunctionCalls:     47ms    51ms   -8.9%    48ms    52ms   -7.8%
ConcatStrings:                  32ms    33ms   -3.2%    33ms    34ms   -2.2%
CreateInstances:                51ms    52ms   -2.5%    52ms    53ms   -3.5%
CreateNewInstances:             38ms    40ms   -4.5%    39ms    41ms   -4.4%
CreateStringsWithConcat:        68ms    69ms   -1.4%    70ms    71ms   -0.4%
DictCreation:                   53ms    51ms   +5.2%    55ms    52ms   +6.7%
DictWithFloatKeys:              41ms    42ms   -2.2%    43ms    43ms   -0.0%
DictWithIntegerKeys:            34ms    34ms   +0.1%    35ms    35ms   +0.5%
DictWithStringKeys:             31ms    32ms   -1.3%    32ms    32ms   -1.6%
ForLoops:                       26ms    30ms  -12.1%    28ms    30ms   -8.7%
IfThenElse:                     42ms    41ms   +2.6%    43ms    41ms   +5.0%
ListSlicing:                    40ms    40ms   -0.8%    41ms    41ms   -0.4%
NestedForLoops:                 42ms    42ms   -0.3%    43ms    43ms   +0.6%
NestedListComprehensions:       42ms    47ms  -11.9%    45ms    50ms  -10.5%
NormalClassAttribute:           89ms    96ms   -7.9%    92ms    98ms   -5.9%
NormalInstanceAttribute:        47ms    45ms   +4.8%    48ms    45ms   +4.9%
PythonFunctionCalls:            41ms    44ms   -7.5%    41ms    45ms   -7.4%
PythonMethodCalls:              53ms    59ms   -9.4%    55ms    60ms   -8.5%
Recursion:                      69ms    73ms   -5.1%    71ms    74ms   -4.2%
SecondImport:                   36ms    41ms  -12.0%    38ms    42ms   -9.9%
SecondPackageImport:            45ms    42ms   +6.5%    46ms    43ms   +7.0%
SecondSubmoduleImport:         115ms   107ms   +7.9%   117ms   108ms   +7.9%
SimpleComplexArithmetic:        27ms    29ms   -6.5%    28ms    30ms   -4.5%
SimpleDictManipulation:         60ms    65ms   -7.8%    61ms    66ms   -7.0%
SimpleFloatArithmetic:          33ms    30ms   +7.4%    34ms    31ms   +8.3%
SimpleIntFloatArithmetic:       36ms    38ms   -3.3%    37ms    38ms   -4.0%
SimpleIntegerArithmetic:        36ms    38ms   -5.2%    37ms    38ms   -4.1%
SimpleListComprehensions:       36ms    37ms   -3.2%    38ms    41ms   -7.5%
SimpleListManipulation:         34ms    34ms   -1.3%    35ms    38ms   -6.8%
SimpleLongArithmetic:           26ms    26ms   +0.3%    27ms    30ms   -7.5%
SmallLists:                     45ms    47ms   -4.1%    46ms    56ms  -17.2%
SmallTuples:                    51ms    54ms   -6.3%    53ms    62ms  -14.8%
SpecialClassAttribute:          92ms    97ms   -5.0%    95ms    99ms   -4.8%
SpecialInstanceAttribute:       46ms    45ms   +2.5%    48ms    46ms   +3.9%
StringMappings:                 71ms   100ms  -29.0%    73ms   101ms  -27.8%
StringPredicates:               49ms    59ms  -17.8%    50ms    60ms  -16.5%
StringSlicing:                  48ms    47ms   +3.3%    79ms    47ms  +66.2%
TryExcept:                      24ms    29ms  -16.9%    25ms    30ms  -15.8%
TryFinally:                     35ms    37ms   -6.0%    36ms    38ms   -4.6%
TryRaiseExcept:                 12ms    13ms   -7.5%    13ms    14ms   -7.2%
TupleSlicing:                   48ms    50ms   -2.9%    49ms    51ms   -2.7%
WithFinally:                    52ms    57ms   -8.4%    53ms    58ms   -8.2%
WithRaiseExcept:                42ms    46ms   -8.8%    43ms    47ms   -9.1%
-------------------------------------------------------------------------------
Totals:                       2291ms  2398ms   -4.5%  2390ms  2470ms   -3.2%

(this=lto.pybench, other=default.pybench)
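The diff column appears to be the relative change of `this` (the LTO build) against `other` (the default build), so negative values mean the LTO build ran faster. A small sketch under that assumption, reproducing the Totals row; per-test rows can differ slightly because the millisecond values shown are rounded. The function name is hypothetical.

```python
def pybench_diff(this_ms, other_ms):
    """Relative change of `this` against `other`, in percent.

    Negative means `this` (here the LTO build) ran faster.
    """
    return (this_ms - other_ms) / other_ms * 100.0


# Totals row from the table above
minimum_total = pybench_diff(2291, 2398)   # about -4.46
average_total = pybench_diff(2390, 2470)   # about -3.24
print(f"{minimum_total:+.1f}% {average_total:+.1f}%")  # prints "-4.5% -3.2%"
```

Dividing by `other` rather than `this` matters: using the LTO time as the denominator would report the minimum total as about -4.7% instead of -4.5%.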
msg261183	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2016-03-04 07:18
In our experience, pybench alone is not a representative benchmark. If you want to measure performance close to real workloads, you can run the Grand Unified Python Benchmark suite, which is more complete. You also need to take the hardware and software environment into consideration. For this, see the "Hardware and OS Configuration" section of the initial message in this issue, which describes the approach we use here at Intel.
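The Intel setup in the first message disabled ASLR (`echo 0 > /proc/sys/kernel/randomize_va_space`) and fixed the CPU frequency. A minimal Linux-only sketch for checking such settings before a benchmark run; the helper names are hypothetical, the cpufreq sysfs path is an assumption that holds only when a cpufreq driver is loaded, and checking for the `performance` governor is only an approximation of the fixed 2.6GHz frequency used in the original setup.

```python
from pathlib import Path

# Linux knobs related to the "Hardware and OS Configuration" section.
ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")
# Assumption: present only when a cpufreq driver is loaded.
GOVERNOR_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor")


def read_setting(path):
    """Return the stripped file contents, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def aslr_disabled(path=ASLR_PATH):
    """True if ASLR is off (0), False if on (1 or 2), None if unknown."""
    value = read_setting(path)
    if value is None or not value.isdigit():
        return None
    return int(value) == 0


def benchmark_environment_warnings():
    """List reasons why benchmark runs on this host may be noisy."""
    warnings = []
    if aslr_disabled() is False:
        warnings.append("ASLR is enabled "
                        "(echo 0 > /proc/sys/kernel/randomize_va_space disables it)")
    governor = read_setting(GOVERNOR_PATH)
    if governor is not None and governor != "performance":
        warnings.append(f"CPU frequency governor is {governor!r}, not 'performance'")
    return warnings
```

Returning warnings instead of failing hard lets a benchmark driver log the conditions alongside its results, which helps when comparing numbers from machines like the virtual machine used in the next message.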
msg261189	Author: Inada Naoki (methane) *	Date: 2016-03-04 15:34
The machine is a Google Compute Engine n1-highcpu-32 (Intel Ivy Bridge):

Linux bench 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64 GNU/Linux

cpuinfo:
processor       : 31
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU @ 2.50GHz
stepping        : 4
microcode       : 0x1
cpu MHz         : 2500.000
cache size      : 30720 KB

command:
$ python perf.py -r -b default ../Python-3.5.1/python-default ../Python-3.5.1/python-lto

output:
Report on Linux bench 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u3 (2016-01-17) x86_64
Total CPU cores: 32

### 2to3 ###
Min: 8.692000 -> 8.160000: 1.07x faster
Avg: 8.816800 -> 8.253600: 1.07x faster
Significant (t=8.07)
Stddev: 0.12726 -> 0.09027: 1.4098x smaller

### chameleon_v2 ###
Min: 6.756928 -> 6.414046: 1.05x faster
Avg: 6.849192 -> 6.666536: 1.03x faster
Significant (t=20.88)
Stddev: 0.04413 -> 0.07555: 1.7120x larger

### fastpickle ###
Min: 0.540906 -> 0.564253: 1.04x slower
Avg: 0.549624 -> 0.579263: 1.05x slower
Significant (t=-34.29)
Stddev: 0.00427 -> 0.00752: 1.7622x larger

### nbody ###
Min: 0.260169 -> 0.273837: 1.05x slower
Avg: 0.267334 -> 0.280441: 1.05x slower
Significant (t=-34.05)
Stddev: 0.00257 -> 0.00286: 1.1125x larger

### regex_v8 ###
Min: 0.047335 -> 0.044750: 1.06x faster
Avg: 0.049424 -> 0.046788: 1.06x faster
Significant (t=10.46)
Stddev: 0.00174 -> 0.00182: 1.0469x larger

The following not significant results are hidden, use -v to show them:
django_v3, fastunpickle, json_dump_v2, json_load, tornado_http.
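The ratios in this report can be reproduced from the Min/Avg and Stddev pairs by dividing the larger value by the smaller one. A sketch that mirrors the output format shown, not perf.py's own code; the function names are hypothetical. Because the report prints rounded timings, ratios recomputed from them can differ in the last digit (the fastpickle Stddev line is one such case).

```python
def timing_ratio(base, changed):
    """Format a timing comparison in the style of the report above."""
    if changed <= base:
        return f"{base / changed:.2f}x faster"
    return f"{changed / base:.2f}x slower"


def stddev_ratio(base, changed):
    """Format a standard-deviation comparison (smaller means steadier runs)."""
    if changed <= base:
        return f"{base / changed:.4f}x smaller"
    return f"{changed / base:.4f}x larger"


# 2to3, Min and Stddev lines from the report
print(timing_ratio(8.692000, 8.160000))   # 1.07x faster
print(stddev_ratio(0.12726, 0.09027))     # 1.4098x smaller
# fastpickle, Min line
print(timing_ratio(0.540906, 0.564253))   # 1.04x slower
```

Expressing both directions as a ratio of at least 1.0 keeps "faster" and "slower" symmetric, which a plain percentage change does not.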
msg261190	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2016-03-04 16:02
You are doing measurements on a virtual machine... Surely you are not the only user with active workloads on that physical machine while you run benchmarks :) On the other hand, the LTO-only path you are taking is nice for experiments, but it is not practical for real-world use. Using LTO in conjunction with PGO is the way to get the best Python interpreter, and I strongly recommend that you use the v04 versions of the patches.
msg261205	Author: Inada Naoki (methane) *	Date: 2016-03-05 00:22
> For sure you are not the only user that has active workloads on the physical machine while you do benchmarks :)

I think the largest machine type, which I chose (32 cores), avoids sharing the physical machine with other users.

> On the other hand, the path you are going with just LTO is nice for experiments, but for real-world usages is not feasible. Using it in conjunction with PGO is the way to have the best Python interpreter, and I strongly recommend for you to use the v04 versions of the patches.

I agree PGO+LTO is the best. But I want "only LTO" because:
1) It is a pitfall that `./configure --with-lto && make` doesn't use LTO.
2) PGO makes the build too slow. For casual use cases, I can wait for LTO but not for PGO.
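The LTO-only build from msg261181 passes the flags through CFLAGS and LDFLAGS at configure time. A minimal sketch of the same steps driven from Python, assuming GCC (the flags are GCC-specific) and a CPython checkout in `srcdir`; the function names are hypothetical. Flags passed this way end up in CFLAGS/LDFLAGS, so extension builds see them too, which is the CFLAGS_NODIST concern raised earlier in this issue.

```python
import os
import subprocess

# GCC flags used for the LTO-only build in msg261181.
LTO_FLAGS = "-flto -fuse-linker-plugin -ffat-lto-objects -flto-partition=none"


def lto_only_env(base=None):
    """Copy of `base` (default: os.environ) with CFLAGS/LDFLAGS set to LTO_FLAGS.

    Any existing CFLAGS/LDFLAGS values are replaced, not extended.
    """
    env = dict(os.environ if base is None else base)
    env["CFLAGS"] = LTO_FLAGS
    env["LDFLAGS"] = LTO_FLAGS
    return env


def build_lto_only(srcdir=".", configure_args=()):
    """Run ./configure with the LTO flags, then a parallel make."""
    subprocess.run(["./configure", *configure_args], cwd=srcdir,
                   env=lto_only_env(), check=True)
    # configure records CFLAGS/LDFLAGS in the Makefile, so make needs no extra env
    jobs = os.cpu_count() or 1
    subprocess.run(["make", f"-j{jobs}"], cwd=srcdir, check=True)
```

Building the environment in a separate function keeps it testable without running a full compile.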
msg261208	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2016-03-05 00:44
Piping up from the peanut gallery here: if your use case is not doing release builds for production use, i.e. "casual use", don't bother with either PGO or LTO. It won't matter. The final build that you QA and ship should absolutely use them. (Nobody's going to disagree with that :) I would not reject changes that allow --with-lto to work in the absence of PGO, but I don't think it should be anyone's priority.
msg263354	Author: STINNER Victor (vstinner) *	Date: 2016-04-13 19:24
+ --with-lto Enable Link Time Optimization in PGO builds.
+ Disabled by default.

I don't understand why it's disabled by default. IMHO we must enable all the best optimizer options by default. But I expect all optimizations to be disabled by --with-debug.
msg263355	Author: Stefan Krah (skrah) *	Date: 2016-04-13 19:50
LTO is not stable on all platforms (according to doko), and people don't want to wait for PGO to build when they just run ./configure && make. --with-pgo and --with-lto is fine.
msg263356	Author: STINNER Victor (vstinner) *	Date: 2016-04-13 19:53
> LTO is not stable on all platforms (according to doko), and people don't want to wait for PGO to build when they just run ./configure && make.

Can we have a whitelist of architectures known to support PGO and/or LTO? Or maybe a blacklist? Ubuntu already has this knowledge in their package, no?
msg263357	Author: Marc-Andre Lemburg (lemburg) *	Date: 2016-04-13 19:55
On 13.04.2016 21:50, Stefan Krah wrote:
> LTO is not stable on all platforms (according to doko), and people don't
> want to wait for PGO to build when they just run ./configure && make.
>
> --with-pgo and --with-lto is fine.

Agreed. Let's not make compilation take longer than necessary.

When doing production builds, people can still enable these optimizations as necessary.
msg263358	Author: Antoine Pitrou (pitrou) *	Date: 2016-04-13 19:57
On 13/04/2016 21:55, Marc-Andre Lemburg wrote:
>> LTO is not stable on all platforms (according to doko), and people don't
>> want to wait for PGO to build when they just run ./configure && make.
>>
>> --with-pgo and --with-lto is fine.
>
> Agreed. Let's not make compilation take longer than necessary.
>
> When doing production builds, people can still enable these
> optimizations as necessary.

Agreed as well. It's enough to make these options sufficiently accessible.
msg263383	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2016-04-14 08:39
@Stefan and @Marc, you say that people don't want to wait for PGO to build when running ./configure && make, but why? Even though many developers use it, this mode is not intended for development; it is production level and should be run once (or at least a limited number of times), when the developers are sure that everything is fine in debug mode. As Victor previously said, we should have all the good stuff (PGO, LTO, etc.) enabled by default, regardless of the time needed to do it.

@Victor, indeed, LTO is not yet good enough to use stand-alone in CPython. That is why it is enabled only with PGO: applied on top of PGO, it gives further speedups over PGO alone. Ubuntu also uses PGO and LTO in its releases. But in the end, maybe `./configure --with-lto && make profile-opt` will have to do for everybody.
msg263385	Author: Stefan Krah (skrah) *	Date: 2016-04-14 08:55
On Thu, Apr 14, 2016 at 08:39:20AM +0000, Alecsandru Patrascu wrote:
> @Stefan and @Marc, you say that people don't want to wait for PGO to build when running ./configure && make, but why? Even though many developers use it, this mode is not intended for development, it is production level and should be run once (or at least a limited number of times), when the developers are sure that everything is fine in the debug mode. As Victor previously said, we should have all the good stuff (PGO, LTO, etc) enabled by default, regardless of the time needed to do it.

I use it all the time in development:
- For running math tests that would be too slow otherwise.
- To diagnose invalid accesses that only occur with -O2.
- To speed up Valgrind runs.
msg263386	Author: Marc-Andre Lemburg (lemburg) *	Date: 2016-04-14 09:07
On 14.04.2016 10:39, Alecsandru Patrascu wrote:
> @Stefan and @Marc, you say that people don't want to wait for PGO to build when running ./configure && make, but why? Even though many developers use it, this mode is not intended for development, it is production level and should be run once (or at least a limited number of times), when the developers are sure that everything is fine in the debug mode. As Victor previously said, we should have all the good stuff (PGO, LTO, etc) enabled by default, regardless of the time needed to do it.

You need to compile Python a lot during Python development, and there the compile speed matters; the performance of the resulting binary is secondary (as long as it is consistent).

For production, it's easy to add those options to configure. Plus, it's not 100% clear whether all optimizations really do create correct code. We've had lots of issues with optimization errors in compilers in the past and have generally been rather conservative with the default optimization settings. It's better to have a stably running Python than a Python that is fast at failing or creating wrong results ;-)

I think having these extra options readily accessible and working is great, and people who know what they are doing can then use them to get an even faster Python. Distributors will know what they are doing, so many Python users will still be able to benefit from them.
msg263387	Author: Stefan Krah (skrah) *	Date: 2016-04-14 09:13
On Thu, Apr 14, 2016 at 08:55:25AM +0000, Stefan Krah wrote:
> I use it all the time in development:

... where "it" refers to "./configure && make", not to PGO.
msg263395	Author: Alecsandru Patrascu (alecsandru.patrascu) *	Date: 2016-04-14 10:17
Maybe a workflow like the one proposed in issue #26359 can be helpful in these development phases.
msg263532	Author: Roundup Robot (python-dev)	Date: 2016-04-15 23:59
New changeset f16ec63055ad by Gregory P. Smith in branch '3.5':
Issue #25702: A --with-lto configure option has been added that will
https://hg.python.org/cpython/rev/f16ec63055ad

New changeset 3103af76f4c4 by Gregory P. Smith in branch 'default':
Issue #25702: A --with-lto configure option has been added that will
https://hg.python.org/cpython/rev/3103af76f4c4
msg263534	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2016-04-16 00:16
What I committed for 3.5 and 3.6 matches lto-cpython3-v04.patch, which just adds --with-lto support. 2.7 still needs to be patched.

For reference: using Ubuntu's gcc 5.2.1, I was seeing a 2-3% performance increase in the resulting LTO binary vs. a plain profile-opt PGO build. That'll vary based on arch and compiler toolchain.
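With this option committed, one way to see whether an installed interpreter was configured with `--with-lto` is to inspect the configure arguments recorded in sysconfig. A minimal sketch; the helper names are hypothetical, and `CONFIG_ARGS` is recorded on Unix builds (on Windows it is typically None). Note that it only shows the option was passed to configure: as committed (matching v04), the LTO flags were applied only by `make profile-opt`, so a plain `make` after `--with-lto` would still produce a non-LTO binary.

```python
import shlex
import sysconfig


def has_configure_option(config_args, option):
    """True if `option` (e.g. '--with-lto') appears in a CONFIG_ARGS string.

    Also tolerates a possible '--with-lto=value' spelling, except '=no'.
    """
    for arg in shlex.split(config_args or ""):
        if arg == option:
            return True
        if arg.startswith(option + "=") and arg != option + "=no":
            return True
    return False


def built_with_lto():
    """Check the running interpreter's recorded configure arguments."""
    return has_configure_option(sysconfig.get_config_var("CONFIG_ARGS"),
                                "--with-lto")
```

`shlex.split` is used because configure records each argument in single quotes (for example `'--prefix=/usr' '--with-lto'`), which a plain `str.split` would leave attached to the option names.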
msg266993	Author: Roundup Robot (python-dev)	Date: 2016-06-02 23:44
New changeset f710dac07312 by Gregory P. Smith in branch '2.7':
Issue #25702: A --with-lto configure option has been added that will
https://hg.python.org/cpython/rev/f710dac07312
msg267007	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2016-06-03 00:27
The main part of this issue is done, but it can't be closed until the listed dependencies are also dealt with. Un-assigning myself.

History
Date	User	Action	Args
2022-04-11 14:58:24	admin	set	github: 69888
2021-07-29 07:37:04	methane	set	status: open -> closed stage: patch review -> resolved
2020-09-11 22:18:01	brett.cannon	set	nosy: - brett.cannon
2020-01-28 19:20:40	David Filiatrault	set	nosy: + David Filiatrault
2016-06-03 00:27:51	gregory.p.smith	set	assignee: gregory.p.smith -> resolution: fixed messages: + msg267007
2016-06-02 23:44:58	python-dev	set	messages: + msg266993
2016-04-17 06:30:12	gregory.p.smith	set	dependencies: + test_gdb fails all tests on a profile-opt build configured --with-lto
2016-04-17 06:21:50	gregory.p.smith	set	dependencies: + test_distutils fails when configured --with-lto
2016-04-16 00:16:25	gregory.p.smith	set	assignee: gregory.p.smith messages: + msg263534
2016-04-15 23:59:18	python-dev	set	nosy: + python-dev messages: + msg263532
2016-04-14 10:17:43	alecsandru.patrascu	set	messages: + msg263395
2016-04-14 09:13:39	skrah	set	messages: + msg263387
2016-04-14 09:07:32	lemburg	set	messages: + msg263386
2016-04-14 08:55:25	skrah	set	messages: + msg263385
2016-04-14 08:39:20	alecsandru.patrascu	set	messages: + msg263383
2016-04-13 19:57:45	pitrou	set	messages: + msg263358
2016-04-13 19:55:45	lemburg	set	messages: + msg263357
2016-04-13 19:53:38	vstinner	set	messages: + msg263356
2016-04-13 19:50:26	skrah	set	messages: + msg263355
2016-04-13 19:24:24	vstinner	set	nosy: + vstinner messages: + msg263354
2016-03-05 00:44:43	gregory.p.smith	set	messages: + msg261208
2016-03-05 00:22:05	methane	set	messages: + msg261205
2016-03-04 16:02:07	alecsandru.patrascu	set	messages: + msg261190
2016-03-04 15:34:47	methane	set	messages: + msg261189
2016-03-04 07:18:31	alecsandru.patrascu	set	messages: + msg261183
2016-03-04 01:32:42	methane	set	messages: + msg261181
2016-03-03 11:09:03	alecsandru.patrascu	set	messages: + msg261158
2016-03-03 10:15:10	methane	set	messages: + msg261155
2016-03-03 09:48:43	alecsandru.patrascu	set	messages: + msg261154
2016-03-03 08:46:28	methane	set	nosy: + methane messages: + msg261150
2016-01-27 12:58:16	alecsandru.patrascu	set	messages: + msg259019
2016-01-20 18:02:23	steve.dower	set	messages: + msg258703
2016-01-20 17:38:52	brett.cannon	set	nosy: + steve.dower messages: + msg258697
2016-01-20 13:57:16	r.david.murray	set	messages: + msg258682
2016-01-20 13:03:19	vstinner	set	nosy: - vstinner
2016-01-20 11:03:26	lemburg	set	messages: + msg258663
2016-01-20 10:38:15	pitrou	set	messages: + msg258660
2016-01-20 10:24:18	lemburg	set	nosy: + lemburg messages: + msg258658
2016-01-20 09:28:00	alecsandru.patrascu	set	files: + lto-cpython3-v04.patch
2016-01-20 09:27:49	alecsandru.patrascu	set	files: + lto-cpython2-v04.patch messages: + msg258655
2016-01-20 08:41:55	vstinner	set	messages: + msg258653
2016-01-20 08:37:58	pitrou	set	messages: + msg258651
2016-01-20 05:28:16	zach.ware	set	messages: + msg258642
2016-01-19 23:03:46	pitrou	set	messages: + msg258632
2016-01-19 22:54:42	r.david.murray	set	nosy: + r.david.murray
2016-01-19 22:22:53	zach.ware	set	nosy: + zach.ware messages: + msg258627
2016-01-04 15:19:46	alecsandru.patrascu	set	files: + lto-cpython3-v03.patch
2016-01-04 15:19:39	alecsandru.patrascu	set	files: + lto-cpython2-v03.patch messages: + msg257463
2015-12-26 17:46:36	alecsandru.patrascu	set	nosy: + brett.cannon, gregory.p.smith, scoder, vstinner, skrah messages: + msg257042
2015-11-23 14:47:04	pitrou	set	messages: + msg255168
2015-11-23 14:46:20	pitrou	set	messages: + msg255167
2015-11-23 14:44:27	alecsandru.patrascu	set	files: + lto-cpython3-v02.patch
2015-11-23 14:44:20	alecsandru.patrascu	set	files: + lto-cpython2-v02.patch
2015-11-23 14:44:11	alecsandru.patrascu	set	messages: + msg255165
2015-11-23 12:18:48	alecsandru.patrascu	set	messages: + msg255150
2015-11-23 11:36:30	pitrou	set	nosy: + pitrou messages: + msg255148 stage: patch review
2015-11-23 09:00:11	alecsandru.patrascu	set	files: + lto-cpython3-v01.patch
2015-11-23 09:00:03	alecsandru.patrascu	set	files: + lto-cpython2-v01.patch keywords: + patch
2015-11-23 08:59:40	alecsandru.patrascu	create