New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Profile Guided Optimization improvements (better training, llvm support, etc) #69103
Comments
Hi All, This is Alecsandru from Server Scripting Languages Optimization team at Intel Corporation. I would like to submit a request to turn-on Profile Guided Optimization or PGO as the default build option for Python (both 2.7 and 3.6), given its performance benefits on a wide variety of workloads and hardware. For instance, as shown from attached sample performance results from the Grand Unified Python Benchmark, >20% speed up was observed. In addition, we are seeing 2-9% performance boost from OpenStack/Swift where more than 60% of the codes are in Python 2.7. Our analysis indicates the performance gain was mainly due to reduction of icache misses and CPU front-end stalls. Attached is the Makefile patches that modify the all build target and adds a new one called "disable-profile-opt". We built and tested this patch for Python 2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04, Intel Xeon Haswell/Broadwell with 18/8 cores). We use "regrtest" suite for training as it provides the best performance improvement. Some of the test programs in the suite may fail which leads to build fail. One solution is to disable the specific failed test using the "-x " flag (as shown in the patch) Steps to apply the patch:
To disable PGO In the following, please find our sample performance results from latest XEON machine, XEON Broadwell EP. BIOS settings: Intel Turbo Boost Technology: false Operating System: Ubuntu 14.04.3 LTS trusty OS configuration: CPU freq set at fixed: 2.6GHz by GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04) Benchmark: Grand Unified Python Benchmark (GUPB) Python2.7 results:
Python3.6 results
Thank you, |
Please upload your patches as separate, uncompressed files for review. PGO was already proposed here, but nothing came out of it: https://bugs.python.org/issue17781 I suggest rejecting that old ticket and sticking with this one since it has an actual patch. |
I added the patches as individual files and removed the zip file. |
Is this supposed to work on Macs using Apple's version of gcc? I've got the latest version of Yosemite and XCode, and am getting these warnings when trying to build 2.7: clang: warning: argument unused during compilation: '-fprofile-generate' Should this be enabled using a configure check? Perhaps gcc/clang supports this but spells the feature differently. gcc --help tells me: % gcc --help | egrep profile |
The patches are tested on Linux machines, with GNU GCC >4.8.3. From your output I see that you are using the CLANG compiler. CLANG uses a different set of flags for PGO that are not compatible with GCC's, therefore the compilation will fail. Can you please use the GNU GCC compiler to test the patches? |
It is executed using the gcc command: % gcc -c -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fprofile-generate -I. -IInclude -I./Include -DPy_BUILD_CORE -o Modules/gcmodule.o Modules/gcmodule.c I have no idea if you can even use something other than clang on Macs now. In any case, the default compiler should work to build Python out of the box, if necessary, by checking things during configure. |
I did an initial code review on the 3.6 patch. What would it take to add clang support for PGO? Is it simply using different flags that configure can set in the generated Makefile? Or is it more involved and would require maintaining two separate compile lines in the Makefile? |
I received the review and will post new patch versions as soon as I update them. Regarding PGO on clang, I will need a bit more time to edit the Makefile and will post it just for clang, to be easier for us to see the differences. |
I modified the patches after the review made by Brett (python2.7-pgo-v02.patch and python3.6-pgo-v02.patch):
I also added the patches for Mac exclusively (python2.7-pgo-v02-mac.patch and python3.6-pgo-v02-mac.patch). You must have llvm-profdata installed and in your path (in /Library/Developer/CommandLineTools/usr/bin/) to use it. I tested on Yosemite and it compiles fine with clang. I am working on a generic version (configure and Makefile patches) to merge all these platforms and will post them as soon as it is done. |
My initial reaction is that the default should be optimized for |
I see that your latest patch leaves PGO as an option, so |
the v02 patches LGTM. I'm fine with seeing those committed as-is knowing Alecsandru is actively working towards Clang support. |
i'm updating the title to be more accurate. turning it on by default is likely not desirable as the makefile is primarily used by developers who are iterating on changes. but having it use a good workload (regrtest) and work with llvm and os x are good. :) |
I modified the patches to be compatible with both environments. The new versions modify the configure.ac file also, therefore you will need to run "autoconf" by hand. Also, in case of MaOS you will need to have llvm-profdata installed and in your path. I kept the expanded form of regrtest (/Lib/test/regrtest.py) because this way it is clearer to the user what is the main file that runs the training workload. Also, the "|| true" is necessary also, due to the nature of regrtest. This test suite is designed to return a fail code if a test is not ok, even for tests that do not comply with certain dependencies (meaning users that didn't installed any other libraries). |
Any specific reason the v3 patch, Alecsandru, is listed as against 3.5 in the filename? Or is that just a typo? P.S.: I did another review asking about explicit Clang support and also supporting Greg's request to use |
Sorry, it was a typo. I made a correction to it. I will also modify to -m flag, instead of the explicit file execution. Regarding the clang/gcc support, in v03 version of patches, GCC is supported. On Linux is straightforward. On Mac I see that the default development environment also has the "gcc" command, but it is a binary stub that calls clang in backend, so the flags are adjusted for clang-in-gcc-clothing. You say to support clang explicitly as a compiler in 2.7 and 3.6? |
I'm asking if that's possible. For instance I set $CC to clang explicitly on OS X as I install the latest version of LLVM through Homebrew to get better compiler warnings for Python. It would be great if we could avoid leaving all clang users out unless they happen to use the stock install on OS X (e.g. cover Clang users on Linux). Basically it would be nice if this is not exclusive to gcc if Clang also supports PGO. |
Thank you for the clarifications! Your point make sense, we don't want to exclude clang environments. I will analyze this and post some patches once I'm done with it. |
I modified the patches with clang support. Also, I added an important check for the architecture on which PGO is running. Our proposal targets x86 platforms, since our measurements are made only on x86 hardware. |
I fixed the files after the review. Regarding the PROFILE_OPT_OLD line, I think that it is better to keep also the old task used for PGO, until clear evidence and measurements that regrtest is performing better on other architectures exists. |
Can you explain what the profile-merging thing is achieving? |
The profile merging is necessary in case you want to use a pure clang compiler or you use GCC in OSX. For example, a general profiling action using clang will result in at least one binary profile. For our case, when using regrtest, we will have multiple profiles as the test is a multi-process one. The application llvm-profdata has the ability to merge the information collected from multiple processes, thus having a more precise map of what is executed from the profiled application. This step is mandatory even if we train on a single threaded or single process workload and have just one profile. More information about the entire process can be found here: http://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation |
I did another round of review. I noticed that the configure part of the patch is missing and that .hgignore and .gitignore should get updated to ignore the profile files. Otherwise the only other comment was making an echoed comment a bit clearer. And in case anyone else is on OS X Yosemite and gets an error about llvm-profdata missing, make sure that /Library/Developer/CommandLineTools/usr/bin is on your $PATH. |
I've updated the patches after review and implemented the checkup for llvm-profdata for both Linux and OSX. |
The latest patch worked fine for me (Mac OS X Yosemite). I've only tried with 2.7 so far. The only thing that was a bit mystifying were the errors during the initial profile run. There is so much that floats by in the terminal window that I completely missed the warnings about errors during the test run not being anything to worry about. I only noticed the messages when I took a look at the patch more closely. Perhaps it would be worthwhile to add a short bit about the profile-opt target and its requirements to the README file. |
Not knowing a darn thing about this, I went ahead and made a provisional change to the README file. |
That's a good point Skip. I added another set of patches, just for the README files, explaining the entire procedure, so now anyone reading it will see that PGO is available, what are the steps involved and a brief comment about the warning. |
Thank you Brett for committing this. |
Hum, the change 7fcff838d09e broke the buildbot "AMD64 Debian PGO 3.5". It would nice to add Clang support without loosing GCC support :-D http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%203.5/builds/274 |
It didn't break gcc, the buildbot simply wasn't patient enough for the PGO run of the test suite to complete: http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%203.5/builds/274/steps/compile/logs/stdio . It takes a good amount of time to run the test suite serially with an instrumented interpreter and 20 minutes is not enough time. And I don't want to add output back simply to appease the buildbot as the output means nothing to a user who is doing the build themselves. So either that buildbot needs to allow for a longer time without output, someone needs to come up with a way to simply emit some output that simply shows stuff is running (but without letting error condition stuff show up), or the buildbot just won't work with PGO. |
Le 19/09/2015 20:18, Brett Cannon a écrit :
The output is actually a good indication of progress, so I don't think |
The problem with the output is that error cases are unimportant and yet it fooled Skip into temporarily caring until he finally noticed the warning message. So my worry is that someone doesn't notice the "NOTE: ignore errors as they don't affect anything" and then glances at the output to notice an error and then worries that their PGO run failed. It people really want to add output back in, though, they will need to patch both the Makefile to have a big NOTE in it as well as the README to say that any errors during the test suite run are unimportant and do not affect the outcome of the profile-guided optimizations. |
Would it be possible to grep out the warning messages, but let everything
|
Thank you for upstreaming this in both branches of Python! |
I gave the custom test runner a try using unittest's discovery facility, but it started to execute the whole test suite again, so it's a bit more complicated than you might think (I guess it imported regrtest or something?). |
Instead of writing a custom test runner from scratch, I would suggest adding a hidden --option to regrtest that would disable reporting errors. |
I can work on modifying the existing regrtest and adding a distinct flag, --pgo for example, as Antoine suggested. Indeed, it will not be trivial as regrtest has a dual approach (single process and multi process), but I will give it a try and post a patch as soon as possible. I also suggest that I open a new issue for this case as it is somehow a distinct implementation than pure PGO and definitively will be some iterations on regrtest.py for both versions of Python until we reach a common ground. It is ok for everyone? |
I think the --pgo flag needs only work in single process mode, since |
A separate issue is fine, Alecsandru, since we can make it a dependency of this issue. |
regrtest changes are now in 2.7, 3.5, and default in spite of Victor changing everything underneath me constantly in default. =) That should make the buildbot happy again. |
Yeah sorry, it was my regrtest week :-) |
It's fine. =) Glad it was for good reasons. Just took quite a while to manually apply the old patch to the new layout and then fix merges. |
Perhaps I'm missing something obvious here, but…
the |
Hello and thank you for your feedback. For CPython this does not apply because due to the structure of the build system, inside the "build" directory there are no PGO profiles saved. You can run find . -name '*.gc??' to see. |
There are. (Check issue bpo-26307 that explains this cpio file. This is a x32 build of Python, because the memory savings are very welcome for the multiple worker processes of a project I work on.) $ cpio -it <_modules.gcda.cpio
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_multiprocessing/semaphore.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_multiprocessing/multiprocessing.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_posixsubprocess.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_struct.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_heapqmodule.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_opcode.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/binascii.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_ssl.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/arraymodule.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_pickle.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_randommodule.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_hashopenssl.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_lzmamodule.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/grpmodule.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/socketmodule.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/selectmodule.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_math.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/mathmodule.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_datetimemodule.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/resource.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/zlibmodule.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/unicodedata.gcda
build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_testcapimodule.gcda |
That's interesting. Even on CPython3, I still don't see any gcda's inside the build directory, nor the tree structure you are seeing there. Can you please give me a couple of details regarding your environment (os, distribution, gcc version, 32/64 bit, cross compilation, etc)? If this is a general issue, I can add another patch to fix it. |
First, let's make sure we're on the same page.
So you won't see these files after a successful build if you haven't taken measures to ensure they are saved during the build process. I save these files to a cpio file before Now, for the info you required: It's a system running Ubuntu 14.04 64-bit with gcc 4.8.4. It's a build of Python with the |
For my responses, I modified locally the Makefile so that it will not remove the build directory and any of the gcda files. I will make some more tests tomorrow, but i think that this problem will solve simpler if the removal of the build directory is deleted and all the profile info is kept. |
I've added a fix for the PGO builds after a issue pointed out in bpo-26307. Thank you Christos for your observation! |
Please add the fix to the issue that reported the problem so that the fix can be tracked with the bug report. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: