Title: Profile Guided Optimization improvements (better training, llvm support, etc)
msg248988 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-22 14:41
Hi All,

This is Alecsandru from Server Scripting Languages Optimization team at Intel Corporation.

I would like to submit a request to turn on Profile Guided Optimization (PGO) as the default build option for Python (both 2.7 and 3.6), given its performance benefits on a wide variety of workloads and hardware.  For instance, as shown in the attached sample performance results from the Grand Unified Python Benchmark, a speedup of more than 20% was observed.  In addition, we are seeing a 2-9% performance boost on OpenStack/Swift, where more than 60% of the code is Python 2.7. Our analysis indicates that the performance gain is mainly due to a reduction in icache misses and CPU front-end stalls.

Attached are the Makefile patches that modify the "all" build target and add a new one called "disable-profile-opt". We built and tested this patch for Python 2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04, Intel Xeon Haswell/Broadwell with 18/8 cores).  We use the "regrtest" suite for training as it provides the best performance improvement.  Some of the test programs in the suite may fail, which causes the build to fail.  One solution is to disable the specific failing test using the "-x " flag (as shown in the patch).

Steps to apply the patch: 
1.  hg clone cpython 
2.  cd cpython 
3.  hg update 2.7 (needed for 2.7 only) 
4.  Copy *.patch to the current directory 
5.  patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
6.  ./configure 
7.  make

To disable PGO
7b. make disable-profile-opt

In the following, please find our sample performance results from latest XEON machine, XEON Broadwell EP.  
Hardware (HW):      Intel XEON (Broadwell) 8 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false
Operating System:   Ubuntu 14.04.3 LTS trusty

OS configuration:   CPU freq set at fixed: 2.6GHz by
                        echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
                        echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
                    Address Space Layout Randomization (ASLR) disabled (to reduce run to run variation) by
                        echo 0 > /proc/sys/kernel/randomize_va_space
GCC version:        gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Benchmark:          Grand Unified Python Benchmark (GUPB)
                    GUPB Source:                    

Python2.7 results:
    Python source: hg clone cpython
    Python Source: hg update 2.7
    hg id: 0511b1165bb6 (2.7)
    hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
    hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5
        Benchmarks          Speedup(%)
        simple_logging      20
        raytrace            20
        silent_logging      19
        richards            19
        chaos               16
        formatted_logging   16
        json_dump           15
        hexiom2             13
        pidigits            12
        slowunpickle        12
        django_v2           12
        unpack_sequence     11
        float               11
        mako                11
        slowpickle          11
        fastpickle          11
        django              11
        go                  10
        json_dump_v2        10
        pathlib             10
        regex_compile       10
        pybench             9.9
        etree_process       9
        regex_v8            8
        bzr_startup         8
        2to3                8
        slowspitfire        8
        telco               8
        pickle_list         8
        fannkuch            8
        etree_iterparse     8
        nqueens             8
        mako_v2             8
        etree_generate      8
        call_method_slots   7
        html5lib_warmup     7
        html5lib            7
        nbody               7
        spectral_norm       7
        spambayes           7
        fastunpickle        6
        meteor_contest      6
        chameleon           6
        rietveld            6
        tornado_http        5
        unpickle_list       5
        pickle_dict         4
        regex_effbot        3
        normal_startup      3
        startup_nosite      3
        etree_parse         2
        call_method_unknown 2
        call_simple         1
        json_load           1
        call_method         1

Python3.6 results
    Python source: hg clone cpython    
    hg id: 96d016f78726 tip
    hg id -r 'ancestors(.) and tag()': 1a58b1227501 (3.5) v3.5.0rc1
    hg --debug id -i: 96d016f78726afbf66d396f084b291ea43792af1

        Benchmark           Speedup(%)
        fastunpickle        22.94
        fastpickle          21.67
        json_load           17.64
        simple_logging      17.49
        meteor_contest      16.67
        formatted_logging   15.33
        etree_process       14.61
        raytrace            13.57
        etree_generate      13.56
        chaos               12.09
        hexiom2             12
        nbody               11.88
        json_dump_v2        11.24
        richards            11.02
        nqueens             10.96
        fannkuch            10.79
        go                  10.77
        float               10.26
        regex_compile       9.8
        silent_logging      9.63
        pidigits            9.58
        etree_iterparse     9.48
        2to3                8.44
        regex_v8            8.09
        regex_effbot        7.88
        call_simple         7.63
        tornado_http        7.38
        etree_parse         4.92
        spectral_norm       4.72
        normal_startup      4.39
        telco               3.88
        startup_nosite      3.7
        call_method         3.63
        unpack_sequence     3.6
        call_method_slots   2.91
        call_method_unknown 2.59
        iterative_count     0.45
        threaded_count      -2.79

Thank you,
msg248992 - Author: Stefan Behnel (scoder) Date: 2015-08-22 19:26
Please upload your patches as separate, uncompressed files for review.

PGO was already proposed here, but nothing came out of it:

I suggest rejecting that old ticket and sticking with this one since it has an actual patch.
msg249002 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-23 06:56
I added the patches as individual files and removed the zip file.
msg249004 - Author: Skip Montanaro (skip.montanaro) Date: 2015-08-23 13:19
Is this supposed to work on Macs using Apple's version of gcc? I've got the latest version of Yosemite and XCode, and am getting these warnings when trying to build 2.7:

clang: warning: argument unused during compilation: '-fprofile-generate'

Should this be enabled using a configure check? Perhaps gcc/clang supports this but spells the feature differently. gcc --help tells me:

% gcc --help | egrep profile
                          Use instrumentation data for profile-guided optimization
                          Enable sample-based profile guided optimizations
msg249006 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-23 15:00
The patches are tested on Linux machines, with GNU GCC >4.8.3. From your output I see that you are using the CLANG compiler. CLANG uses a different set of flags for PGO that are not compatible with GCC's, therefore the compilation will fail. Can you please use the GNU GCC compiler to test the patches?
msg249008 - Author: Skip Montanaro (skip.montanaro) Date: 2015-08-23 15:03
It is executed using the gcc command:

% gcc -c -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes  -fprofile-generate -I. -IInclude -I./Include   -DPy_BUILD_CORE -o Modules/gcmodule.o Modules/gcmodule.c
clang: warning: argument unused during compilation: '-fprofile-generate'
% type gcc
gcc is /usr/bin/gcc

I have no idea if you can even use something other than clang on Macs now. In any case, the default compiler should work to build Python out of the box, if necessary, by checking things during configure.
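
A configure-time check of the kind discussed here could start from what the compiler reports about itself rather than from the name of the command. The following is a minimal Python sketch of that idea, not the check used by any patch in this issue: the function names and the sample version banners are illustrative assumptions, and `-fprofile-instr-generate` is Clang's own instrumentation flag, shown as the counterpart to GCC's `-fprofile-generate`.

```python
def is_clang(version_output):
    """Guess whether `<cc> --version` output came from Clang.

    Assumption: Clang-based compilers mention "clang" somewhere in their
    version banner, even when they are installed under the name `gcc`.
    """
    return "clang" in version_output.lower()


def profile_generate_flag(version_output):
    """Return the instrumentation flag for the profile-generating build."""
    if is_clang(version_output):
        return "-fprofile-instr-generate"  # Clang spelling
    return "-fprofile-generate"            # GCC spelling


# Illustrative banners only; real output varies by vendor and version.
GCC_BANNER = "gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4"
STUB_BANNER = "Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)"

for banner in (GCC_BANNER, STUB_BANNER):
    print(profile_generate_flag(banner), "<-", banner)
```

In a real build the banner would come from running the configured compiler with `--version`, so a `gcc` stub that forwards to clang would be classified by what it actually is.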
msg249013 - Author: Brett Cannon (brett.cannon) Date: 2015-08-23 17:34
I did an initial code review on the 3.6 patch.

What would it take to add clang support for PGO? Is it simply using different flags that configure can set in the generated Makefile? Or is it more involved and would require maintaining two separate compile lines in the Makefile?
msg249014 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-23 18:23
I received the review and will post new patch versions as soon as I update them. 

Regarding PGO on clang, I will need a bit more time to edit the Makefile and will post it just for clang, to be easier for us to see the differences.
msg249052 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-24 14:45
I modified the patches after the review made by Brett (python2.7-pgo-v02.patch and python3.6-pgo-v02.patch):
- removed the call to pybench
- left the PGO steps as optional. To use it we run "make profile-opt"
- in the initial patches, I left out test_hashlib because in our benchmarks we did not see any gain from applying PGO to the hash functions. It is not harmful, so we can either leave it in or skip it. Nevertheless, to avoid any confusion about it, I removed that parameter from the patch.

I also added the patches for Mac exclusively (python2.7-pgo-v02-mac.patch and python3.6-pgo-v02-mac.patch). You must have llvm-profdata installed and in your path (in /Library/Developer/CommandLineTools/usr/bin/) to use it. I tested on Yosemite and it compiles fine with clang. I am working on a generic version (configure and Makefile patches) to merge all these platforms and will post them as soon as it is done.
msg249053 - Author: Stefan Krah (skrah) Date: 2015-08-24 14:53
My initial reaction is that the default should be optimized for
short build times. I would not want to type "disable-profile-opt"
every time I'm running the tests.
msg249055 - Author: Stefan Krah (skrah) Date: 2015-08-24 14:55
I see that your latest patch leaves PGO as an option, so
please ignore my previous comment.
msg249061 - Author: Brett Cannon (brett.cannon) Date: 2015-08-24 16:27
the v02 patches LGTM. I'm fine with seeing those committed as-is knowing Alecsandru is actively working towards Clang support.
msg249071 - Author: Gregory P. Smith (gregory.p.smith) Date: 2015-08-24 20:01
i'm updating the title to be more accurate.

turning it on by default is likely not desirable as the makefile is primarily used by developers who are iterating on changes.

but having it use a good workload (regrtest) and work with llvm and os x are good. :)
msg249128 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-25 15:43
I modified the patches to be compatible with both environments. The new versions also modify the file, so you will need to run "autoconf" by hand. Also, on Mac OS X you will need to have llvm-profdata installed and in your path.

I kept the expanded form of regrtest (/Lib/test/ because this way it is clearer to the user what is the main file that runs the training workload. 

The "|| true" is also necessary, due to the nature of regrtest. This test suite is designed to return a failure code if any test is not ok, even for tests that fail only because certain dependencies are missing (for example, for users who didn't install some optional libraries).
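
The effect of the trailing "|| true" can be illustrated outside of make. This is a small Python sketch, assuming the training step is just a child process whose exit status should be noted but never allowed to stop the build; `run_training` is a hypothetical helper, and the child command here merely simulates a failing test run.

```python
import subprocess
import sys


def run_training(cmd):
    """Run the training workload and return its exit status without raising."""
    # No check=True: a non-zero status is expected and deliberately tolerated.
    return subprocess.run(cmd).returncode


# Stand-in for a test run in which at least one test failed.
status = run_training([sys.executable, "-c", "import sys; sys.exit(1)"])
print("training workload exited with status", status, "- continuing anyway")
```

The profile data is written regardless of whether individual tests passed, which is why the status can be ignored for this purpose.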
msg249131 - Author: Brett Cannon (brett.cannon) Date: 2015-08-25 15:57
Any specific reason the v3 patch, Alecsandru, is listed as against 3.5 in the filename? Or is that just a typo?

P.S.: I did another review asking about explicit Clang support and also supporting Greg's request to use `-m test` instead of the explicit file execution.
msg249143 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-25 17:34
Sorry, it was a typo. I made a correction to it. I will also switch to the -m flag instead of the explicit file execution. 

Regarding the clang/gcc support, in the v03 version of the patches GCC is supported. On Linux it is straightforward. On Mac I see that the default development environment also has the "gcc" command, but it is a binary stub that calls clang in the backend, so the flags are adjusted for clang-in-gcc-clothing. Are you asking to support clang explicitly as a compiler in 2.7 and 3.6?
msg249146 - Author: Brett Cannon (brett.cannon) Date: 2015-08-25 18:19
I'm asking if that's possible. For instance I set $CC to clang explicitly on OS X as I install the latest version of LLVM through Homebrew to get better compiler warnings for Python. It would be great if we could avoid leaving all clang users out unless they happen to use the stock install on OS X (e.g. cover Clang users on Linux).

Basically it would be nice if this is not exclusive to gcc if Clang also supports PGO.
msg249155 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-25 19:15
Thank you for the clarifications! Your point makes sense, we don't want to exclude clang environments. I will analyze this and post some patches once I'm done with it.
msg249200 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-26 14:41
I modified the patches with clang support.

Also, I added an important check for the architecture on which PGO is running. Our proposal targets x86 platforms, since our measurements are made only on x86 hardware.
msg249227 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-27 09:12
I fixed the files after the review. Regarding the PROFILE_OPT_OLD line, I think it is better to also keep the old task used for PGO until there is clear evidence, backed by measurements, that regrtest performs better on other architectures.
msg249246 - Author: Antoine Pitrou (pitrou) Date: 2015-08-27 19:00
Can you explain what the profile-merging thing is achieving?
msg249256 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-27 20:00
The profile merging is necessary if you want to use a pure clang compiler or you use GCC on OS X. For example, a general profiling run using clang will result in at least one binary profile. In our case, when using regrtest, we will have multiple profiles because the test suite is a multi-process one. The llvm-profdata application can merge the information collected from multiple processes, giving a more precise map of what is executed in the profiled application. 
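
As a rough illustration of the merge step, the Python sketch below collects the raw per-process profiles from a directory and builds the corresponding `llvm-profdata merge` command line. The file names, the `*.profraw` pattern and the `merged.profdata` output name are assumptions made for the example, not the names used by the patches, and the command is only constructed here, not executed.

```python
import glob
import os
import tempfile


def merge_command(profile_dir, output="merged.profdata", pattern="*.profraw"):
    """Build an `llvm-profdata merge` invocation for all raw profiles found."""
    raw_profiles = sorted(glob.glob(os.path.join(profile_dir, pattern)))
    if not raw_profiles:
        raise FileNotFoundError("no raw profiles in " + profile_dir)
    return ["llvm-profdata", "merge", "-output=" + output] + raw_profiles


# Simulate a multi-process training run that left two raw profiles behind.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("worker-1.profraw", "worker-2.profraw"):
        open(os.path.join(tmp, name), "wb").close()
    command = merge_command(tmp)
    inputs = [os.path.basename(path) for path in command[3:]]

print(command[:3], inputs)
```

The merged file would then be handed to the optimizing build through Clang's `-fprofile-instr-use=` option.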

This step is mandatory even if we train on a single threaded or single process workload and have just one profile. More information about the entire process can be found here:
msg249286 - Author: Brett Cannon (brett.cannon) Date: 2015-08-28 18:35
I did another round of review. I noticed that the configure part of the patch is missing and that .hgignore and .gitignore should get updated to ignore the profile files. Otherwise the only other comment was making an echoed comment a bit clearer.

And in case anyone else is on OS X Yosemite and gets an error about llvm-profdata missing, make sure that /Library/Developer/CommandLineTools/usr/bin is on your $PATH.
msg249315 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-29 08:44
I've updated the patches after review and implemented the check for llvm-profdata on both Linux and OS X.
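
A check for llvm-profdata can be as simple as a PATH lookup that also tries a few extra directories. The Python sketch below shows the idea with `shutil.which`; `find_tool` and the stand-in tool name are hypothetical, and this is not how the patches themselves implement the check.

```python
import os
import shutil
import tempfile


def find_tool(name, extra_dirs=()):
    """Return the full path of `name`, searching PATH and then extra_dirs."""
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    return shutil.which(name, path=os.pathsep.join(dirs))


# Demonstrate with a stand-in executable placed in a directory not on PATH.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "fake-profdata-tool")
    with open(fake, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(fake, 0o755)
    found_ok = find_tool("fake-profdata-tool", extra_dirs=[tmp]) == fake

missing = find_tool("no-such-tool-for-this-example")
print(found_ok, missing)
```

On OS X Yosemite the equivalent call would be `find_tool("llvm-profdata", ["/Library/Developer/CommandLineTools/usr/bin"])`, using the directory mentioned above.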
msg249333 - Author: Skip Montanaro (skip.montanaro) Date: 2015-08-29 20:21
The latest patch worked fine for me (Mac OS X Yosemite). I've only tried with 2.7 so far. The only thing that was a bit mystifying were the errors during the initial profile run. There is so much that floats by in the terminal window that I completely missed the warnings about errors during the test run not being anything to worry about. I only noticed the messages when I took a look at the patch more closely.

Perhaps it would be worthwhile to add a short bit about the profile-opt target and its requirements to the README file.
msg249336 - Author: Skip Montanaro (skip.montanaro) Date: 2015-08-29 20:45
Not knowing a darn thing about this, I went ahead and made a provisional change to the README file.
msg249355 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-08-30 08:30
That's a good point, Skip. I added another set of patches, just for the README files, explaining the entire procedure, so anyone reading them will now see that PGO is available, what steps are involved, and a brief comment about the warning.
msg250008 - Author: Stefan Behnel (scoder) Date: 2015-09-06 18:57
> The only thing that was a bit mystifying were the errors during the
> initial profile run. There is so much that floats by in the terminal
> window that I completely missed the warnings about errors during the
> test run not being anything to worry about.

Then wouldn't it be better to suppress (or at least reduce) the output of
the test runs in this case?
msg250011 - Author: Brett Cannon (brett.cannon) Date: 2015-09-06 19:05
I guess the test output -- both stdout and stderr -- could be redirected to /dev/null as simply using -q with regrtest will still lead to failures being emitted and random output which no one cares about except people inspecting the test output. Just need to make sure to mention that all output is suppressed so people don't think the process is hanging.
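
The redirection described here has a direct Python analogue, sketched below for illustration. `run_quietly` is a hypothetical helper, and the child command is a stand-in that just produces noise on both streams; the notice printed up front is the part that keeps a silent run from looking like a hang.

```python
import subprocess
import sys


def run_quietly(cmd):
    """Run cmd with stdout and stderr discarded, after announcing the silence."""
    print("Running the training workload; its output is suppressed, "
          "so a long quiet period is expected.")
    completed = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return completed.returncode


# Stand-in workload that writes to both stdout and stderr.
noisy = "import sys; print('test output'); print('an error', file=sys.stderr)"
status = run_quietly([sys.executable, "-c", noisy])
```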
msg250093 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-09-07 10:11
I've updated the patches to redirect the output to /dev/null, as this makes our intent clearer to the user without requiring them to read the regrtest documentation. I've also added a warning message regarding the output and ported all these lines to 3.6 and to the README files as well.
msg250095 - Author: Antoine Pitrou (pitrou) Date: 2015-09-07 10:39
Please don't call it "PROFILE_TASK_X86" - the architecture should have nothing to do with it. Actually, there shouldn't be any architecture-specific check at all.
msg250096 - Author: Antoine Pitrou (pitrou) Date: 2015-09-07 10:41
As for the dual 2.7/3.6 aspect: I don't really understand it. If this is committed to 2.7 it should also be committed to 3.5. It doesn't threaten the stability of the interpreter in any way, given it does not affect the default build path. There's no reason why packagers of Python 3.5 should have to separately maintain a patch to have access to this improvement.
msg250099 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-09-07 11:21
I named this task PROFILE_TASK_X86 because it is rigorously tested and we have shown that regrtest performs better on this architecture. Until there is clear evidence, with solid measurements, that regrtest performs better on other architectures, I'd keep it this way.

Even though this does not threaten the stability of the interpreter in any way, the dual aspect you mentioned appears because CPython 2 and 3 have slightly different Makefile rule formats. Creating a common patch that works across versions would result in a very tangled Makefile. If you all agree that having a unified patch for both versions is acceptable, I will work on that.
msg250100 - Author: Stefan Krah (skrah) Date: 2015-09-07 11:36
I don't think we should provide any performance guarantees in the
Makefile.  +1 for not special-casing x86 (does it include amd64?).

As I understood, Antoine was not talking about a unified patch
but about applying the 3.6 patch to 3.5 right away.
msg250102 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-09-07 12:16
If you are talking just about the 3.6 patch, it is called this way to emphasize the fact that it is intended for the development branch. It is perfectly compatible with 3.5, therefore it is not needed for packagers to maintain two distinct versions. I've tested with: hg update 3.5 ; hg import --no-commit python3.6-pgo-v07.patch ; ./configure ; make profile-opt

I also renamed the profile task makefile name.
msg250105 - Author: Stefan Krah (skrah) Date: 2015-09-07 12:42
Just (hopefully) for extra clarity:  As you mentioned, the 3.6 patch
is perfect for 3.5, too.  The reason why 3.5 was brought up is to ask
Larry, our release manager, to allow it already for 3.5.

Technically it's an enhancement/new feature, but practically it
is zero risk and for PR reasons we should probably not "make 2.7
faster" before 3.5.
msg250489 - Author: Brett Cannon (brett.cannon) Date: 2015-09-11 18:47
Attached is what I plan to commit to Python 2.7 assuming everyone is happy with the outcome. I tweaked the echoed messages from Alecsandru's patch, pulled in the README changes, and dropped the x86 checks as Antoine and Stefan requested.

Assuming people are happy with the patch I will also apply it to Python 3.5 with the appropriate tweaks.
msg251033 - Author: Roundup Robot (python-dev) Date: 2015-09-18 22:11
New changeset 0f4e6c303531 by Brett Cannon in branch '2.7':
Issue #24915: Make PGO builds support Clang and use the test suite for

New changeset f211c8f554f9 by Brett Cannon in branch '2.7':
Give proper credit for issue #24915
msg251034 - Author: Roundup Robot (python-dev) Date: 2015-09-18 22:17
New changeset 7fcff838d09e by Brett Cannon in branch '3.5':
Issue #24915: Add Clang support to PGO builds and use the test suite

New changeset 7749fc0a5ea6 by Brett Cannon in branch 'default':
Merge for issue #24915
msg251035 - Author: Brett Cannon (brett.cannon) Date: 2015-09-18 22:18
Thanks to Alecsandru and Intel for the patches!
msg251036 - Author: Antoine Pitrou (pitrou) Date: 2015-09-18 22:19
Thank you Brett for committing this.
msg251065 - Author: STINNER Victor (vstinner) Date: 2015-09-19 08:14
Hum, the change 7fcff838d09e broke the buildbot "AMD64 Debian PGO 3.5". It would be nice to add Clang support without losing GCC support :-D
msg251090 - Author: Brett Cannon (brett.cannon) Date: 2015-09-19 18:18
It didn't break gcc, the buildbot simply wasn't patient enough for the PGO run of the test suite to complete: . It takes a good amount of time to run the test suite serially with an instrumented interpreter and 20 minutes is not enough time. And I don't want to add output back simply to appease the buildbot as the output means nothing to a user who is doing the build themselves.

So either that buildbot needs to allow for a longer time without output, someone needs to come up with a way to emit some output that simply shows stuff is running (but without letting error condition stuff show up), or the buildbot just won't work with PGO.
msg251091 - Author: Antoine Pitrou (pitrou) Date: 2015-09-19 18:20
On 19/09/2015 20:18, Brett Cannon wrote:
> And I don't want to add output back simply to appease the buildbot as the
> output means nothing to a user who is doing the build themselves.

The output is actually a good indication of progress, so I don't think
it's as silly to add it back as you seem to think it is :-)
msg251096 - Author: Brett Cannon (brett.cannon) Date: 2015-09-19 18:34
The problem with the output is that error cases are unimportant and yet it fooled Skip into temporarily caring until he finally noticed the warning message. So my worry is that someone doesn't notice the "NOTE: ignore errors as they don't affect anything" and then glances at the output to notice an error and then worries that their PGO run failed.

If people really want to add output back in, though, they will need to patch both the Makefile to have a big NOTE in it as well as the README to say that any errors during the test suite run are unimportant and do not affect the outcome of the profile-guided optimizations.
msg251110 - Author: Skip Montanaro (skip.montanaro) Date: 2015-09-19 19:43
Would it be possible to grep out the warning messages, but let everything
else through?
msg251112 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-09-19 19:50
Thank you for upstreaming this in both branches of Python!
Do you think that a different version of, that will be used only for PGO training, would be better in this case? By implementing a custom version, I think we can better control the output and errors shown on screen.
msg251125 - Author: Brett Cannon (brett.cannon) Date: 2015-09-19 23:20
I gave the custom test runner a try using unittest's discovery facility, but it started to execute the whole test suite again, so it's a bit more complicated than you might think (I guess it imported regrtest or something?).
msg251126 - Author: Antoine Pitrou (pitrou) Date: 2015-09-19 23:26
Instead of writing a custom test runner from scratch, I would suggest adding a hidden --option to regrtest that would disable reporting errors.
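
To make the suggestion concrete, here is a minimal Python sketch of a hidden command-line option that switches off failure reporting. It is a toy runner written for illustration, not regrtest's actual code: `build_parser`, `report_line` and the `toy-runner` program name are hypothetical, while `argparse.SUPPRESS` is the standard way to keep an option out of the help text.

```python
import argparse


def build_parser():
    """Parser for a toy test runner with one hidden, PGO-only switch."""
    parser = argparse.ArgumentParser(prog="toy-runner")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="print less while running")
    # help=argparse.SUPPRESS keeps the option out of --help and the usage line.
    parser.add_argument("--pgo", action="store_true", help=argparse.SUPPRESS)
    return parser


def report_line(test_name, passed, pgo):
    """Format the line printed after a test; PGO mode never mentions failure."""
    if pgo:
        return "ran " + test_name
    return test_name + (" ok" if passed else " FAILED")


args = build_parser().parse_args(["--pgo"])
for name, passed in (("test_alpha", True), ("test_beta", False)):
    print(report_line(name, passed, args.pgo))
```

With this shape the build still shows progress line by line, but nothing in the output looks like an error that a user might worry about.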
msg251127 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2015-09-19 23:58
I can work on modifying the existing regrtest and adding a distinct flag, --pgo for example, as Antoine suggested. Indeed, it will not be trivial as regrtest has a dual approach (single process and multi process), but I will give it a try and post a patch as soon as possible.

I also suggest that I open a new issue for this, as it is a somewhat distinct piece of work from pure PGO and there will definitely be some iterations on it for both versions of Python until we reach common ground. Is that ok with everyone?
msg251128 - Author: Antoine Pitrou (pitrou) Date: 2015-09-20 00:13
I think the --pgo flag needs only work in single process mode, since
multi-process would probably not write out the profiling data properly.
msg251129 - Author: Brett Cannon (brett.cannon) Date: 2015-09-20 00:25
A separate issue is fine, Alecsandru, since we can make it a dependency of this issue.
msg252179 - Author: Brett Cannon (brett.cannon) Date: 2015-10-02 23:24
regrtest changes are now in 2.7, 3.5, and default in spite of Victor changing everything underneath me constantly in default. =) That should make the buildbot happy again.
msg252182 - Author: STINNER Victor (vstinner) Date: 2015-10-03 00:02
> regrtest changes are now in 2.7, 3.5, and default in spite of Victor changing everything underneath me constantly in default. =) That should make the buildbot happy again.

Yeah sorry, it was my regrtest week :-)
msg252184 - Author: Brett Cannon (brett.cannon) Date: 2015-10-03 00:25
It's fine. =) Glad it was for good reasons. Just took quite a while to manually apply the old patch to the new layout and then fix merges.
msg259840 - Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) Date: 2016-02-08 12:42
Perhaps I'm missing something obvious here, but…

    $(MAKE) build_all_merge_profile
    @echo "Rebuilding with profile guided optimizations:"
    $(MAKE) clean
    $(MAKE) build_all_use_profile

the `$(MAKE) clean` does an `rm -rf build`, so it also removes the .gcda for the builtin modules.
msg259868 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2016-02-08 17:52
Hello and thank you for your feedback. For CPython this does not apply because, due to the structure of the build system, no PGO profiles are saved inside the "build" directory. You can run find . -name '*.gc??' to check.
msg259870 - Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) Date: 2016-02-08 18:10
There are. (Check issue #26307 that explains this cpio file. This is an x32 build of Python, because the memory savings are very welcome for the multiple worker processes of a project I work on.)

$ cpio -it <_modules.gcda.cpio
msg259878 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2016-02-08 19:45
That's interesting. Even on CPython3, I still don't see any gcda's inside the build directory, nor the tree structure you are seeing there. Can you please give me a couple of details regarding your environment (os, distribution, gcc version, 32/64 bit, cross compilation, etc)? 

If this is a general issue, I can add another patch to fix it.
msg259881 - Author: Χρήστος Γεωργίου (Christos Georgiou) (tzot) Date: 2016-02-08 20:16
First, let's make sure we're on the same page.

- These files are created during the `$(MAKE) run_profile_task` stage.
- They get removed during the `$(MAKE) clean` stage, along with the build directory.
- The build directory gets recreated during the `$(MAKE) build_all_use_profile` stage, without any .gcda files this time.

So you won't see these files after a successful build *if* you haven't taken measures to ensure they are saved during the build process. I save these files to a cpio file *before* `make clean` runs and restore them right afterwards.
I suggest you modify `` to include commands similar to the ones I mention in issue #26307 to verify whether your system creates these files or not.
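
The save-and-restore workaround described above can be sketched in Python as follows. This is only an illustration of the idea under stated assumptions: the thread's actual workaround uses cpio, the helper names are hypothetical, the directory layout and file name are made up for the demo, and `shutil.rmtree` stands in for the `rm -rf build` done by `make clean`.

```python
import os
import shutil
import tempfile


def copy_profiles(src_root, dst_root, suffix=".gcda"):
    """Copy every profile file under src_root to the same relative path in dst_root."""
    copied = []
    for root, _dirs, files in os.walk(src_root):
        for name in files:
            if not name.endswith(suffix):
                continue
            src = os.path.join(root, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel)
    return sorted(copied)


with tempfile.TemporaryDirectory() as top:
    build = os.path.join(top, "build")
    stash = os.path.join(top, "stash")
    profile = os.path.join(build, "temp", "example.gcda")  # made-up layout
    os.makedirs(os.path.dirname(profile))
    with open(profile, "wb") as handle:
        handle.write(b"counters")

    saved = copy_profiles(build, stash)   # before the clean step
    shutil.rmtree(build)                  # what the clean step does to build/
    copy_profiles(stash, build)           # right after the clean step
    restored = os.path.exists(profile)

print(saved, restored)
```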

Now, for the info you required:

It's a system running Ubuntu 14.04 64-bit with gcc 4.8.4. It's a build of Python with the `-mx32` flag, along with all required libraries for the needs of a specific project. (I think that the `-mx32` flag is not important to our discussion here though.) It isn't a cross-compilation.
msg259883 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2016-02-08 20:29
For my responses, I locally modified the Makefile so that it does not remove the build directory or any of the gcda files. I will run some more tests tomorrow, but I think this problem is most simply solved by dropping the removal of the build directory so that all the profile info is kept.
msg260269 - Author: Alecsandru Patrascu (alecsandru.patrascu) Date: 2016-02-14 10:19
I've added a fix for the PGO builds after an issue pointed out in #26307. Thank you Christos for your observation!
msg260573 - Author: Brett Cannon (brett.cannon) Date: 2016-02-20 20:28
Please add the fix to the issue that reported the problem so that the fix can be tracked with the bug report.
