Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profile Guided Optimization improvements (better training, llvm support, etc) #69103

Closed
alecsandrupatrascu mannequin opened this issue Aug 22, 2015 · 63 comments
Closed
Assignees
Labels
build The build process and cross-build performance Performance or resource usage

Comments

@alecsandrupatrascu
Copy link
Mannequin

alecsandrupatrascu mannequin commented Aug 22, 2015

BPO 24915
Nosy @smontanaro, @brettcannon, @gpshead, @pitrou, @scoder, @vstinner, @larryhastings, @skrah
Dependencies
  • bpo-25188: regrtest.py improvement for Profile Guided Optimization builds
  • Files
  • python2.7-pgo.patch: PGO patch for Python2.7
  • python3.6-pgo.patch: PGO patch for Python3.6
  • python2.7-pgo-v02.patch
  • python3.6-pgo-v02.patch
  • python2.7-pgo-v02-mac.patch
  • python3.6-pgo-v02-mac.patch
  • python2.7-pgo-v03.patch
  • python3.6-pgo-v03.patch
  • python2.7-pgo-v04.patch
  • python3.6-pgo-v04.patch
  • python2.7-pgo-v05.patch
  • python3.6-pgo-v05.patch
  • python2.7-pgo-v06.patch
  • python3.6-pgo-v06.patch
  • README.pgo
  • README2.7-pgo-v01.patch
  • README3.6-pgo-v01.patch
  • README2.7-pgo-v02.patch
  • README3.6-pgo-v02.patch
  • python2.7-pgo-v07.patch
  • python3.6-pgo-v07.patch
  • issue24915-python2.7.diff: Unified patch for Python 2.7
  • pgo.py
  • pgofix-cpython2.patch
  • pgofix-cpython3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/brettcannon'
    closed_at = <Date 2016-02-20.20:28:06.655>
    created_at = <Date 2015-08-22.14:41:42.207>
    labels = ['build', 'performance']
    title = 'Profile Guided Optimization improvements (better training, llvm support, etc)'
    updated_at = <Date 2016-02-20.20:28:06.653>
    user = 'https://bugs.python.org/alecsandrupatrascu'

    bugs.python.org fields:

    activity = <Date 2016-02-20.20:28:06.653>
    actor = 'brett.cannon'
    assignee = 'brett.cannon'
    closed = True
    closed_date = <Date 2016-02-20.20:28:06.655>
    closer = 'brett.cannon'
    components = ['Build']
    creation = <Date 2015-08-22.14:41:42.207>
    creator = 'alecsandru.patrascu'
    dependencies = ['25188']
    files = ['40231', '40232', '40236', '40237', '40238', '40239', '40254', '40255', '40267', '40268', '40273', '40274', '40288', '40289', '40297', '40299', '40300', '40389', '40390', '40391', '40392', '40441', '40522', '41916', '41917']
    hgrepos = []
    issue_num = 24915
    keywords = ['patch']
    message_count = 63.0
    messages = ['248988', '248992', '249002', '249004', '249006', '249008', '249013', '249014', '249052', '249053', '249055', '249061', '249071', '249128', '249131', '249143', '249146', '249155', '249200', '249227', '249246', '249256', '249286', '249315', '249333', '249336', '249355', '250008', '250011', '250093', '250095', '250096', '250099', '250100', '250102', '250105', '250489', '251033', '251034', '251035', '251036', '251065', '251090', '251091', '251096', '251110', '251112', '251125', '251126', '251127', '251128', '251129', '252179', '252182', '252184', '259840', '259868', '259870', '259878', '259881', '259883', '260269', '260573']
    nosy_count = 11.0
    nosy_names = ['skip.montanaro', 'brett.cannon', 'gregory.p.smith', 'tzot', 'pitrou', 'scoder', 'vstinner', 'larry', 'skrah', 'python-dev', 'alecsandru.patrascu']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue24915'
    versions = ['Python 2.7', 'Python 3.5', 'Python 3.6']

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 22, 2015

    Hi All,

    This is Alecsandru from Server Scripting Languages Optimization team at Intel Corporation.

    I would like to submit a request to turn-on Profile Guided Optimization or PGO as the default build option for Python (both 2.7 and 3.6), given its performance benefits on a wide variety of workloads and hardware. For instance, as shown from attached sample performance results from the Grand Unified Python Benchmark, >20% speed up was observed. In addition, we are seeing 2-9% performance boost from OpenStack/Swift where more than 60% of the codes are in Python 2.7. Our analysis indicates the performance gain was mainly due to reduction of icache misses and CPU front-end stalls.

    Attached is the Makefile patches that modify the all build target and adds a new one called "disable-profile-opt". We built and tested this patch for Python 2.7 and 3.6 on our Linux machines (CentOS 7/Ubuntu Server 14.04, Intel Xeon Haswell/Broadwell with 18/8 cores). We use "regrtest" suite for training as it provides the best performance improvement. Some of the test programs in the suite may fail which leads to build fail. One solution is to disable the specific failed test using the "-x " flag (as shown in the patch)

    Steps to apply the patch:

    1. hg clone https://hg.python.org/cpython cpython
    2. cd cpython
    3. hg update 2.7 (needed for 2.7 only)
    4. Copy *.patch to the current directory
    5. patch < python2.7-pgo.patch (or patch < python3.6-pgo.patch)
    6. ./configure
    7. make

    To disable PGO
    7b. make disable-profile-opt

    In the following, please find our sample performance results from latest XEON machine, XEON Broadwell EP.
    Hardware (HW): Intel XEON (Broadwell) 8 Cores

    BIOS settings: Intel Turbo Boost Technology: false
    Hyper-Threading: false

    Operating System: Ubuntu 14.04.3 LTS trusty

    OS configuration: CPU freq set at fixed: 2.6GHz by
    echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
    echo 2600000 > /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
    Address Space Layout Randomization (ASLR) disabled (to reduce run to run variation) by
    echo 0 > /proc/sys/kernel/randomize_va_space

    GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

    Benchmark: Grand Unified Python Benchmark (GUPB)
    GUPB Source: https://hg.python.org/benchmarks/

    Python2.7 results:
    Python source: hg clone https://hg.python.org/cpython cpython
    Python Source: hg update 2.7
    hg id: 0511b1165bb6 (2.7)
    hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
    hg --debug id -i: 0511b1165bb6cf40ada0768a7efc7ba89316f6a5

        Benchmarks          Speedup(%)
        simple_logging      20
        raytrace            20
        silent_logging      19
        richards            19
        chaos               16
        formatted_logging   16
        json_dump           15
        hexiom2             13
        pidigits            12
        slowunpickle        12
        django_v2           12
        unpack_sequence     11
        float               11
        mako                11
        slowpickle          11
        fastpickle          11
        django              11
        go                  10
        json_dump_v2        10
        pathlib             10
        regex_compile       10
        pybench             9.9
        etree_process       9
        regex_v8            8
        bzr_startup         8
        2to3                8
        slowspitfire        8
        telco               8
        pickle_list         8
        fannkuch            8
        etree_iterparse     8
        nqueens             8
        mako_v2             8
        etree_generate      8
        call_method_slots   7
        html5lib_warmup     7
        html5lib            7
        nbody               7
        spectral_norm       7
        spambayes           7
        fastunpickle        6
        meteor_contest      6
        chameleon           6
        rietveld            6
        tornado_http        5
        unpickle_list       5
        pickle_dict         4
        regex_effbot        3
        normal_startup      3
        startup_nosite      3
        etree_parse         2
        call_method_unknown 2
        call_simple         1
        json_load           1
        call_method         1
    

    Python3.6 results
    Python source: hg clone https://hg.python.org/cpython cpython
    hg id: 96d016f78726 tip
    hg id -r 'ancestors(.) and tag()': 1a58b1227501 (3.5) v3.5.0rc1
    hg --debug id -i: 96d016f78726afbf66d396f084b291ea43792af1

        Benchmark           Speedup(%)
        fastunpickle        22.94
        fastpickle          21.67
        json_load           17.64
        simple_logging      17.49
        meteor_contest      16.67
        formatted_logging   15.33
        etree_process       14.61
        raytrace            13.57
        etree_generate      13.56
        chaos               12.09
        hexiom2             12
        nbody               11.88
        json_dump_v2        11.24
        richards            11.02
        nqueens             10.96
        fannkuch            10.79
        go                  10.77
        float               10.26
        regex_compile       9.8
        silent_logging      9.63
        pidigits            9.58
        etree_iterparse     9.48
        2to3                8.44
        regex_v8            8.09
        regex_effbot        7.88
        call_simple         7.63
        tornado_http        7.38
        etree_parse         4.92
        spectral_norm       4.72
        normal_startup      4.39
        telco               3.88
        startup_nosite      3.7
        call_method         3.63
        unpack_sequence     3.6
        call_method_slots   2.91
        call_method_unknown 2.59
        iterative_count     0.45
        threaded_count      -2.79
    

    Thank you,
    Alecsandru

    @alecsandrupatrascu alecsandrupatrascu mannequin added build The build process and cross-build performance Performance or resource usage labels Aug 22, 2015
    @scoder
    Copy link
    Contributor

    scoder commented Aug 22, 2015

    Please upload your patches as separate, uncompressed files for review.

    PGO was already proposed here, but nothing came out of it:

    https://bugs.python.org/issue17781

    I suggest rejecting that old ticket and sticking with this one since it has an actual patch.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 23, 2015

    I added the patches as individual files and removed the zip file.

    @smontanaro
    Copy link
    Contributor

    Is this supposed to work on Macs using Apple's version of gcc? I've got the latest version of Yosemite and XCode, and am getting these warnings when trying to build 2.7:

    clang: warning: argument unused during compilation: '-fprofile-generate'

    Should this be enabled using a configure check? Perhaps gcc/clang supports this but spells the feature differently. gcc --help tells me:

    % gcc --help | egrep profile
    -fprofile-instr-generate
    -fprofile-instr-use=<value>
    Use instrumentation data for profile-guided optimization
    -fprofile-sample-use=<value>
    Enable sample-based profile guided optimizations

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 23, 2015

    The patches are tested on Linux machines, with GNU GCC >4.8.3. From your output I see that you are using the CLANG compiler. CLANG uses a different set of flags for PGO that are not compatible with GCC's, therefore the compilation will fail. Can you please use the GNU GCC compiler to test the patches?

    @smontanaro
    Copy link
    Contributor

    It is executed using the gcc command:

    % gcc -c -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fprofile-generate -I. -IInclude -I./Include -DPy_BUILD_CORE -o Modules/gcmodule.o Modules/gcmodule.c
    clang: warning: argument unused during compilation: '-fprofile-generate'
    % type gcc
    gcc is /usr/bin/gcc

    I have no idea if you can even use something other than clang on Macs now. In any case, the default compiler should work to build Python out of the box, if necessary, by checking things during configure.

    @brettcannon
    Copy link
    Member

    I did an initial code review on the 3.6 patch.

    What would it take to add clang support for PGO? Is it simply using different flags that configure can set in the generated Makefile? Or is it more involved and would require maintaining two separate compile lines in the Makefile?

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 23, 2015

    I received the review and will post new patch versions as soon as I update them.

    Regarding PGO on clang, I will need a bit more time to edit the Makefile and will post it just for clang, to be easier for us to see the differences.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 24, 2015

    I modified the patches after the review made by Brett (python2.7-pgo-v02.patch and python3.6-pgo-v02.patch):

    • removed the call to pybench
    • left the PGO steps as optional. To use it we run "make profile-opt"
    • in the initial patches, I left out test_hashlib because in our benchmarks we did not see any gain by applying PGO to the hash functions. It is not harmful and we can let it there or skip it. Nevertheless, in order not to create confusions about it, I removed that parameter from the patch.

    I also added the patches for Mac exclusively (python2.7-pgo-v02-mac.patch and python3.6-pgo-v02-mac.patch). You must have llvm-profdata installed and in your path (in /Library/Developer/CommandLineTools/usr/bin/) to use it. I tested on Yosemite and it compiles fine with clang. I am working on a generic version (configure and Makefile patches) to merge all these platforms and will post them as soon as it is done.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Aug 24, 2015

    My initial reaction is that the default should be optimized for
    short build times. I would not want to type "disable-profile-opt"
    every time I'm running the tests.

    @skrah
    Copy link
    Mannequin

    skrah mannequin commented Aug 24, 2015

    I see that your latest patch leaves PGO as an option, so
    please ignore my previous comment.

    @brettcannon
    Copy link
    Member

    the v02 patches LGTM. I'm fine with seeing those committed as-is knowing Alecsandru is actively working towards Clang support.

    @gpshead
    Copy link
    Member

    gpshead commented Aug 24, 2015

    i'm updating the title to be more accurate.

    turning it on by default is likely not desirable as the makefile is primarily used by developers who are iterating on changes.

    but having it use a good workload (regrtest) and work with llvm and os x are good. :)

    @gpshead gpshead changed the title Profile Guided Optimization active by-default Profile Guided Optimization improvements (better training, llvm support, etc) Aug 24, 2015
    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 25, 2015

    I modified the patches to be compatible with both environments. The new versions modify the configure.ac file also, therefore you will need to run "autoconf" by hand. Also, in case of MaOS you will need to have llvm-profdata installed and in your path.

    I kept the expanded form of regrtest (/Lib/test/regrtest.py) because this way it is clearer to the user what is the main file that runs the training workload.

    Also, the "|| true" is necessary also, due to the nature of regrtest. This test suite is designed to return a fail code if a test is not ok, even for tests that do not comply with certain dependencies (meaning users that didn't installed any other libraries).

    @brettcannon
    Copy link
    Member

    Any specific reason the v3 patch, Alecsandru, is listed as against 3.5 in the filename? Or is that just a typo?

    P.S.: I did another review asking about explicit Clang support and also supporting Greg's request to use -m test instead of the explicit file execution.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 25, 2015

    Sorry, it was a typo. I made a correction to it. I will also modify to -m flag, instead of the explicit file execution.

    Regarding the clang/gcc support, in v03 version of patches, GCC is supported. On Linux is straightforward. On Mac I see that the default development environment also has the "gcc" command, but it is a binary stub that calls clang in backend, so the flags are adjusted for clang-in-gcc-clothing. You say to support clang explicitly as a compiler in 2.7 and 3.6?

    @brettcannon
    Copy link
    Member

    I'm asking if that's possible. For instance I set $CC to clang explicitly on OS X as I install the latest version of LLVM through Homebrew to get better compiler warnings for Python. It would be great if we could avoid leaving all clang users out unless they happen to use the stock install on OS X (e.g. cover Clang users on Linux).

    Basically it would be nice if this is not exclusive to gcc if Clang also supports PGO.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 25, 2015

    Thank you for the clarifications! Your point make sense, we don't want to exclude clang environments. I will analyze this and post some patches once I'm done with it.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 26, 2015

    I modified the patches with clang support.

    Also, I added an important check for the architecture on which PGO is running. Our proposal targets x86 platforms, since our measurements are made only on x86 hardware.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 27, 2015

    I fixed the files after the review. Regarding the PROFILE_OPT_OLD line, I think that it is better to keep also the old task used for PGO, until clear evidence and measurements that regrtest is performing better on other architectures exists.

    @pitrou
    Copy link
    Member

    pitrou commented Aug 27, 2015

    Can you explain what the profile-merging thing is achieving?

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 27, 2015

    The profile merging is necessary in case you want to use a pure clang compiler or you use GCC in OSX. For example, a general profiling action using clang will result in at least one binary profile. For our case, when using regrtest, we will have multiple profiles as the test is a multi-process one. The application llvm-profdata has the ability to merge the information collected from multiple processes, thus having a more precise map of what is executed from the profiled application.

    This step is mandatory even if we train on a single threaded or single process workload and have just one profile. More information about the entire process can be found here: http://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation

    @brettcannon
    Copy link
    Member

    I did another round of review. I noticed that the configure part of the patch is missing and that .hgignore and .gitignore should get updated to ignore the profile files. Otherwise the only other comment was making an echoed comment a bit clearer.

    And in case anyone else is on OS X Yosemite and gets an error about llvm-profdata missing, make sure that /Library/Developer/CommandLineTools/usr/bin is on your $PATH.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 29, 2015

    I've updated the patches after review and implemented the checkup for llvm-profdata for both Linux and OSX.

    @smontanaro
    Copy link
    Contributor

    The latest patch worked fine for me (Mac OS X Yosemite). I've only tried with 2.7 so far. The only thing that was a bit mystifying were the errors during the initial profile run. There is so much that floats by in the terminal window that I completely missed the warnings about errors during the test run not being anything to worry about. I only noticed the messages when I took a look at the patch more closely.

    Perhaps it would be worthwhile to add a short bit about the profile-opt target and its requirements to the README file.

    @smontanaro
    Copy link
    Contributor

    Not knowing a darn thing about this, I went ahead and made a provisional change to the README file.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Aug 30, 2015

    That's a good point Skip. I added another set of patches, just for the README files, explaining the entire procedure, so now anyone reading it will see that PGO is available, what are the steps involved and a brief comment about the warning.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 18, 2015

    Thank you Brett for committing this.

    @vstinner
    Copy link
    Member

    Hum, the change 7fcff838d09e broke the buildbot "AMD64 Debian PGO 3.5". It would nice to add Clang support without loosing GCC support :-D

    http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%203.5/builds/274

    @vstinner vstinner reopened this Sep 19, 2015
    @brettcannon
    Copy link
    Member

    It didn't break gcc, the buildbot simply wasn't patient enough for the PGO run of the test suite to complete: http://buildbot.python.org/all/builders/AMD64%20Debian%20PGO%203.5/builds/274/steps/compile/logs/stdio . It takes a good amount of time to run the test suite serially with an instrumented interpreter and 20 minutes is not enough time. And I don't want to add output back simply to appease the buildbot as the output means nothing to a user who is doing the build themselves.

    So either that buildbot needs to allow for a longer time without output, someone needs to come up with a way to simply emit some output that simply shows stuff is running (but without letting error condition stuff show up), or the buildbot just won't work with PGO.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 19, 2015

    Le 19/09/2015 20:18, Brett Cannon a écrit :

    And I don't
    want to add output back simply to appease the buildbot as the output
    means nothing to a user who is doing the build themselves.

    The output is actually a good indication of progress, so I don't think
    it's not as silly to add it back as you seem to think it is :-)

    @brettcannon
    Copy link
    Member

    The problem with the output is that error cases are unimportant and yet it fooled Skip into temporarily caring until he finally noticed the warning message. So my worry is that someone doesn't notice the "NOTE: ignore errors as they don't affect anything" and then glances at the output to notice an error and then worries that their PGO run failed.

    It people really want to add output back in, though, they will need to patch both the Makefile to have a big NOTE in it as well as the README to say that any errors during the test suite run are unimportant and do not affect the outcome of the profile-guided optimizations.

    @smontanaro
    Copy link
    Contributor

    Would it be possible to grep out the warning messages, but let everything
    else through?
    On Sep 19, 2015 1:34 PM, "Brett Cannon" <report@bugs.python.org> wrote:

    Brett Cannon added the comment:

    The problem with the output is that error cases are unimportant and yet it
    fooled Skip into temporarily caring until he finally noticed the warning
    message. So my worry is that someone doesn't notice the "NOTE: ignore
    errors as they don't affect anything" and then glances at the output to
    notice an error and then worries that their PGO run failed.

    It people really want to add output back in, though, they will need to
    patch both the Makefile to have a big NOTE in it as well as the README to
    say that any errors during the test suite run are unimportant and do not
    affect the outcome of the profile-guided optimizations.

    ----------


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue24915\>


    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Sep 19, 2015

    Thank you for upstreaming this in both branches of Python!
    Do you think that a different version of regrtest.py, that will be used only for PGO training, should be better in this case? I mean, by implementing a custom version, I think we can control better the output and errors shown on screen.

    @brettcannon
    Copy link
    Member

    I gave the custom test runner a try using unittest's discovery facility, but it started to execute the whole test suite again, so it's a bit more complicated than you might think (I guess it imported regrtest or something?).

    @pitrou
    Copy link
    Member

    pitrou commented Sep 19, 2015

    Instead of writing a custom test runner from scratch, I would suggest adding a hidden --option to regrtest that would disable reporting errors.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Sep 19, 2015

    I can work on modifying the existing regrtest and adding a distinct flag, --pgo for example, as Antoine suggested. Indeed, it will not be trivial as regrtest has a dual approach (single process and multi process), but I will give it a try and post a patch as soon as possible.

    I also suggest that I open a new issue for this case as it is somehow a distinct implementation than pure PGO and definitively will be some iterations on regrtest.py for both versions of Python until we reach a common ground. It is ok for everyone?

    @pitrou
    Copy link
    Member

    pitrou commented Sep 20, 2015

    I think the --pgo flag needs only work in single process mode, since
    multi-process would probably not write out the profiling data properly.

    @brettcannon
    Copy link
    Member

    A separate issue is fine, Alecsandru, since we can make it a dependency of this issue.

    @brettcannon
    Copy link
    Member

    regrtest changes are now in 2.7, 3.5, and default in spite of Victor changing everything underneath me constantly in default. =) That should make the buildbot happy again.

    @vstinner
    Copy link
    Member

    vstinner commented Oct 3, 2015

    regrtest changes are now in 2.7, 3.5, and default in spite of Victor changing everything underneath me constantly in default. =) That should make the buildbot happy again.

    Yeah sorry, it was my regrtest week :-)

    @brettcannon
    Copy link
    Member

    It's fine. =) Glad it was for good reasons. Just took quite a while to manually apply the old patch to the new layout and then fix merges.

    @tzot
    Copy link
    Mannequin

    tzot mannequin commented Feb 8, 2016

    Perhaps I'm missing something obvious here, but…

    …
    $(MAKE) build_all_merge_profile
    @echo "Rebuilding with profile guided optimizations:"
    $(MAKE) clean
    $(MAKE) build_all_use_profile
    …
    

    the $(MAKE) clean does an rm -rf build, so it also removes the .gcda for the builtin modules.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Feb 8, 2016

    Hello and thank you for your feedback. For CPython this does not apply because due to the structure of the build system, inside the "build" directory there are no PGO profiles saved. You can run find . -name '*.gc??' to see.

    @tzot
    Copy link
    Mannequin

    tzot mannequin commented Feb 8, 2016

    There are. (Check issue bpo-26307 that explains this cpio file. This is a x32 build of Python, because the memory savings are very welcome for the multiple worker processes of a project I work on.)

    $ cpio -it <_modules.gcda.cpio
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_multiprocessing/semaphore.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_multiprocessing/multiprocessing.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_posixsubprocess.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_struct.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_heapqmodule.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_opcode.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/binascii.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_ssl.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/arraymodule.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_pickle.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_randommodule.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_hashopenssl.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_lzmamodule.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/grpmodule.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/socketmodule.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/selectmodule.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_math.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/mathmodule.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_datetimemodule.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/resource.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/zlibmodule.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/unicodedata.gcda
    build/temp.linux-x86_64-3.5/opt/x32/src/Python-3.5.1/Modules/_testcapimodule.gcda

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Feb 8, 2016

    That's interesting. Even on CPython3, I still don't see any gcda's inside the build directory, nor the tree structure you are seeing there. Can you please give me a couple of details regarding your environment (os, distribution, gcc version, 32/64 bit, cross compilation, etc)?

    If this is a general issue, I can add another patch to fix it.

    @tzot
    Copy link
    Mannequin

    tzot mannequin commented Feb 8, 2016

    First, let's make sure we're on the same page.

    • These files are created during the $(MAKE) run_profile_task stage.
    • They get removed during the $(MAKE) clean stage, along with the build directory.
    • The build directory gets recreated during the $(MAKE) build_all_use_profile stage, without any .gcda files this time.

    So you won't see these files after a successful build if you haven't taken measures to ensure they are saved during the build process. I save these files to a cpio file before make clean runs and restore them right afterwards.
    I suggest you modify Makefile.pre.in to include similar commands to the ones I mention in issue bpo-23607 to verify whether your system creates these files or not.

    Now, for the info you required:

    It's a system running Ubuntu 14.04 64-bit with gcc 4.8.4. It's a build of Python with the -mx32 flag, along with all required libraries for the needs of a specific project. (I think that the -mx32 flag is not important to our discussion here though.) It isn't a cross-compilation.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Feb 8, 2016

    For my responses, I modified locally the Makefile so that it will not remove the build directory and any of the gcda files. I will make some more tests tomorrow, but i think that this problem will solve simpler if the removal of the build directory is deleted and all the profile info is kept.

    @alecsandrupatrascu
    Copy link
    Mannequin Author

    alecsandrupatrascu mannequin commented Feb 14, 2016

    I've added a fix for the PGO builds after a issue pointed out in bpo-26307. Thank you Christos for your observation!

    @alecsandrupatrascu alecsandrupatrascu mannequin reopened this Feb 14, 2016
    @brettcannon
    Copy link
    Member

    Please add the fix to the issue that reported the problem so that the fix can be tracked with the bug report.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    build The build process and cross-build performance Performance or resource usage
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants