classification
Title: Broad performance regression from 3.10a7 to 3.10b2 with python.org macOS binaries
Type: performance Stage: resolved
Components: Build Versions: Python 3.11, Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ned.deily Nosy List: JelleZijlstra, andrenvk, brandtbucher, ned.deily, pablogsal, rhettinger, ronaldoussoren
Priority: critical Keywords: 3.10regression

Created on 2021-05-04 18:19 by rhettinger, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg392934 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-05-04 18:19
Running Tools/scripts/var_access_benchmark.py on the production macOS builds on python.org shows a performance drop-off between alpha-7 and beta-1.

Apple Silicon
-------------
read local            4.1ns ->  4.5ns
read non_local        4.1ns ->  4.6ns
read global           4.2ns ->  4.9ns
read builtin          4.2ns ->  5.3ns
classvar from cls    10.1ns -> 12.6ns
classvar from inst    9.5ns -> 11.7ns
read instvar          9.2ns -> 11.3ns
instvar with slots    5.8ns ->  8.2ns
namedtuple           11.1ns -> 14.2ns
  ...

Since the effects are seen in almost every category, I suspect a build problem, a change to the eval loop, or a failure to inline some of the shared functions that used to be macros.
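
For readers who want to try a comparable measurement without the full script, here is a minimal sketch of the same kind of micro-timing using only the standard-library timeit module. It is not the code of Tools/scripts/var_access_benchmark.py; the helper name time_per_op_ns, its namespace parameter, and the iteration counts are assumptions chosen for illustration. The reported figures include timeit's own loop overhead.

```python
import timeit


def time_per_op_ns(stmt, setup="pass", namespace=None, number=200_000, repeats=5):
    """Best observed time for one execution of stmt, in nanoseconds."""
    timings = timeit.repeat(
        stmt, setup=setup, globals=namespace, number=number, repeat=repeats
    )
    # Take the minimum: slower runs mostly reflect interference from the system.
    return min(timings) / number * 1e9


if __name__ == "__main__":
    # timeit compiles setup and stmt into one function, so a name assigned in
    # setup is a local variable there; names supplied via `namespace` are globals.
    cases = {
        "read local": dict(stmt="x", setup="x = 1"),
        "read global": dict(stmt="g", namespace={"g": 1}),
        "read builtin": dict(stmt="len"),
    }
    for name, kwargs in cases.items():
        print(f"{name:<14}{time_per_op_ns(**kwargs):6.1f} ns")
```

Running the same file under two interpreters (for example, the a7 and b1 installer builds) gives a rough side-by-side comparison.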
msg392936 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-05-04 18:27
Here's the screen shot for Intel silicon running the python dot org macOS build on Big Sur 11.3.1:

https://www.dropbox.com/s/7lmf74osvq5seg2/Screen%20Shot%202021-05-03%20at%208.41.52%20PM.png?dl=0

Here are the results on M1 Apple silicon with Big Sur 11.2.3:

https://www.dropbox.com/s/u2w4tmv1alzybrt/Screen%20Shot%202021-05-03%20at%209.24.03%20PM.png?dl=0
msg392955 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2021-05-04 21:08
Thanks for the report, Raymond, and for taking the time to check performance.

I am able to reliably reproduce the slowdowns in the benchmark between the 3.10.0a7 and 3.10.0b1 versions provided by the macOS installers on python.org. A similar slowdown is seen on both Intel and Apple Silicon Macs running macOS 11.3.1. Doing a quick build from source, I see no similar large differences between 3.10.0a7 and 3.10.0b1 on either macOS (with XC 12.5 Clang 12.0.5) or on a Debian Linux VM (with GCC 10.2.1). So it does seem to be a difference in how the two macOS installers were built. There were some changes in the build environment and process between a7 and b1. I will investigate further and report back here. But the good news is that, AFAICT, there is no reason to suspect a general Python performance regression on this benchmark between a7 and b1.
msg392959 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-05-04 21:37
> There is no reason to suspect a general Python performance regression
> on this benchmark between a7 and b1.

Pablo just ran the same benchmarks on Arch Linux and did not observe the degradation, so this does seem specific to the macOS build.
msg395890 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-06-15 18:15
The problem is still present in Python 3.10b2.
msg398850 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2021-08-03 20:24
Summary: With the 3.10.0rc1 release, this performance regression in the var_access_benchmark starting with the 3.10.0b1 binaries in the python.org macOS universal2 installer is now resolved. With the rc1 release, the performance of this micro benchmark has actually improved over alpha7 in many cases, most notably on Intel Macs.  We have also taken some steps to reduce the chances of significant performance regressions in future releases of the macOS binaries going undetected prior to release.

Details: All var_access_benchmark results (Appendix A and B) were run on macOS Big Sur 11.5.1 (the current release at the moment). The rc1 binaries were also built on 11.5.1 using the current Apple (Xcode) Command Line Tools 12.5.1. In general, we build the python.org macOS installers on the most current macOS and Command Line Tools that have been released by Apple prior to that Python release. The a7 and b2 released universal2 binaries were thus made on the then-current versions of macOS Big Sur and the Command Line Tools, each different from those used for rc1.

To put these results in context, let me once again note that the primary goal of the python.org macOS installers for many years has been to provide a convenient way to run Python on macOS on as many different Mac systems as possible with one download. For 3.10, the universal2 installer variant we provide is designed to run natively on all Macs that can run any version of macOS from 10.9 through the current macOS 11 Big Sur (and soon to include macOS 12 Monterey), both Intel and Apple Silicon (M1) Macs. To be able to run on such a wide variety of systems obviously requires some compromises. Thus providing optimum performance in every situation has *never* been a goal for these installers. That doesn't mean we should totally ignore performance issues, and I am grateful to Raymond for bringing this issue forward. But, and not to belabor the point: for those situations where optimum performance is important, there is no substitute for using a Python built and optimized explicitly for that environment; in other words, don't look to the python.org binaries for those cases.

As an example, with 3.10.0b1, we introduced the first python.org macOS builds that use newer compile- and link-time optimizations (--enable-optimizations and --with-lto). There were some kinks with that that have been subsequently ironed out. But the performance improvements aren't uniform across all systems. It appears that Intel Macs see much more of an improvement than Apple Silicon Macs do. There are probably a couple of reasons for that: for one, the longer experience with the tool chain for Intel archs, but, perhaps more importantly, we currently build universal2 binaries on Intel-based Macs which means performance-based optimizations by the tool chain are based on the performance on an Intel arch which may not be the same as performance on an Apple Silicon (arm64) CPU. That's a topic for future investigation but it is an example of the potential pitfalls when looking at performance.
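
One way to check whether a given interpreter was configured with these options is to inspect the configure arguments recorded at build time. The sketch below assumes a POSIX-style build where sysconfig exposes a CONFIG_ARGS variable (it may be missing on other platforms, in which case both flags are reported as absent); the helper name build_flags is hypothetical.

```python
import sysconfig


def build_flags(config_args=None):
    """Report whether the configure arguments mention the PGO and LTO options."""
    if config_args is None:
        # CONFIG_ARGS holds the arguments passed to ./configure when this
        # interpreter was built; it can be None where no such record exists.
        config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        "enable_optimizations": "--enable-optimizations" in config_args,
        "with_lto": "--with-lto" in config_args,
    }


if __name__ == "__main__":
    print(build_flags())
```

This only reports how the build was configured. It says nothing about how effective the optimizations are on a particular CPU architecture.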

Another example is that while there are some significant differences in the var_access_benchmark results, which targets specific micro-operations in the Python virtual machine, there is a different story looking at the larger-scale "realworld" benchmarks in the pyperformance package.  When I first started looking at this issue, I ran pyperformance and var_access_benchmark and found that, in general, there were *significant* performance improvements in most pyperformance benchmarks between 3.10.0a7 and 3.10.0b2 even though the var_access_benchmark showed performance regressions.  For 3.10.0rc1, the pyperformance results have mostly improved even more.  I have with some trepidation included some pyperformance results in Appendix C (3.10.0a7 vs 3.10.0b2) and Appendix D (3.10.0a7 vs 3.10.0rc1).  Note that these results were run on a different, faster Intel Mac than the var_access_benchmark results and were run under different updates of macOS 11 so they should be viewed cautiously; as always with performance issues, your mileage may vary.
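
When reading the pyperformance appendices, the "N.NNx faster" and "N.NNx slower" labels appear consistent with a simple ratio of the two reported means. The sketch below illustrates that reading with figures copied from Appendix C; the function name describe_change is hypothetical, and the calculation is an interpretation of the report format, not the tool's own code.

```python
def describe_change(old_mean, new_mean):
    """Label a change between two mean timings as a speedup or slowdown factor."""
    if new_mean <= old_mean:
        # Less time per run: express as old/new so the factor is >= 1.
        return f"{old_mean / new_mean:.2f}x faster"
    # More time per run: express as new/old so the factor is >= 1.
    return f"{new_mean / old_mean:.2f}x slower"


if __name__ == "__main__":
    print("2to3:", describe_change(404, 334))              # means in ms
    print("unpack_sequence:", describe_change(47.3, 61.4))  # means in ns
```

Because the published means are rounded, recomputing a ratio from them can differ from the printed label in the last digit for some benchmarks.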

So by now, one might be curious as to how the performance regression was fixed in rc1. The answer at the moment is: I'm not totally sure! There are a *lot* of moving parts in making a Python release and the binaries that we provide for macOS. While I do try to control the build environments as much as possible (for example, by using dedicated virtual machines for builds) and be conservative about making changes to the build process and environments, especially later in a development/release cycle, as noted above I normally try to keep up with the most recent Apple updates to a given macOS version and build tools to ensure everyone is getting the benefit of the latest security and performance fixes. There have been Apple updates along the way between a7, b2, and rc1. So I can't rule those out, and at the moment they seem to me to be the most likely cause. I have put quite a bit of effort into trying to reproduce these results by rebuilding the previous Python pre-releases on newer macOS updates and vice versa, but with inconclusive results. I am also handwaving over the special requirements to produce macOS binaries that can be widely distributed and easily installed, such as those needed to meet Apple's Gatekeeper requirements (code-signing, hardened runtime, notarization, etc.) that have become increasingly stringent (with good reason, in an effort to increase overall system security); these can have a performance impact as well.

There certainly might be other causes (including operator error) but I don't think there is much to be gained by further pursuing these lines. Instead, I have added some additional steps to our macOS installer manufacturing process to now run var_access_benchmark as part of the installer testing we do on multiple systems for every release and also to run the lengthier pyperformance benchmarks at key points in a release cycle. Thanks to this issue, we now have some baseline measurements and, with some due diligence, we should be able to catch future significant regressions earlier in a release, unlike up to now where we have basically ignored performance concerns. That's a good thing, I think!
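
As an illustration of the kind of baseline comparison described above, here is a minimal sketch that flags benchmarks whose timing grew by more than a chosen tolerance. It is not the actual installer test procedure: the function name find_regressions and the 10% tolerance are assumptions, and the sample numbers are taken from the Intel results in Appendix A (alpha7 vs beta2).

```python
def find_regressions(baseline_ns, candidate_ns, tolerance=0.10):
    """Return {benchmark: slowdown ratio} for entries slower than the tolerance."""
    regressions = {}
    for name, old in baseline_ns.items():
        new = candidate_ns.get(name)
        if new is None:
            continue  # benchmark missing from the candidate run; skip it
        ratio = new / old
        if ratio > 1.0 + tolerance:
            regressions[name] = ratio
    return regressions


if __name__ == "__main__":
    alpha7 = {"read_local": 3.6, "read_global": 5.6, "read_instancevar": 13.6}
    beta2 = {"read_local": 4.9, "read_global": 6.0, "read_instancevar": 12.3}
    for name, ratio in find_regressions(alpha7, beta2).items():
        print(f"{name}: {ratio:.2f}x slower than baseline")
```

With these inputs only read_local exceeds the 10% tolerance. A real check would need a tolerance tuned to the run-to-run noise of the machines involved.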

-----------------------------------------------------------------
Appendix A:
===========

python.org macOS universal2: 3.10.0a7 vs 3.10.0b2 vs 3.10.0rc1
Intel Mac Mini (3 GHz 6-Core Intel Core i5)
macOS Big Sur 11.5.1
                                  alpha7        beta2         rc1
                                  ======       ======      ======  
Variable and attribute read access:
read_local                        3.6 ns       4.9 ns      3.3 ns   
read_nonlocal                     3.8 ns       4.5 ns      3.5 ns   
read_global                       5.6 ns       6.0 ns      5.2 ns   
read_builtin                      5.6 ns       6.1 ns      5.3 ns   
read_classvar_from_class         16.1 ns      16.6 ns     14.3 ns   
read_classvar_from_instance      14.9 ns      15.6 ns     13.5 ns   
read_instancevar                 13.6 ns      12.3 ns     10.7 ns   
read_instancevar_slots            7.6 ns       8.9 ns      8.2 ns   
read_namedtuple                  18.0 ns      17.4 ns     15.8 ns   
read_boundmethod                 38.0 ns      33.5 ns     34.0 ns   


Variable and attribute write access:
write_local                       4.3 ns       4.6 ns      3.8 ns   
write_nonlocal                    4.1 ns       4.8 ns      3.9 ns   
write_global                     13.9 ns      13.5 ns     13.2 ns   
write_classvar                   35.4 ns      33.4 ns     34.6 ns   
write_instancevar                34.5 ns      30.3 ns     27.3 ns   
write_instancevar_slots          22.7 ns      23.7 ns     20.8 ns   

Data structure read access:
read_list                        17.6 ns      20.4 ns     16.8 ns   
read_deque                       19.1 ns      19.8 ns     17.1 ns   
read_dict                        19.5 ns      17.5 ns     15.7 ns   
read_strdict                     16.6 ns      16.8 ns     14.8 ns   

Data structure write access:
write_list                       17.8 ns      20.1 ns     18.2 ns   
write_deque                      21.5 ns      22.0 ns     20.7 ns   
write_dict                       24.0 ns      21.5 ns     19.2 ns   
write_strdict                    21.8 ns      20.5 ns     18.9 ns   

Stack (or queue) operations:
list_append_pop                  46.2 ns      42.4 ns     39.3 ns   
deque_append_pop                 40.3 ns      37.8 ns     31.9 ns   
deque_append_popleft             40.9 ns      37.8 ns     32.2 ns   

Timing loop overhead:
loop_overhead                     0.3 ns       0.3 ns      0.2 ns
    
-----------------------------------------------------------------
Appendix B:
===========

python.org macOS universal2: 3.10.0a7 vs 3.10.0b2 vs 3.10.0rc1
Apple Silicon MacBook Air (Apple M1 2020)
macOS Big Sur 11.5.1
                                  alpha7        beta2         rc1
                                  ======       ======      ======  
Variable and attribute read access:
read_local                        4.1 ns       4.9 ns      4.1 ns       
read_nonlocal                     4.1 ns       4.5 ns      4.1 ns       
read_global                       4.2 ns       5.2 ns      4.1 ns       
read_builtin                      4.2 ns       4.9 ns      4.1 ns       
read_classvar_from_class         10.2 ns      12.6 ns     10.7 ns       
read_classvar_from_instance       9.5 ns      12.0 ns      9.5 ns       
read_instancevar                  9.3 ns      11.3 ns      9.5 ns       
read_instancevar_slots            5.7 ns       8.7 ns      6.6 ns       
read_namedtuple                  11.4 ns      14.2 ns     11.8 ns       
read_boundmethod                 23.9 ns      23.1 ns     21.3 ns       


Variable and attribute write access:
write_local                       4.1 ns       5.2 ns      4.1 ns       
write_nonlocal                    4.1 ns       5.1 ns      4.1 ns       
write_global                      9.5 ns      10.1 ns      8.7 ns       
write_classvar                   22.1 ns      25.5 ns     25.7 ns       
write_instancevar                21.5 ns      21.2 ns     18.6 ns       
write_instancevar_slots          15.9 ns      17.6 ns     15.3 ns       

Data structure read access:
read_list                        10.3 ns      14.1 ns     11.0 ns       
read_deque                       11.1 ns      15.6 ns     11.9 ns       
read_dict                        12.1 ns      13.4 ns     11.0 ns       
read_strdict                     10.8 ns      13.4 ns     10.9 ns       

Data structure write access:
write_list                       12.5 ns      16.3 ns     12.8 ns       
write_deque                      12.7 ns      16.8 ns     13.5 ns       
write_dict                       15.4 ns      16.6 ns     13.9 ns       
write_strdict                    14.8 ns      17.1 ns     13.8 ns       

Stack (or queue) operations:
list_append_pop                  27.5 ns      29.0 ns     24.6 ns   
deque_append_pop                 25.3 ns      26.0 ns     21.6 ns   
deque_append_popleft             25.2 ns      26.4 ns     21.6 ns   

Timing loop overhead:
loop_overhead                     0.2 ns       0.3 ns      0.2 ns   

-----------------------------------------------------------------
Appendix C:
===========

pyperformance (python.org macOS universal2: 3.10.0a7 vs 3.10.0b2)
Intel iMac (3.6 GHz 8-Core Intel Core i9)

py310a7_11.json
===============

Performance version: 1.0.2
Report on macOS-11.3.1-x86_64-i386-64bit
Number of logical CPUs: 16
Start date: 2021-05-25 23:21:49.171748
End date: 2021-05-25 23:41:48.281555

py310b2_11.json
===============

Performance version: 1.0.2
Report on macOS-11.4-x86_64-i386-64bit
Number of logical CPUs: 16
Start date: 2021-05-31 11:11:51.120522
End date: 2021-05-31 11:31:17.010744

### 2to3 ###
Mean +- std dev: 404 ms +- 46 ms -> 334 ms +- 28 ms: 1.21x faster
Significant (t=9.99)

### chameleon ###
Mean +- std dev: 10.8 ms +- 0.2 ms -> 8.8 ms +- 0.2 ms: 1.23x faster
Significant (t=62.03)

### chaos ###
Mean +- std dev: 130 ms +- 1 ms -> 103 ms +- 2 ms: 1.27x faster
Significant (t=95.95)

### crypto_pyaes ###
Mean +- std dev: 136 ms +- 1 ms -> 112 ms +- 2 ms: 1.21x faster
Significant (t=74.54)

### deltablue ###
Mean +- std dev: 9.26 ms +- 0.12 ms -> 7.03 ms +- 0.15 ms: 1.32x faster
Significant (t=87.77)

### django_template ###
Mean +- std dev: 60.1 ms +- 1.1 ms -> 45.6 ms +- 1.0 ms: 1.32x faster
Significant (t=75.49)

### dulwich_log ###
Mean +- std dev: 109 ms +- 1 ms -> 93 ms +- 2 ms: 1.17x faster
Significant (t=57.00)

### fannkuch ###
Mean +- std dev: 524 ms +- 3 ms -> 476 ms +- 5 ms: 1.10x faster
Significant (t=63.67)

### float ###
Mean +- std dev: 119 ms +- 2 ms -> 96 ms +- 2 ms: 1.24x faster
Significant (t=59.37)

### go ###
Mean +- std dev: 260 ms +- 2 ms -> 217 ms +- 4 ms: 1.20x faster
Significant (t=77.33)

### hexiom ###
Mean +- std dev: 11.0 ms +- 0.1 ms -> 9.0 ms +- 0.1 ms: 1.23x faster
Significant (t=94.06)

### json_dumps ###
Mean +- std dev: 15.1 ms +- 0.1 ms -> 12.1 ms +- 0.3 ms: 1.25x faster
Significant (t=74.07)

### json_loads ###
Mean +- std dev: 30.9 us +- 0.3 us -> 23.8 us +- 0.4 us: 1.30x faster
Significant (t=116.60)

### logging_format ###
Mean +- std dev: 12.1 us +- 0.2 us -> 9.4 us +- 0.2 us: 1.28x faster
Significant (t=72.96)

### logging_silent ###
Mean +- std dev: 218 ns +- 6 ns -> 159 ns +- 8 ns: 1.36x faster
Significant (t=46.17)

### logging_simple ###
Mean +- std dev: 11.0 us +- 0.2 us -> 8.4 us +- 0.2 us: 1.31x faster
Significant (t=77.65)

### mako ###
Mean +- std dev: 18.9 ms +- 0.4 ms -> 14.7 ms +- 0.2 ms: 1.29x faster
Significant (t=78.98)

### meteor_contest ###
Mean +- std dev: 108 ms +- 1 ms -> 96 ms +- 2 ms: 1.13x faster
Significant (t=48.25)

### nbody ###
Mean +- std dev: 159 ms +- 2 ms -> 139 ms +- 3 ms: 1.15x faster
Significant (t=41.09)

### nqueens ###
Mean +- std dev: 116 ms +- 1 ms -> 98 ms +- 2 ms: 1.18x faster
Significant (t=72.36)

### pathlib ###
Mean +- std dev: 51.5 ms +- 0.8 ms -> 44.1 ms +- 0.9 ms: 1.17x faster
Significant (t=48.46)

### pickle ###
Mean +- std dev: 12.1 us +- 0.1 us -> 9.3 us +- 0.2 us: 1.30x faster
Significant (t=94.04)

### pickle_dict ###
Mean +- std dev: 28.6 us +- 0.1 us -> 21.7 us +- 0.3 us: 1.32x faster
Significant (t=167.94)

### pickle_list ###
Mean +- std dev: 4.55 us +- 0.04 us -> 3.59 us +- 0.07 us: 1.27x faster
Significant (t=94.87)

### pickle_pure_python ###
Mean +- std dev: 547 us +- 4 us -> 424 us +- 11 us: 1.29x faster
Significant (t=78.84)

### pidigits ###
Mean +- std dev: 183 ms +- 1 ms -> 172 ms +- 2 ms: 1.07x faster
Significant (t=38.00)

### pyflate ###
Mean +- std dev: 782 ms +- 8 ms -> 634 ms +- 9 ms: 1.23x faster
Significant (t=100.31)

### python_startup ###
Mean +- std dev: 14.4 ms +- 0.1 ms -> 13.5 ms +- 0.4 ms: 1.07x faster
Significant (t=32.46)

### python_startup_no_site ###
Mean +- std dev: 10.7 ms +- 0.1 ms -> 10.1 ms +- 0.1 ms: 1.06x faster
Significant (t=58.48)

### raytrace ###
Mean +- std dev: 616 ms +- 4 ms -> 452 ms +- 6 ms: 1.36x faster
Significant (t=174.57)

### regex_compile ###
Mean +- std dev: 197 ms +- 2 ms -> 165 ms +- 3 ms: 1.19x faster
Significant (t=67.90)

### regex_dna ###
Mean +- std dev: 174 ms +- 1 ms -> 170 ms +- 3 ms: 1.02x faster
Not significant

### regex_effbot ###
Mean +- std dev: 3.28 ms +- 0.03 ms -> 3.07 ms +- 0.06 ms: 1.07x faster
Significant (t=26.18)

### regex_v8 ###
Mean +- std dev: 26.0 ms +- 0.2 ms -> 23.1 ms +- 0.2 ms: 1.13x faster
Significant (t=78.18)

### richards ###
Mean +- std dev: 91.2 ms +- 0.8 ms -> 67.2 ms +- 2.2 ms: 1.36x faster
Significant (t=80.61)

### scimark_fft ###
Mean +- std dev: 498 ms +- 4 ms -> 373 ms +- 6 ms: 1.34x faster
Significant (t=140.60)

### scimark_lu ###
Mean +- std dev: 194 ms +- 2 ms -> 155 ms +- 2 ms: 1.25x faster
Significant (t=88.27)

### scimark_monte_carlo ###
Mean +- std dev: 120 ms +- 1 ms -> 94 ms +- 2 ms: 1.28x faster
Significant (t=79.90)

### scimark_sor ###
Mean +- std dev: 231 ms +- 1 ms -> 179 ms +- 4 ms: 1.29x faster
Significant (t=88.38)

### scimark_sparse_mat_mult ###
Mean +- std dev: 7.31 ms +- 0.19 ms -> 5.11 ms +- 0.10 ms: 1.43x faster
Significant (t=79.51)

### spectral_norm ###
Mean +- std dev: 186 ms +- 3 ms -> 139 ms +- 3 ms: 1.34x faster
Significant (t=93.33)

### sqlalchemy_declarative ###
Mean +- std dev: 165 ms +- 3 ms -> 144 ms +- 3 ms: 1.15x faster
Significant (t=37.78)

### sqlalchemy_imperative ###
Mean +- std dev: 26.2 ms +- 0.8 ms -> 22.9 ms +- 0.6 ms: 1.14x faster
Significant (t=24.65)

### sqlite_synth ###
Mean +- std dev: 2.98 us +- 0.04 us -> 2.52 us +- 0.06 us: 1.19x faster
Significant (t=47.11)

### sympy_expand ###
Mean +- std dev: 615 ms +- 6 ms -> 521 ms +- 10 ms: 1.18x faster
Significant (t=63.67)

### sympy_integrate ###
Mean +- std dev: 25.5 ms +- 0.3 ms -> 23.1 ms +- 0.4 ms: 1.10x faster
Significant (t=38.96)

### sympy_str ###
Mean +- std dev: 366 ms +- 4 ms -> 316 ms +- 4 ms: 1.16x faster
Significant (t=61.03)

### sympy_sum ###
Mean +- std dev: 204 ms +- 3 ms -> 186 ms +- 8 ms: 1.10x faster
Significant (t=16.06)

### telco ###
Mean +- std dev: 7.40 ms +- 0.15 ms -> 5.91 ms +- 0.13 ms: 1.25x faster
Significant (t=57.40)

### tornado_http ###
Mean +- std dev: 167 ms +- 4 ms -> 157 ms +- 5 ms: 1.07x faster
Significant (t=13.39)

### unpack_sequence ###
Mean +- std dev: 47.3 ns +- 2.7 ns -> 61.4 ns +- 0.5 ns: 1.30x slower
Significant (t=-40.46)

### unpickle ###
Mean +- std dev: 17.4 us +- 0.2 us -> 13.7 us +- 0.2 us: 1.27x faster
Significant (t=91.92)

### unpickle_list ###
Mean +- std dev: 4.73 us +- 0.07 us -> 3.93 us +- 0.10 us: 1.21x faster
Significant (t=49.78)

### unpickle_pure_python ###
Mean +- std dev: 365 us +- 7 us -> 311 us +- 4 us: 1.18x faster
Significant (t=51.04)

### xml_etree_generate ###
Mean +- std dev: 103 ms +- 2 ms -> 89 ms +- 1 ms: 1.15x faster
Significant (t=48.39)

### xml_etree_iterparse ###
Mean +- std dev: 115 ms +- 3 ms -> 101 ms +- 2 ms: 1.14x faster
Significant (t=29.21)

### xml_etree_parse ###
Mean +- std dev: 158 ms +- 3 ms -> 145 ms +- 3 ms: 1.09x faster
Significant (t=24.41)

### xml_etree_process ###
Mean +- std dev: 85.5 ms +- 1.8 ms -> 70.4 ms +- 1.1 ms: 1.22x faster
Significant (t=55.52)

-----------------------------------------------------------------
Appendix D:
===========

pyperformance (python.org macOS universal2: 3.10.0a7 vs 3.10.0rc1)
Intel iMac (3.6 GHz 8-Core Intel Core i9)

py310a7_11.json
===============

Performance version: 1.0.2
Report on macOS-11.3.1-x86_64-i386-64bit
Number of logical CPUs: 16
Start date: 2021-05-25 23:21:49.171748
End date: 2021-05-25 23:41:48.281555

py310rc1.json
=============

Performance version: 1.0.2
Report on macOS-11.5.1-x86_64-i386-64bit
Number of logical CPUs: 16
Start date: 2021-08-02 18:43:01.388887
End date: 2021-08-02 19:01:56.172079

### 2to3 ###
Mean +- std dev: 404 ms +- 46 ms -> 314 ms +- 24 ms: 1.29x faster
Significant (t=13.40)

### chameleon ###
Mean +- std dev: 10.8 ms +- 0.2 ms -> 8.5 ms +- 0.1 ms: 1.28x faster
Significant (t=78.99)

### chaos ###
Mean +- std dev: 130 ms +- 1 ms -> 95 ms +- 1 ms: 1.37x faster
Significant (t=143.64)

### crypto_pyaes ###
Mean +- std dev: 136 ms +- 1 ms -> 109 ms +- 1 ms: 1.24x faster
Significant (t=136.81)

### deltablue ###
Mean +- std dev: 9.26 ms +- 0.12 ms -> 6.76 ms +- 0.13 ms: 1.37x faster
Significant (t=108.53)

### django_template ###
Mean +- std dev: 60.1 ms +- 1.1 ms -> 44.3 ms +- 0.9 ms: 1.36x faster
Significant (t=84.30)

### dulwich_log ###
Mean +- std dev: 109 ms +- 1 ms -> 91 ms +- 2 ms: 1.19x faster
Significant (t=65.99)

### fannkuch ###
Mean +- std dev: 524 ms +- 3 ms -> 444 ms +- 5 ms: 1.18x faster
Significant (t=114.93)

### float ###
Mean +- std dev: 119 ms +- 2 ms -> 93 ms +- 2 ms: 1.28x faster
Significant (t=62.91)

### go ###
Mean +- std dev: 260 ms +- 2 ms -> 204 ms +- 3 ms: 1.27x faster
Significant (t=119.49)

### hexiom ###
Mean +- std dev: 11.0 ms +- 0.1 ms -> 8.5 ms +- 0.2 ms: 1.29x faster
Significant (t=80.26)

### json_dumps ###
Mean +- std dev: 15.1 ms +- 0.1 ms -> 12.0 ms +- 0.2 ms: 1.27x faster
Significant (t=91.36)

### json_loads ###
Mean +- std dev: 30.9 us +- 0.3 us -> 23.2 us +- 0.4 us: 1.34x faster
Significant (t=132.63)

### logging_format ###
Mean +- std dev: 12.1 us +- 0.2 us -> 9.2 us +- 0.2 us: 1.31x faster
Significant (t=84.91)

### logging_silent ###
Mean +- std dev: 218 ns +- 6 ns -> 151 ns +- 4 ns: 1.44x faster
Significant (t=74.47)

### logging_simple ###
Mean +- std dev: 11.0 us +- 0.2 us -> 8.3 us +- 0.2 us: 1.33x faster
Significant (t=79.84)

### mako ###
Mean +- std dev: 18.9 ms +- 0.4 ms -> 14.1 ms +- 0.3 ms: 1.34x faster
Significant (t=81.37)

### meteor_contest ###
Mean +- std dev: 108 ms +- 1 ms -> 92 ms +- 1 ms: 1.18x faster
Significant (t=72.28)

### nbody ###
Mean +- std dev: 159 ms +- 2 ms -> 120 ms +- 3 ms: 1.32x faster
Significant (t=96.54)

### nqueens ###
Mean +- std dev: 116 ms +- 1 ms -> 94 ms +- 1 ms: 1.23x faster
Significant (t=106.68)

### pathlib ###
Mean +- std dev: 51.5 ms +- 0.8 ms -> 46.5 ms +- 0.8 ms: 1.11x faster
Significant (t=33.82)

### pickle ###
Mean +- std dev: 12.1 us +- 0.1 us -> 9.5 us +- 0.2 us: 1.28x faster
Significant (t=110.62)

### pickle_dict ###
Mean +- std dev: 28.6 us +- 0.1 us -> 21.5 us +- 0.3 us: 1.33x faster
Significant (t=176.35)

### pickle_list ###
Mean +- std dev: 4.55 us +- 0.04 us -> 3.62 us +- 0.08 us: 1.26x faster
Significant (t=82.58)

### pickle_pure_python ###
Mean +- std dev: 547 us +- 4 us -> 402 us +- 8 us: 1.36x faster
Significant (t=130.95)

### pidigits ###
Mean +- std dev: 183 ms +- 1 ms -> 172 ms +- 2 ms: 1.07x faster
Significant (t=35.90)

### pyflate ###
Mean +- std dev: 782 ms +- 8 ms -> 598 ms +- 8 ms: 1.31x faster
Significant (t=127.68)

### python_startup ###
Mean +- std dev: 14.4 ms +- 0.1 ms -> 13.2 ms +- 0.3 ms: 1.09x faster
Significant (t=46.93)

### python_startup_no_site ###
Mean +- std dev: 10.7 ms +- 0.1 ms -> 10.2 ms +- 0.2 ms: 1.05x faster
Significant (t=42.58)

### raytrace ###
Mean +- std dev: 616 ms +- 4 ms -> 434 ms +- 7 ms: 1.42x faster
Significant (t=172.50)

### regex_compile ###
Mean +- std dev: 197 ms +- 2 ms -> 159 ms +- 2 ms: 1.24x faster
Significant (t=106.63)

### regex_dna ###
Mean +- std dev: 174 ms +- 1 ms -> 172 ms +- 3 ms: 1.01x faster
Not significant

### regex_effbot ###
Mean +- std dev: 3.28 ms +- 0.03 ms -> 3.14 ms +- 0.05 ms: 1.05x faster
Significant (t=20.22)

### regex_v8 ###
Mean +- std dev: 26.0 ms +- 0.2 ms -> 22.8 ms +- 0.2 ms: 1.14x faster
Significant (t=78.67)

### richards ###
Mean +- std dev: 91.2 ms +- 0.8 ms -> 64.5 ms +- 1.6 ms: 1.41x faster
Significant (t=119.12)

### scimark_fft ###
Mean +- std dev: 498 ms +- 4 ms -> 347 ms +- 5 ms: 1.44x faster
Significant (t=185.69)

### scimark_lu ###
Mean +- std dev: 194 ms +- 2 ms -> 148 ms +- 3 ms: 1.31x faster
Significant (t=87.01)

### scimark_monte_carlo ###
Mean +- std dev: 120 ms +- 1 ms -> 89 ms +- 2 ms: 1.35x faster
Significant (t=99.22)

### scimark_sor ###
Mean +- std dev: 231 ms +- 1 ms -> 166 ms +- 3 ms: 1.39x faster
Significant (t=169.87)

### scimark_sparse_mat_mult ###
Mean +- std dev: 7.31 ms +- 0.19 ms -> 4.82 ms +- 0.13 ms: 1.52x faster
Significant (t=83.76)

### spectral_norm ###
Mean +- std dev: 186 ms +- 3 ms -> 134 ms +- 3 ms: 1.39x faster
Significant (t=104.61)

### sqlalchemy_declarative ###
Mean +- std dev: 165 ms +- 3 ms -> 143 ms +- 4 ms: 1.16x faster
Significant (t=33.78)

### sqlalchemy_imperative ###
Mean +- std dev: 26.2 ms +- 0.8 ms -> 21.8 ms +- 0.6 ms: 1.20x faster
Significant (t=32.12)

### sqlite_synth ###
Mean +- std dev: 2.98 us +- 0.04 us -> 2.43 us +- 0.04 us: 1.23x faster
Significant (t=68.81)

### sympy_expand ###
Mean +- std dev: 615 ms +- 6 ms -> 486 ms +- 8 ms: 1.27x faster
Significant (t=106.54)

### sympy_integrate ###
Mean +- std dev: 25.5 ms +- 0.3 ms -> 21.2 ms +- 0.4 ms: 1.20x faster
Significant (t=73.67)

### sympy_str ###
Mean +- std dev: 366 ms +- 4 ms -> 296 ms +- 4 ms: 1.24x faster
Significant (t=92.10)

### sympy_sum ###
Mean +- std dev: 204 ms +- 3 ms -> 172 ms +- 3 ms: 1.19x faster
Significant (t=57.73)

### telco ###
Mean +- std dev: 7.40 ms +- 0.15 ms -> 5.67 ms +- 0.12 ms: 1.31x faster
Significant (t=70.86)

### tornado_http ###
Mean +- std dev: 167 ms +- 4 ms -> 154 ms +- 5 ms: 1.09x faster
Significant (t=18.37)

### unpack_sequence ###
Mean +- std dev: 47.3 ns +- 2.7 ns -> 44.4 ns +- 0.7 ns: 1.07x faster
Significant (t=8.35)

### unpickle ###
Mean +- std dev: 17.4 us +- 0.2 us -> 13.2 us +- 0.3 us: 1.32x faster
Significant (t=81.72)

### unpickle_list ###
Mean +- std dev: 4.73 us +- 0.07 us -> 3.87 us +- 0.05 us: 1.22x faster
Significant (t=76.77)

### unpickle_pure_python ###
Mean +- std dev: 365 us +- 7 us -> 283 us +- 3 us: 1.29x faster
Significant (t=78.83)

### xml_etree_generate ###
Mean +- std dev: 103 ms +- 2 ms -> 83 ms +- 2 ms: 1.23x faster
Significant (t=57.90)

### xml_etree_iterparse ###
Mean +- std dev: 115 ms +- 3 ms -> 95 ms +- 1 ms: 1.21x faster
Significant (t=40.39)

### xml_etree_parse ###
Mean +- std dev: 158 ms +- 3 ms -> 142 ms +- 2 ms: 1.11x faster
Significant (t=34.23)

### xml_etree_process ###
Mean +- std dev: 85.5 ms +- 1.8 ms -> 66.1 ms +- 1.3 ms: 1.29x faster
Significant (t=68.72)

-----------------------------------------------------------------
History
Date User Action Args
2022-04-11 14:59:45 admin set github: 88203
2021-08-03 20:24:17 ned.deily set status: open -> closed
    resolution: fixed
    messages: + msg398850
    stage: resolved
2021-07-26 14:55:49 andrenvk set nosy: + andrenvk
2021-06-15 18:15:23 rhettinger set messages: + msg395890
    title: Broad performance regression from 3.10a7 to 3.10b1 with python.org macOS binaries -> Broad performance regression from 3.10a7 to 3.10b2 with python.org macOS binaries
2021-05-19 22:22:15 brandtbucher set nosy: + brandtbucher
2021-05-10 21:34:14 vstinner set nosy: - vstinner
2021-05-04 21:37:47 rhettinger set nosy: + ronaldoussoren
    messages: + msg392959
2021-05-04 21:08:07 ned.deily set priority: normal -> critical
    assignee: ned.deily
    messages: + msg392955
    title: Broad performance regression from 3.10a7 to 3.10b1 -> Broad performance regression from 3.10a7 to 3.10b1 with python.org macOS binaries
2021-05-04 19:19:16 JelleZijlstra set nosy: + JelleZijlstra
2021-05-04 18:27:15 rhettinger set nosy: + vstinner, ned.deily, pablogsal
    messages: + msg392936
2021-05-04 18:19:14 rhettinger create