classification
Title: Specialize BINARY_MULTIPLY
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Dennis Sweeney, Mark.Shannon, kj
Priority: normal Keywords: patch

Created on 2021-10-05 03:44 by Dennis Sweeney, last changed 2021-10-17 22:39 by Dennis Sweeney. This issue is now closed.

Pull Requests
URL Status Linked
PR 28727 merged Dennis Sweeney, 2021-10-05 03:45
Messages (5)
msg403189 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python triager) Date: 2021-10-05 03:44
I'm having trouble setting up a rigorous benchmark (Windows doesn't want to install greenlet for pyperformance), but just running a couple of individual files, I got this:

Mean +- std dev: [nbody_main] 208 ms +- 2 ms -> [nbody_specialized] 180 ms +- 2 ms: 1.16x faster
Benchmark hidden because not significant (1): pidigits
Mean +- std dev: [chaos_main] 127 ms +- 2 ms -> [chaos_specialized] 120 ms +- 1 ms: 1.06x faster
Mean +- std dev: [spectral_norm_main] 190 ms +- 10 ms -> [spectral_norm_specialized] 175 ms +- 1 ms: 1.09x faster
Mean +- std dev: [raytrace_main] 588 ms +- 48 ms -> [raytrace_specialized] 540 ms +- 4 ms: 1.09x faster

Hopefully those are accurate.
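
As a rough sanity check on figures such as "1.16x faster", the factors can be recomputed from the rounded means quoted above. This is only a sketch: the helper name `speedup` is made up for illustration, and it assumes the reported factor is simply the ratio of the two means. The benchmarking tool works from unrounded data, so its last digit can differ from what rounded means give.

```python
def speedup(base_ms, new_ms):
    """Ratio of the baseline mean to the new mean (> 1 means the new build is faster)."""
    return base_ms / new_ms


# Rounded means in ms, copied from the non-PGO results above: (main, specialized).
results = {
    "nbody": (208, 180),
    "chaos": (127, 120),
    "spectral_norm": (190, 175),
    "raytrace": (588, 540),
}

for name, (base, new) in results.items():
    print(f"{name}: {speedup(base, new):.2f}x faster")
```

For these four benchmarks the recomputed factors agree with the ones reported above to two decimal places.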
msg403240 - (view) Author: Ken Jin (kj) * (Python committer) Date: 2021-10-05 13:25
> (Windows doesn't want to install greenlet for pyperformance)

I had the *exact* same issues and eventually found a workaround after many hours spent guessing.

Initially, setuptools complained that I needed MSVC++ 14.0 or later (even after I had the latest one installed). I found that for some strange reason, *only* 14.0 worked; 14.2x etc. did not. After installing MSVC 14.0, there was then some strange complaint about a missing .exe/.dll. Searching for that entire error message led me to a result on StackOverflow advising copying said files from the Windows SDK in Visual Studio over to the MSVC 14.0 folder. This finally allowed greenlet to compile. I've since lost the exact SO links, but I hope this leads you somewhere.

Anyways, I don't recommend benchmarking on Windows for stable results (trust me, I've tried ;-). `pyperf system tune` doesn't work on Windows. This leads to very inconsistent results unless you manually disable turbo boost, set core affinities, etc. Also, PGO for _PyEvalFrameDefaultEx might be broken on Windows on the main branch (see issue45116).

Eventually I gave up and just used Linux for stable benchmarking. pyperformance `compile_all` also works properly there, which allows you to automate your benchmarks: https://pyperformance.readthedocs.io/usage.html#compile-python-to-run-benchmarks

PS. are your numbers with PGO and LTO? If so, they're spectacular!
msg403242 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python triager) Date: 2021-10-05 14:54
Hm, the above was not PGO. I tried again with PGO and the results are not as good:

Mean +- std dev: [nbody_main_pgo] 177 ms +- 4 ms -> [nbody_specialized_pgo] 190 ms +- 2 ms: 1.07x slower
Mean +- std dev: [pidigits_main_pgo] 208 ms +- 1 ms -> [pidigits_specialized_pgo] 210 ms +- 2 ms: 1.01x slower
Mean +- std dev: [chaos_main_pgo] 106 ms +- 1 ms -> [chaos_specialized_pgo] 110 ms +- 1 ms: 1.04x slower
Mean +- std dev: [spectral_norm_main_pgo] 169 ms +- 7 ms -> [spectral_norm_specialized_pgo] 167 ms +- 1 ms: 1.02x faster
Benchmark hidden because not significant (1): raytrace
msg403290 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-10-06 09:01
If some misses are caused by mixed int/float operands, it might be worth investigating whether these occur in loops.

Most JIT compilers perform some sort of loop peeling to counter this form of type instability.

E.g.
x = 0
for ...
    x += some_float()

`x` is an int for the first iteration, and a float for the others.


By peeling the first iteration out of the loop, we get type stability inside the loop:

x = 0
# First iteration
x += some_float()
for ...  # Remaining iterations
    x += some_float()  # x is always a float here.
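
The effect of peeling can be checked with a small runnable sketch. It records the operand types seen by the in-loop `x += ...` with and without the first iteration peeled. The `some_float()` body and both function names are hypothetical stand-ins for illustration, and this is a hand-written transformation, not something the interpreter does.

```python
import random


def some_float():
    # Hypothetical stand-in for the some_float() in the example above.
    return random.random()


def in_loop_types_unpeeled(n):
    """Operand type pairs seen by `x += y` inside the original loop."""
    seen = []
    x = 0
    for _ in range(n):
        y = some_float()
        seen.append((type(x).__name__, type(y).__name__))
        x += y
    return seen


def in_loop_types_peeled(n):
    """Operand type pairs seen inside the loop after peeling iteration one."""
    seen = []
    x = 0
    if n > 0:
        x += some_float()  # Peeled first iteration: int + float, outside the loop.
        for _ in range(n - 1):
            y = some_float()
            seen.append((type(x).__name__, type(y).__name__))
            x += y
    return seen


print(set(in_loop_types_unpeeled(5)))
print(set(in_loop_types_peeled(5)))
```

The unpeeled loop sees both an (int, float) pair and (float, float) pairs, while the peeled loop body only ever sees (float, float), which is the type stability a float-specialized instruction would need.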
msg403905 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-10-14 14:56
New changeset 3b3d30e8f78271a488965c9cd11136e1aa890757 by Dennis Sweeney in branch 'main':
bpo-45367: Specialize BINARY_MULTIPLY (GH-28727)
https://github.com/python/cpython/commit/3b3d30e8f78271a488965c9cd11136e1aa890757
History
Date User Action Args
2021-10-17 22:39:17 Dennis Sweeney set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-10-14 14:56:37 Mark.Shannon set messages: + msg403905
2021-10-06 09:01:18 Mark.Shannon set nosy: + Mark.Shannon
messages: + msg403290
2021-10-05 14:54:55 Dennis Sweeney set messages: + msg403242
2021-10-05 13:25:51 kj set nosy: + kj
messages: + msg403240
2021-10-05 03:45:02 Dennis Sweeney set keywords: + patch
stage: patch review
pull_requests: + pull_request27074
2021-10-05 03:44:12 Dennis Sweeney create