Specialize BINARY_MULTIPLY #89530

Closed
sweeneyde opened this issue Oct 5, 2021 · 5 comments
Labels
3.11 (only security fixes) · interpreter-core (Objects, Python, Grammar, and Parser dirs) · performance (Performance or resource usage)

Comments

@sweeneyde
Member

BPO 45367
Nosy @markshannon, @sweeneyde, @Fidget-Spinner
PRs
  • bpo-45367: Specialize BINARY_MULTIPLY #28727
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-10-17.22:39:17.971>
    created_at = <Date 2021-10-05.03:44:12.910>
    labels = ['interpreter-core', '3.11', 'performance']
    title = 'Specialize BINARY_MULTIPLY'
    updated_at = <Date 2021-10-17.22:39:17.971>
    user = 'https://github.com/sweeneyde'

    bugs.python.org fields:

    activity = <Date 2021-10-17.22:39:17.971>
    actor = 'Dennis Sweeney'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-10-17.22:39:17.971>
    closer = 'Dennis Sweeney'
    components = ['Interpreter Core']
    creation = <Date 2021-10-05.03:44:12.910>
    creator = 'Dennis Sweeney'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45367
    keywords = ['patch']
    message_count = 5.0
    messages = ['403189', '403240', '403242', '403290', '403905']
    nosy_count = 3.0
    nosy_names = ['Mark.Shannon', 'Dennis Sweeney', 'kj']
    pr_nums = ['28727']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue45367'
    versions = ['Python 3.11']

    @sweeneyde
    Member Author

    I'm having trouble setting up a rigorous benchmark (Windows doesn't want to install greenlet for pyperformance), but just running a couple of individual files, I got this:

    Mean +- std dev: [nbody_main] 208 ms +- 2 ms -> [nbody_specialized] 180 ms +- 2 ms: 1.16x faster
    Benchmark hidden because not significant (1): pidigits
    Mean +- std dev: [chaos_main] 127 ms +- 2 ms -> [chaos_specialized] 120 ms +- 1 ms: 1.06x faster
    Mean +- std dev: [spectral_norm_main] 190 ms +- 10 ms -> [spectral_norm_specialized] 175 ms +- 1 ms: 1.09x faster
    Mean +- std dev: [raytrace_main] 588 ms +- 48 ms -> [raytrace_specialized] 540 ms +- 4 ms: 1.09x faster

    Hopefully those are accurate.
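
As a sanity check on the ratios reported above, here is a minimal sketch that recomputes them from the rounded means. The `speedup` helper and the `results_ms` dictionary are hypothetical names used only for illustration. pyperf works from unrounded timings, so ratios recomputed from rounded means can differ in the last digit.

```python
# Mean timings in milliseconds, copied from the non-PGO results quoted above.
# Each entry is (main, specialized).
results_ms = {
    "nbody": (208, 180),
    "chaos": (127, 120),
    "spectral_norm": (190, 175),
    "raytrace": (588, 540),
}


def speedup(before_ms, after_ms):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before_ms / after_ms


for name, (main, specialized) in results_ms.items():
    print(f"{name}: {speedup(main, specialized):.2f}x faster")
```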

    @sweeneyde added the 3.11, interpreter-core, and performance labels on Oct 5, 2021
    @Fidget-Spinner
    Member

    (Windows doesn't want to install greenlet for pyperformance)

    I had the *exact* same issues. I eventually found a workaround after many hours spent guessing.

    Initially, setuptools complained that I needed MSVC++ 14.0 or later (even after I had the latest one installed). I found that for some strange reason, *only* 14.0 worked; 14.2x etc. didn't. After installing MSVC 14.0, there was then some strange complaint about a missing .exe/.dll. Searching for that entire error message led me to a result on Stack Overflow advising copying said files from the Windows SDK in Visual Studio over to the MSVC 14.0 folder. This finally allowed greenlet to compile. I've since lost the exact SO links, but I hope this leads you somewhere.

    Anyways, I don't recommend benchmarking on Windows for stable results (trust me, I've tried ;-). pyperf system tune doesn't work on Windows. This leads to very inconsistent results unless you manually disable turbo boost, set core affinities, etc. Also, PGO for _PyEvalFrameDefaultEx might be broken on Windows on the main branch (see bpo-45116).

    Eventually I gave up and just used Linux for stable benchmarking. pyperformance compile_all also works properly there, which allows you to automate your benchmarks: https://pyperformance.readthedocs.io/usage.html#compile-python-to-run-benchmarks

    PS. are your numbers with PGO and LTO? If so, they're spectacular!

    @sweeneyde
    Member Author

    Hm, the above was not PGO. I tried again with PGO and it is not so good:

    Mean +- std dev: [nbody_main_pgo] 177 ms +- 4 ms -> [nbody_specialized_pgo] 190 ms +- 2 ms: 1.07x slower
    Mean +- std dev: [pidigits_main_pgo] 208 ms +- 1 ms -> [pidigits_specialized_pgo] 210 ms +- 2 ms: 1.01x slower
    Mean +- std dev: [chaos_main_pgo] 106 ms +- 1 ms -> [chaos_specialized_pgo] 110 ms +- 1 ms: 1.04x slower
    Mean +- std dev: [spectral_norm_main_pgo] 169 ms +- 7 ms -> [spectral_norm_specialized_pgo] 167 ms +- 1 ms: 1.02x faster
    Benchmark hidden because not significant (1): raytrace

    @markshannon
    Member

    If some misses are caused by mixed int/float operands, it might be worth investigating whether these occur in loops.

    Most JIT compilers perform some sort of loop peeling to counter this form of type instability.

    E.g.

        x = 0
        for ...:
            x += some_float()

    x is an int for the first iteration, and a float for the others.

    By peeling the first iteration, we get type stability in the loop:

        x = 0
        # First iteration
        x += some_float()
        for ...:  # Remaining iterations
            x += some_float()  # x is always a float here.
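
The peeling idea above can be made concrete with a small runnable sketch. The names `unpeeled`, `peeled`, and the body of `some_float` are assumptions made for illustration; only the loop shape comes from the comment. Each function records the type of `x` seen by the in-loop `+=`, which is the operand type a specialized instruction inside the loop would observe.

```python
import random


def some_float():
    # Hypothetical stand-in for the function named in the comment above.
    return random.random()


def unpeeled(n):
    """Plain loop: the in-loop += sees an int first, then floats."""
    x = 0
    types_seen = []
    for _ in range(n):
        types_seen.append(type(x).__name__)
        x += some_float()
    return x, types_seen


def peeled(n):
    """First iteration peeled out: the in-loop += only ever sees floats."""
    x = 0
    types_seen = []
    if n > 0:
        x += some_float()  # first iteration (int + float), outside the loop
        for _ in range(n - 1):  # remaining iterations
            types_seen.append(type(x).__name__)
            x += some_float()
    return x, types_seen


print(unpeeled(4)[1])
print(peeled(4)[1])
```

Both versions perform the same number of additions. The difference is only in which instruction handles the mixed int/float case, which is what matters if that instruction is specialized on operand types.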

    @markshannon
    Member

    New changeset 3b3d30e by Dennis Sweeney in branch 'main':
    bpo-45367: Specialize BINARY_MULTIPLY (GH-28727)

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022