This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Performance regression of Int object operators. (Python 3.11)
Type: performance Stage: resolved
Components: Versions: Python 3.11
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, gvanrossum, iritkatriel, rhettinger, xuxinhang
Priority: normal Keywords:

Created on 2021-11-23 15:52 by xuxinhang, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (6)
msg406849 - (view) Author: Xinhang Xu (xuxinhang) * Date: 2021-11-23 15:52
Hello.

I'm working on a compute-bound project recently, so I tested several Python versions on my PC (Windows 10 64-bit), about Python's performance on operating Int object. All Python binaries are official distributions.


Testcase #1 (simple xor op)

Source:

    import cProfile as profile
    profile.run('for _ in range(500000000): 5 ^ 6')

The given result:

   C:\Users\surface\Desktop\PythonTest>python-3.9.9-embed-amd64\python C:\Users\surface\python_test.py
         3 function calls in 24.398 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   24.398   24.398   24.398   24.398 <string>:1(<module>)
        1    0.000    0.000   24.398   24.398 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



   C:\Users\surface\Desktop\PythonTest>python-3.10.0-embed-amd64\python C:\Users\surface\python_test.py
         3 function calls in 27.941 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   27.941   27.941   27.941   27.941 <string>:1(<module>)
        1    0.000    0.000   27.941   27.941 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



   C:\Users\surface\Desktop\PythonTest>python-3.11.0a2-embed-amd64\python C:\Users\surface\python_test.py
         3 function calls in 42.209 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   42.209   42.209   42.209   42.209 <string>:1(<module>)
        1    0.000    0.000   42.209   42.209 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



Testcase #2 (simple add op)

Source:

    import cProfile as profile
    profile.run('for _ in range(500000000): 5 + 6')

The given result:

   C:\Users\surface\Desktop\PythonTest>python-3.9.9-embed-amd64\python C:\Users\surface\python_test.py
         3 function calls in 24.599 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   24.599   24.599   24.599   24.599 <string>:1(<module>)
        1    0.000    0.000   24.599   24.599 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



   C:\Users\surface\Desktop\PythonTest>python-3.10.0-embed-amd64\python C:\Users\surface\python_test.py
         3 function calls in 27.414 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   27.414   27.414   27.414   27.414 <string>:1(<module>)
        1    0.000    0.000   27.414   27.414 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



   C:\Users\surface\Desktop\PythonTest>python-3.11.0a2-embed-amd64\python C:\Users\surface\python_test.py
         3 function calls in 43.675 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   43.675   43.675   43.675   43.675 <string>:1(<module>)
        1    0.000    0.000   43.675   43.675 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}



As you can see, Python 3.11 costs *much more* time to execute Int object operator. I have also tested the same cases on another Windows PC, the result shows the same.

Is it a common thing? What's the reason for this problem?


Thanks.
msg406861 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-11-23 17:03
Here are timings from stock macOS builds from python.org:

$ python3.10 --version
Python 3.10.0
$ python3.10 -m timeit -r 11 -s 'x=5' 'x^3'      # Xor with variable
5000000 loops, best of 11: 41.4 nsec per loop
$ python3.10 -m timeit -r 11 -s 'x=5' 'x+3'      # Add with variable
10000000 loops, best of 11: 22.7 nsec per loop
$ python3.10 -m timeit -r 11 -s '3 + 5'          # Constant folded
50000000 loops, best of 11: 7.15 nsec per loop

$ python3.11 --version
Python 3.11.0a2
$ python3.11 -m timeit -r 11 -s 'x=5' 'x^3'
20000000 loops, best of 11: 19.2 nsec per loop
$ python3.10 -m timeit -r 11 -s 'x=5' 'x+3'
10000000 loops, best of 11: 22.5 nsec per loop
$ python3.10 -m timeit -r 11 -s '3 + 5'
50000000 loops, best of 11: 7.16 nsec per loop
msg406880 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-11-23 20:05
Raymond, your last two results look like 3.10 rather than 3.11.
msg406881 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-11-23 20:13
I assume there's something wrong with the OP's methodology for profiling. 

(Raymond, in your last example there's a -s argument but nothing to run -- does timeit just use "pass" in that case?)
msg406882 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-11-23 20:16
> I assume there's something wrong with the OP's methodology for profiling.

In a word "cProfile".
All cProfile is doing is measuring the overhead of cProfile.

That the overhead of cProfile has gone up is not surprising. That it has gone up so much, is a bit surprising.

My advice would be use a sampling profiling like Scalene.
msg406883 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2021-11-23 20:27
Reposting with more careful timing runs.  No regression was observed.

$ python3.10 --version
Python 3.10.0
$ python3.10 -m timeit -r 11 -s 'x=5' 'x^3'    # Xor with variable
5000000 loops, best of 11: 41.7 nsec per loop
$ python3.10 -m timeit -r 11 -s 'x=5' 'x+3'    # Add with variable
10000000 loops, best of 11: 22.7 nsec per loop
$ python3.10 -m timeit -r 11 '5+3'             # Constant folded
50000000 loops, best of 11: 7.22 nsec per loop

$ python3.11 --version
Python 3.11.0a2
$ python3.11 -m timeit -r 11 -s 'x=5' 'x^3'    # Xor with variable
20000000 loops, best of 11: 19.4 nsec per loop
$ python3.11 -m timeit -r 11 -s 'x=5' 'x+3'    # Add with variable
20000000 loops, best of 11: 17 nsec per loop
$ python3.11 -m timeit -r 11 '5+3'             # Constant folded
50000000 loops, best of 11: 7.27 nsec per loop
History
Date User Action Args
2022-04-11 14:59:52adminsetgithub: 90038
2021-11-23 20:56:06gvanrossumsetstatus: open -> closed
resolution: works for me
stage: resolved
2021-11-23 20:27:23rhettingersetmessages: + msg406883
2021-11-23 20:16:29Mark.Shannonsetmessages: + msg406882
2021-11-23 20:13:24gvanrossumsetmessages: + msg406881
2021-11-23 20:05:21iritkatrielsetmessages: + msg406880
2021-11-23 19:56:38iritkatrielsetnosy: + gvanrossum, Mark.Shannon, iritkatriel
2021-11-23 17:03:03rhettingersetnosy: + rhettinger
messages: + msg406861
2021-11-23 15:52:58xuxinhangcreate