classification
Title: PEP 590 implementation may have introduced a performance regression
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, jdemeyer, pablogsal, petr.viktorin, rhettinger, vstinner
Priority: high Keywords: 3.8regression, patch

Created on 2019-07-11 11:26 by pablogsal, last changed 2019-08-05 10:55 by pablogsal. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 14735 open Mark.Shannon, 2019-07-13 10:33
PR 14782 merged jdemeyer, 2019-07-15 12:46
Messages (11)
msg347672 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-07-11 11:26
Many functions are significantly slower after commit aacc77fbd77640a8f03638216fa09372cc21673d was applied:

aacc77fbd77640a8f03638216fa09372cc21673d is the first bad commit
commit aacc77fbd77640a8f03638216fa09372cc21673d
Author: Jeroen Demeyer <J.Demeyer@UGent.be>
Date:   Wed May 29 20:31:52 2019 +0200

    bpo-36974: implement PEP 590 (GH-13185)


    Co-authored-by: Jeroen Demeyer <J.Demeyer@UGent.be>
    Co-authored-by: Mark Shannon <mark@hotpy.org>


Experiment in 3.8 
-----------------

./python -m perf timeit -s '_len=len' '_len("hello")'
.....................
Mean +- std dev: 157 ns +- 3 ns


Experiment in parent of aacc77fbd77640a8f03638216fa09372cc21673d
----------------------------------------------------------------

python -m perf timeit -s '_len=len' '_len("hello")'
.....................
Mean +- std dev: 133 ns +- 3 ns


The same regression happens consistently in many other functions (hex, len, deque methods....)
msg347673 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-07-11 11:28
I have bisected again and I always get commit aacc77fbd77640a8f03638216fa09372cc21673d but it would be great if someone can confirm these results independently.
msg347674 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-07-11 11:45
This is the stack difference for my previous experiment:

   * frame #0: 0x00000001001578ee python.exe`unicode_length(self=0x000000010140d220) at unicodeobject.c:12231:9
    frame #1: 0x000000010000b453 python.exe`PyObject_Size(o=0x000000010140d220) at abstract.c:57:26
    frame #2: 0x00000001001cea29 python.exe`builtin_len(module=0x00000001006ad280, obj=0x000000010140d220) at bltinmodule.c:1548:11
    frame #3: 0x0000000100044518 python.exe`_PyMethodDef_RawFastCallKeywords(method=0x00000001003e5540, self=0x00000001006ad280, args=0x00000001013f0e10, nargs=1, kwnames=0x0000000000000000) at call.c:656:18
    frame #4: 0x000000010004487a python.exe`_PyCFunction_FastCallKeywords(func=0x00000001006af210, args=0x00000001013f0e10, nargsf=9223372036854775809, kwnames=0x0000000000000000) at call.c:743:14
    frame #5: 0x00000001001edc09 python.exe`_PyObject_Vectorcall(callable=0x00000001006af210, args=0x00000001013f0e10, nargsf=9223372036854775809, kwnames=0x0000000000000000) at abstract.h:128:21
    frame #6: 0x00000001001eaa6a python.exe`call_function(tstate=0x0000000101103910, pp_stack=0x00007ffeefbfdc50, oparg=1, kwnames=0x0000000000000000) at ceval.c:4859:13
    frame #7: 0x00000001001e5436 python.exe`_PyEval_EvalFrameDefault(f=0x00000001013f0c90, throwflag=0) at ceval.c:3356:19


-------


  * frame #0: 0x0000000100153f7e python.exe`unicode_length(self=0x0000000101526580) at unicodeobject.c:12231:9
    frame #1: 0x000000010000a623 python.exe`PyObject_Size(o=0x0000000101526580) at abstract.c:57:26
    frame #2: 0x00000001001cab69 python.exe`builtin_len(module=0x00000001006a4280, obj=0x0000000101526580) at bltinmodule.c:1548:11
    frame #3: 0x0000000100042d28 python.exe`_PyMethodDef_RawFastCallKeywords(method=0x00000001003deff0, self=0x00000001006a4280, args=0x000000010150fe10, nargs=1, kwnames=0x0000000000000000) at call.c:650:18
    frame #4: 0x000000010004144d python.exe`_PyCFunction_FastCallKeywords(func=0x00000001006a6210, args=0x000000010150fe10, nargs=1, kwnames=0x0000000000000000) at call.c:736:14
    frame #5: 0x00000001001e6c5e python.exe`call_function(tstate=0x0000000101103910, pp_stack=0x00007ffeefbfdc40, oparg=1, kwnames=0x0000000000000000) at ceval.c:4825:9
    frame #6: 0x00000001001e1561 python.exe`_PyEval_EvalFrameDefault(f=0x000000010150fc90, throwflag=0) at ceval.c:3356:19
msg347675 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-07-11 12:40
Updated stack for the tip of 3.8:

  * frame #0: 0x000000010015854e python.exe`unicode_length(self=0x00000001014a4040) at unicodeobject.c:12235:9
    frame #1: 0x000000010000a913 python.exe`PyObject_Size(o=0x00000001014a4040) at abstract.c:57:26
    frame #2: 0x00000001001cf709 python.exe`builtin_len(module=0x00000001006b22f0, obj=0x00000001014a4040) at bltinmodule.c:1551:11
    frame #3: 0x0000000100043fd8 python.exe`_PyMethodDef_RawFastCallKeywords(method=0x00000001003e8ac0, self=0x00000001006b22f0, args=0x00000001013f5e10, nargs=1, kwnames=0x0000000000000000) at call.c:650:18
    frame #4: 0x000000010004433a python.exe`_PyCFunction_Vectorcall(func=0x00000001006b4280, args=0x00000001013f5e10, nargsf=9223372036854775809, kwnames=0x0000000000000000) at call.c:737:14
    frame #5: 0x00000001001eec89 python.exe`_PyObject_Vectorcall(callable=0x00000001006b4280, args=0x00000001013f5e10, nargsf=9223372036854775809, kwnames=0x0000000000000000) at abstract.h:127:11
    frame #6: 0x00000001001ebc2a python.exe`call_function(tstate=0x0000000101103950, pp_stack=0x00007ffeefbfdc30, oparg=1, kwnames=0x0000000000000000) at ceval.c:4987:13
    frame #7: 0x00000001001e65ef python.exe`_PyEval_EvalFrameDefault(f=0x00000001013f5c90, throwflag=0) at ceval.c:3500:19
msg347707 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-07-11 21:03
Could you please specify:
- which commits are you comparing exactly? From your explanation, I guess aacc77fbd and its parent, but that's not completely fair since PEP 590 consists of many commits (see #36974). A better comparison would be master against 3.8a4.
- are you compiling with PGO (--enable-optimizations) or not?
msg347715 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2019-07-12 00:35
>which commits are you comparing exactly? From your explanation, I guess aacc77fbd and its parent, but that's not completely fair since PEP 590 consists of many commits (see #36974). A better comparison would be master against 3.8a4.

I am comparing the tip of 3.8 and the parent of aacc77fbd. (Notice that the first two stack traces correspond to aacc77fbd and its parent).

>are you compiling with PGO (--enable-optimizations) or not?

Yup, PGO and LTO. I can try repeat them without optimisations if you want.

As I suggested previously I would like to confirm that this results can be reproduced first before exploring other things.
msg347735 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-07-12 09:41
I will certainly have a look and try a few things, but it will probably be next week.
msg347956 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-07-15 10:44
I did some benchmarks WITHOUT PGO (simply because it's much faster to compile and therefore easier to test things out).

The command I used for testing is
./python -m perf timeit --duplicate 200 -s 'f = len; x = ()' 'f(x)'

* d30da5dd9a8a965cf24a22bbaff8a5b1341c2944 (before PEP 590)
  Mean +- std dev: 25.1 ns +- 0.2 ns
* aacc77fbd77640a8f03638216fa09372cc21673d (first commit of PEP 590)
  Mean +- std dev: 29.3 ns +- 0.9 ns
* b8e198a5d09ca876b87baaf6efd2b2e7c9e3a0b3 (3.8 branch)
  Mean +- std dev: 29.5 ns +- 0.3 ns
* cd6e83b4810549c308ab2d7315dbab526e35ccf6 (master)
  Mean +- std dev: 25.1 ns +- 0.2 ns

So there does seem to be a regression with 3.8 but not with 3.9. I don't know what to do with this... can we backport more of the PEP 590 commits to 3.8? Or will the release manager complain?
msg347957 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-15 10:46
> can we backport more of the PEP 590 commits to 3.8? Or will the release manager complain?

I would opt for a case by case discussion. Would you be able to identify which master commits have the most significant impact on performance for builtin functions?
msg347959 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-07-15 12:51
PR 14782 (backport of PR 13781) fixes the regression for me.
msg349044 - (view) Author: Jeroen Demeyer (jdemeyer) * (Python triager) Date: 2019-08-05 10:20
Please close
History
Date User Action Args
2019-08-05 10:55:12pablogsalsetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-08-05 10:20:22jdemeyersetmessages: + msg349044
2019-07-15 12:51:08jdemeyersetmessages: + msg347959
2019-07-15 12:48:09jdemeyersetversions: - Python 3.9
2019-07-15 12:46:24jdemeyersetpull_requests: + pull_request14579
2019-07-15 10:46:53vstinnersetnosy: + vstinner
messages: + msg347957
2019-07-15 10:44:59jdemeyersetmessages: + msg347956
2019-07-13 10:33:32Mark.Shannonsetkeywords: + patch
stage: patch review
pull_requests: + pull_request14529
2019-07-12 09:41:45jdemeyersetmessages: + msg347735
2019-07-12 00:38:35pablogsalsettitle: PEP 590 implementation introduced a performance regression -> PEP 590 implementation may have introduced a performance regression
2019-07-12 00:35:42pablogsalsetmessages: + msg347715
2019-07-11 21:03:13jdemeyersetmessages: + msg347707
2019-07-11 12:40:57pablogsalsetmessages: + msg347675
2019-07-11 11:45:25pablogsalsetmessages: + msg347674
2019-07-11 11:28:44pablogsalsetmessages: + msg347673
2019-07-11 11:26:56pablogsalcreate