Author neonene
Recipients Mark.Shannon, malin, neonene, paul.moore, rhettinger, steve.dower, tim.golden, vstinner, zach.ware
Date 2021-09-13.23:37:47
Message-id <1631576267.96.0.237615972116.issue45116@roundup.psfhosted.org>
In-reply-to
Content
With MSVC 16.10.3 and 16.11.2 (latest),
PR25244 showed me that the amount of code in _PyEval_EvalFrameDefault() is over a limit of PGO.
In the old version of _PyEval_EvalFrameDefault (b98eba5), the same issue can be triggered by adding any code, anywhere in the function, that has more than 20 expressions/statements. For example, at the top, middle, or end of the function, repeating "if (0) {}" 10 times, or adding "if (0) {19 statements}". For Python 3.9.7, it takes more than 800 expressions/statements.

Here is just a workaround for 3.10rc2 on Windows.
==================================================
--- Python/ceval.c
+++ Python/ceval.c
@@ -1306,9 +1306 @@
-#define DISPATCH() \
-    { \
-        if (trace_info.cframe.use_tracing OR_DTRACE_LINE OR_LLTRACE) { \
-            goto tracing_dispatch; \
-        } \
-        f->f_lasti = INSTR_OFFSET(); \
-        NEXTOPARG(); \
-        DISPATCH_GOTO(); \
-    }
+#define DISPATCH() goto tracing_dispatch
@@ -1782,4 +1774,9 @@
     tracing_dispatch:
     {
+        if (!(trace_info.cframe.use_tracing OR_DTRACE_LINE OR_LLTRACE)) {
+            f->f_lasti = INSTR_OFFSET();
+            NEXTOPARG();
+            DISPATCH_GOTO();
+        }
         int instr_prev = f->f_lasti;
         f->f_lasti = INSTR_OFFSET();
==================================================
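
The shape of this workaround can be sketched outside CPython. The toy interpreter below is only an illustration under stated assumptions: toy_eval, the OP_* opcodes, and STACK_MAX are hypothetical names, not CPython's, and the sketch says nothing about how MSVC's PGO counts statements. It only shows the idea of shrinking DISPATCH() to a single goto so that the tracing check and the instruction fetch exist once, at a shared label, instead of being expanded after every opcode handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes for illustration; these are not CPython's. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#define STACK_MAX 16

/* Fetch the next (opcode, oparg) pair. */
#define NEXTOPARG() \
    do { opcode = code[pc]; oparg = code[pc + 1]; pc += 2; } while (0)

/* The macro expanded after every handler is now a single jump. */
#define DISPATCH() goto shared_dispatch

int toy_eval(const int *code, int tracing, int *trace_count)
{
    int stack[STACK_MAX];
    int sp = 0;
    size_t pc = 0;
    int opcode = OP_HALT;
    int oparg = 0;

shared_dispatch:
    if (!tracing) {
        /* Fast path: this used to live inside DISPATCH() itself. */
        NEXTOPARG();
        goto dispatch_opcode;
    }
    /* Slow path: stand-in for the tracing work, then fetch as usual. */
    ++*trace_count;
    NEXTOPARG();

dispatch_opcode:
    switch (opcode) {
    case OP_PUSH:
        assert(sp < STACK_MAX);
        stack[sp++] = oparg;
        DISPATCH();
    case OP_ADD:
        assert(sp >= 2);
        sp--;
        stack[sp - 1] += stack[sp];
        DISPATCH();
    case OP_MUL:
        assert(sp >= 2);
        sp--;
        stack[sp - 1] *= stack[sp];
        DISPATCH();
    case OP_HALT:
    default:
        break;
    }
    /* Result is the top of the stack, or 0 if the stack is empty. */
    return sp > 0 ? stack[sp - 1] : 0;
}
```

Running the program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT through toy_eval evaluates (2 + 3) * 4. The cost of this layout is one extra jump per instruction on the fast path, in exchange for less code per handler.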

This patch becomes ineffective if just one more expression is added to the DISPATCH macro, as below:

   #define DISPATCH() {if (1) goto tracing_dispatch;}

This approach is also not sufficient for 3.11, which has a bigger eval function.
I don't know of a cl/link option that lifts this restriction on function size.


3.10rc2 x86 pgo : 1.00
        patched : 1.09x faster (slower  5, faster 48, not significant 5)

3.10rc2 x64 pgo : 1.00         (roughly the same speed as official bin)
        patched : 1.07x faster (slower  5, faster 47, not significant 6)
  patched(/Ob3) : 1.07x faster (slower  7, faster 45, not significant 6)

x64 results are posted.

Fixing the inlining rejection also made a build with __forceinline possible with normal processing time and memory usage.