This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: possible optimization: SHRINK_STACK(n)
Type: Stage:
Components: Versions:
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, pablogsal, rhettinger, serhiy.storchaka, shrink_stack
Priority: normal Keywords:

Created on 2020-06-14 12:08 by shrink_stack, last changed 2022-04-11 14:59 by admin.

Messages (3)
msg371501 - (view) Author: SHRINK_STACK (shrink_stack) Date: 2020-06-14 12:08
Context managers and except blocks constantly generate runs of multiple POP_TOPs, and there may be other cases that lead to such runs as well. A SHRINK_STACK(n) opcode would make things better (smaller pyc files; fewer opcodes = faster evaluation).
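For context, here is a hedged illustration of the pattern the report describes: the `dis`-based helper below (the names `count_pop_top_runs` and `sample` are illustrative, not from the issue) counts runs of two or more consecutive POP_TOP instructions in a function's bytecode. How many such runs actually appear depends heavily on the CPython version being inspected.

```python
import dis

def count_pop_top_runs(func):
    """Count runs of two or more consecutive POP_TOP instructions."""
    ops = [ins.opname for ins in dis.get_instructions(func)]
    runs = 0
    i = 0
    while i < len(ops):
        if ops[i] == "POP_TOP":
            j = i
            # Walk to the end of this run of POP_TOPs.
            while j < len(ops) and ops[j] == "POP_TOP":
                j += 1
            if j - i >= 2:
                runs += 1
            i = j
        else:
            i += 1
    return runs

def sample():
    try:
        pass
    except Exception:
        pass

print(count_pop_top_runs(sample))
```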

A possible patch:

(to peephole.c)
+
+            case POP_TOP:
+                h = i + 1;
+                while (h < codelen && _Py_OPCODE(codestr[h]) == POP_TOP) {
+                    h++;
+                }
+                if (h > i + 1) {
+                    codestr[i] = PACKOPARG(SHRINK_STACK, h - i);
+                    fill_nops(codestr, i + 1, h);
+                    nexti = h;
+                }
+                break;
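The effect of the C peephole change above can be summarized with a pure-Python sketch (the tuple-based code representation and the name `fold_pop_tops` are illustrative; the real pass rewrites the wordcode array in place and fills the freed slots with NOPs rather than deleting them):

```python
def fold_pop_tops(code):
    """Sketch: replace each run of >= 2 POP_TOPs with one SHRINK_STACK(n)."""
    out = []
    i = 0
    while i < len(code):
        if code[i] == ("POP_TOP", 0):
            j = i
            # Find the extent of the run of consecutive POP_TOPs.
            while j < len(code) and code[j] == ("POP_TOP", 0):
                j += 1
            if j - i >= 2:
                out.append(("SHRINK_STACK", j - i))
            else:
                out.append(code[i])
            i = j
        else:
            out.append(code[i])
            i += 1
    return out

example = [("LOAD_NAME", 0), ("POP_TOP", 0), ("POP_TOP", 0),
           ("POP_TOP", 0), ("RETURN_VALUE", 0)]
print(fold_pop_tops(example))
```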


(to ceval.c)
 
+        case TARGET(SHRINK_STACK): {
+            for (int i = 0; i < oparg; i++) {
+                PyObject *value = POP();
+                Py_DECREF(value);
+            }
+            FAST_DISPATCH();
+        }
+

plus some other minor changes to opcode.py and the bytecode magic number.
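The runtime semantics of the proposed opcode amount to "pop and release n values", as the ceval.c fragment shows. A pure-Python sketch of that behavior (a list stands in for the value stack; the name `shrink_stack` is mine):

```python
def shrink_stack(stack, n):
    """Pop n values off the value stack; CPython would also Py_DECREF each."""
    for _ in range(n):
        stack.pop()
    return stack

print(shrink_stack([1, 2, 3, 4], 2))  # → [1, 2]
```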
msg371511 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-14 17:23
> less opcodes = faster evaluation

Unfortunately, that is not always true as opcodes can have arbitrary complexity and there are very low-level effects that are relevant in the eval loop. Even if it is better, the improvement may not be worth burning another opcode, especially since the new opcode won't replace POP_TOP (so we need to deal with both).

Without evaluating the tradeoffs and how it plays into the current status quo I have some initial questions:

- What is the performance improvement of the patch that you propose? Could you run the pyperformance benchmark suite to have some numbers? 

- How many fewer opcodes are we talking about? What is the size of the stdlib pyc files before and after the suggested change?
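One way to answer the pyc-size question is to total the compiled stdlib bytecode on disk for each build and compare. A sketch (it assumes the stdlib was byte-compiled into `__pycache__` directories under the path `sysconfig` reports):

```python
import pathlib
import sysconfig

# Total on-disk size of compiled stdlib bytecode; run once per build to compare.
stdlib = pathlib.Path(sysconfig.get_paths()["stdlib"])
total = sum(p.stat().st_size for p in stdlib.rglob("*.pyc"))
print(f"{total} bytes of .pyc files under {stdlib}")
```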
msg371512 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-06-14 17:50
It increases the size of the eval loop. It can break optimizations that the C compiler performs only when the code size does not exceed certain limits, and it can push the loop past the size of processor caches, which can make execution less efficient. So there are possible negative effects. We should have evidence that this change actually improves performance.
History
Date                 User              Action  Args
2022-04-11 14:59:32  admin             set     github: 85146
2020-06-14 17:51:00  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg371512
2020-06-14 17:29:50  rhettinger        set     nosy: + rhettinger
2020-06-14 17:23:51  pablogsal         set     messages: + msg371511
2020-06-14 14:46:05  BTaskaya          set     nosy: + Mark.Shannon, pablogsal
2020-06-14 12:08:18  shrink_stack      create