This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: loop invariant code motion
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.11
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, zhangchaospecial
Priority: normal Keywords:

Created on 2021-06-24 15:36 by zhangchaospecial, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg396496 - (view) Author: zcpara (zhangchaospecial) * Date: 2021-06-24 15:36
It is a common scenario to call functions in loop, such as
def foo():
  for i in range(10):
    len("abc")
Currently, we generate bytecode like this;
2           0 SETUP_LOOP              24 (to 26)
              2 LOAD_GLOBAL              0 (range)
              4 LOAD_CONST               1 (10)
              6 CALL_FUNCTION            1
              8 GET_ITER
        >>   10 FOR_ITER                12 (to 24)
             12 STORE_FAST               0 (i)

  3          14 LOAD_GLOBAL              1 (len)
             16 LOAD_CONST               2 ('abc')
             18 CALL_FUNCTION            1
             20 POP_TOP
             22 JUMP_ABSOLUTE           10
        >>   24 POP_BLOCK
        >>   26 LOAD_CONST               0 (None)
             28 RETURN_VALUE
If we can make sure len is a loop invariant, we can generate code as follows. Only one LOAD_GLOBAL len is executed before the loop. LOAD_GLOBAL len is replaced with LOAD_FAST __local__len in the loop.
2           0 LOAD_GLOBAL              0 (range)
              2 LOAD_CONST               1 (10)
              4 CALL_FUNCTION            1
              6 GET_ITER
              8 LOAD_GLOBAL              1 (len)
             10 STORE_FAST               1 (__local__len)
        >>   12 FOR_ITER                 6 (to 26)
             14 STORE_FAST               0 (i)

  3          16 LOAD_FAST                1 (__local__len)
             18 LOAD_CONST               2 ('abc')
             20 CALL_FUNCTION            1
             22 POP_TOP
             24 JUMP_ABSOLUTE            6 (to 12)

  2     >>   26 LOAD_CONST               0 (None)
             28 RETURN_VALU

I prototyped this optimization and gain 4.6% performance improvement on regex_v8 of PyPerformance. Currently, only the global invariants are moved out of the loop. There are some other loop invariants can be moved. Other optimizations such as function inline can provide more opportunities for LICM.
I meet a problem with the following case;
1 def no_pop_blocks():
2    y = 1
3    while not y:
4        bla
5    x = 1

bla is not defined anywhere. With LICM, LOAD_GLOBAL bla will be move out of the loop. Then it executes with an error: bla is not defined.
msg396504 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2021-06-24 17:00
There are multiple issues with doing 'static' loop invariant code motion in python, but the most obvious one is how would you detect if anything in the body is related to the loop or not? The trivial way that the compiled languages do this optimization is generating some sort of data dependency graph, and seeing whether anything in an expression refers to the loop target though in Python you can't know for sure unless you know every possible type in the loop body and the type of the loop target which is something nearly impossible to infer (due to the dynamic nature of python). You could theoretically do this only if the whole loop only uses simple values:

for x in [1, 2, 3, 4, 5]:
    a = 5
    if a > 3:
        a += 2
    continue

and in that case, it is better to just optimize the whole loop away but since this kind of code doesn't exist at all in the public, there is no need to make the compiler more complicated.

Another option is generating multiple code objects with guards, which would require to do first some sort of partial type inference and generate the code for the possible types. It would require a substantial amount of changes on both the compiler and the interpreter which is something I'd say can be deferred for later if we implement a tracing JIT. With traces, it should be way easier to do this sort of trivial compiler optimization (loop invariants, constant propagation, GVN, etc.) since we would know the types, and possible code paths.
History
Date User Action Args
2022-04-11 14:59:47adminsetgithub: 88671
2021-06-24 17:00:37BTaskayasetstatus: open -> closed
resolution: not a bug
stage: resolved
2021-06-24 17:00:04BTaskayasetnosy: + BTaskaya
messages: + msg396504
2021-06-24 15:36:41zhangchaospecialcreate