classification
Title: Possible wordcode optimization for STORE/LOAD pairs
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.9
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: rhettinger, serhiy.storchaka
Priority: normal Keywords:

Created on 2019-10-05 22:23 by rhettinger, last changed 2020-12-22 02:53 by rhettinger. This issue is now closed.

Messages (2)
msg354024 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2019-10-05 22:23
In the code shown below, STORE_FAST x is followed by LOAD_FAST x.  This is a common wordcode pairing.  Perhaps a new combined opcode would help:

        case TARGET(LOAD_AND_STORE_FAST): {
            /* Fused STORE_FAST x + LOAD_FAST x: the value stays on the
               stack, so peek at it instead of doing a POP() then a PUSH(). */
            PyObject *value = TOP();
            /* The stack keeps its reference; the local needs its own. */
            Py_INCREF(value);
            /* SETLOCAL releases whatever the local previously held. */
            SETLOCAL(oparg, value);
            FAST_DISPATCH();
        }

The combined opcode saves one trip around the eval-loop and it saves
unnecessary stack manipulations (a POP() immediately followed by a PUSH()).
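
As a rough illustration of that saving (a toy model, not CPython's eval loop), the Python sketch below runs a tiny stack machine and counts dispatches. The run() helper and the opcode name STORE_AND_LOAD_FAST are hypothetical and exist only for this example.

```python
def run(program, stack):
    """Toy stack machine; returns (locals, stack, number of dispatches)."""
    local_vars = {}
    dispatches = 0
    for op, name in program:
        dispatches += 1                      # one trip around the loop
        if op == "STORE_FAST":
            local_vars[name] = stack.pop()   # POP the value into the local
        elif op == "LOAD_FAST":
            stack.append(local_vars[name])   # PUSH the local back
        elif op == "STORE_AND_LOAD_FAST":    # hypothetical combined opcode
            local_vars[name] = stack[-1]     # store without touching the stack
        else:
            raise ValueError(op)
    return local_vars, stack, dispatches


separate = run([("STORE_FAST", "x"), ("LOAD_FAST", "x")], [7])
combined = run([("STORE_AND_LOAD_FAST", "x")], [7])
```

Both runs end with the same locals and the same stack; only the dispatch count differs (2 versus 1).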

The code generation would likely need to be a compiler or AST step because it crosses basic block boundaries.  Care would need to be taken to not adversely affect tracing the code in a debugger.

Note that in the following code, "x" is never used after the STORE/LOAD pair.  In theory, the two opcodes could be dropped entirely; however, that would affect a call to "locals()".
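
A small sketch of why the store cannot simply be dropped: even when the loop variable is read only once, it stays visible through locals(). The function name last_seen is made up for this example.

```python
def last_seen(items):
    for x in items:
        x ** 2          # x is stored, loaded once, and never used again
    return locals()     # ...yet x is still observable here


snapshot = last_seen([1, 2, 3])
```

If the compiler had eliminated the store to x, the snapshot would no longer contain an "x" entry.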

-------- Code disassembly ------

>>> def f(s):
	for x in g:
		yield x**2

		
>>> dis(f)
  2           0 LOAD_GLOBAL              0 (g)
              2 GET_ITER
        >>    4 FOR_ITER                14 (to 20)
              6 STORE_FAST               1 (x)

  3           8 LOAD_FAST                1 (x)
             10 LOAD_CONST               1 (2)
             12 BINARY_POWER
             14 YIELD_VALUE
             16 POP_TOP
             18 JUMP_ABSOLUTE            4
        >>   20 LOAD_CONST               0 (None)
             22 RETURN_VALUE
msg354031 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2019-10-06 07:01
I thought about this. But STORE_FAST x is followed by LOAD_FAST x just by accident. If you change the expression (to 1+x**2 or f(x**2)), they are no longer neighbors. I am not sure this pattern is common enough. There are more common pairs.
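
One way to check how common the pairing really is would be to count it over real code. The sketch below is only a starting point: count_store_load_pairs and instructions_of are hypothetical helpers, and the two instruction lists are hand-written streams modeled on the disassembly in this issue, since actual opcode names vary between Python versions.

```python
import dis


def count_store_load_pairs(instructions):
    """Count STORE_FAST x immediately followed by LOAD_FAST x."""
    pairs = 0
    prev = None
    for op, arg in instructions:
        if prev == ("STORE_FAST", arg) and op == "LOAD_FAST":
            pairs += 1
        prev = (op, arg)
    return pairs


def instructions_of(func):
    """Flatten a function's bytecode into (opname, argval) tuples."""
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(func)]


# "yield x**2": the load directly follows the store.
adjacent = [("FOR_ITER", 20), ("STORE_FAST", "x"),
            ("LOAD_FAST", "x"), ("LOAD_CONST", 2)]
# "yield 1+x**2": a LOAD_CONST comes between them.
separated = [("FOR_ITER", 20), ("STORE_FAST", "x"),
             ("LOAD_CONST", 1), ("LOAD_FAST", "x")]
```

Feeding instructions_of(some_function) into count_store_load_pairs for a body of real functions would give a rough frequency, subject to the opcode names used by the running interpreter.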

Also, note that LOAD_FAST belongs to a different line of code. It should be preserved for debugging purposes. Otherwise you could not set a breakpoint on the first line of the loop body.
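
The line boundary can be inspected with dis.findlinestarts. This is a minimal sketch, assuming a generator shaped like the one in the issue; squares and line_starts are names invented for the example, and exact offsets differ between Python versions.

```python
import dis


def squares(g):
    for x in g:        # relative line 1: the loop header
        yield x ** 2   # relative line 2: first line of the loop body


def line_starts(func):
    """Map each source line (relative to the def) to its first bytecode offset."""
    code = func.__code__
    first = code.co_firstlineno
    starts = {}
    for offset, lineno in dis.findlinestarts(code):
        if lineno is not None:
            starts.setdefault(lineno - first, offset)
    return starts


starts = line_starts(squares)
```

The loop header and the loop body each begin at their own offset; a fused instruction spanning both lines would leave a debugger no instruction on which to stop for the body line.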
History
Date                 User              Action  Args
2020-12-22 02:53:57  rhettinger        set     status: open -> closed; stage: resolved
2019-10-06 07:01:27  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg354031
2019-10-05 22:23:55  rhettinger        create