
Author xmorel
Recipients docs@python, xmorel
Date 2021-01-27.11:39:47
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1611747588.28.0.967739305084.issue43036@roundup.psfhosted.org>
In-reply-to
Content
I was looking at the disassembly of a fairly straightforward listcomp:

    [e for e in s if e[0]]

  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                16 (to 22)
              6 STORE_FAST               1 (e)
              8 LOAD_FAST                1 (e)
             10 LOAD_CONST               0 (0)
             12 BINARY_SUBSCR
             14 POP_JUMP_IF_FALSE        4
             16 LOAD_FAST                1 (e)
             18 LIST_APPEND              2
             20 JUMP_ABSOLUTE            4
        >>   22 RETURN_VALUE
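(A listing like the one above can be regenerated with the snippet below; the exact opcodes, offsets, and whether the comprehension gets its own code object vary across CPython versions, so treat it as a sketch rather than a byte-for-byte reproduction.)

```python
# Regenerate the disassembly of the listcomp. dis.dis recurses into nested
# code objects, so the comprehension body shows up even on versions that
# compile it as a separate function.
import contextlib
import dis
import io

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    dis.dis(compile("[e for e in s if e[0]]", "<listcomp>", "eval"))
listing = buf.getvalue()
print(listing)
```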

Offsets 6 and 8 bothered me because STORE_FAST is documented as

> Stores TOS into the local co_varnames[var_num].

So it seems like STORE_FAST leaves TOS in place, making the LOAD_FAST unnecessary. However, looking at ceval.c:

        case TARGET(STORE_FAST): {
            PREDICTED(STORE_FAST);
            PyObject *value = POP();
            SETLOCAL(oparg, value);
            FAST_DISPATCH();
        }

So STORE_FAST does pop TOS, and the LOAD_FAST is necessary. This is confusing because other instructions, ones which literally have POP in their name, do get their stack behaviour documented explicitly.
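Incidentally, `dis.stack_effect` exposes the same answer programmatically, reporting the net stack effect that the opcode documentation leaves implicit:

```python
# dis.stack_effect gives the net stack effect of an opcode: STORE_FAST
# consumes its operand (net -1) rather than leaving TOS in place, while
# LOAD_FAST pushes one value (net +1).
import dis

store = dis.stack_effect(dis.opmap["STORE_FAST"], 0)
load = dis.stack_effect(dis.opmap["LOAD_FAST"], 0)
print(store, load)  # -1 1
```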

Should all bytecode instructions have their stack behaviour documented explicitly, or only those with an *odd* stack behaviour (e.g. JUMP_IF_FALSE_OR_POP), with the rest covered by a note saying that they pop their operands and push back their result, or some such?

--

Furthermore, maybe optimising `STORE_FAST a; LOAD_FAST a` into `DUP_TOP; STORE_FAST a` would be useful? It would have no effect on bytecode size (since the switch to wordcode, both sequences occupy two words), and `fastlocals[i]` would be in cache with the conditional check likely predicted, but skipping the store/load round-trip entirely still seems more reliable. This idea is somewhat supported by assignment expressions already generating the latter sequence:

    >>> @dis.dis
    ... def foo():
    ...     if a := thing():
    ...         do(a)
    ... 
  3           0 LOAD_GLOBAL              0 (thing)
              2 CALL_FUNCTION            0
              4 DUP_TOP
              6 STORE_FAST               0 (a)
              8 POP_JUMP_IF_FALSE       18

  4          10 LOAD_GLOBAL              1 (do)
             12 LOAD_FAST                0 (a)
             14 CALL_FUNCTION            1
             16 POP_TOP
        >>   18 LOAD_CONST               0 (None)
             20 RETURN_VALUE


This optimisation would also trigger for e.g.

    [x[i] for x in xs]

or

    a = foo()
    if a:
        # do thing

making the latter generate bytecode identical to walrus assignments, at least in the trivial case: currently the only difference (aside from line numbers) is that the normal assignment generates STORE_FAST; LOAD_FAST while the assignment expression generates DUP_TOP; STORE_FAST.
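To make the claimed equivalence concrete, here is a toy stack machine (hypothetical, not CPython's ceval) checking that the two sequences leave the operand stack and the fast locals in the same final state:

```python
# Toy interpreter for the three opcodes involved, to check that
# STORE_FAST a; LOAD_FAST a and DUP_TOP; STORE_FAST a are equivalent.
def run(ops, stack, fastlocals):
    stack, fastlocals = list(stack), dict(fastlocals)
    for op, *args in ops:
        if op == "STORE_FAST":
            fastlocals[args[0]] = stack.pop()  # pops TOS, stores it
        elif op == "LOAD_FAST":
            stack.append(fastlocals[args[0]])  # pushes the local
        elif op == "DUP_TOP":
            stack.append(stack[-1])            # duplicates TOS
    return stack, fastlocals

before = run([("STORE_FAST", "a"), ("LOAD_FAST", "a")], [42], {})
after = run([("DUP_TOP",), ("STORE_FAST", "a")], [42], {})
print(before == after)  # True: same stack and same locals afterwards
```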