classification
Title: Optimize sequences of constants in the compiler
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, benjamin.peterson, brett.cannon, methane, ncoghlan, pitrou, rhettinger, serhiy.storchaka, yselivanov
Priority: normal Keywords: patch

Created on 2018-04-21 12:09 by serhiy.storchaka, last changed 2019-09-30 16:57 by brandtbucher. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 6559 closed serhiy.storchaka, 2018-04-21 12:14
PR 16498 closed brandtbucher, 2019-09-30 16:57
Messages (3)
msg315563 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-04-21 12:09
The following PR makes three optimizations in the compiler.

1. A sequence of several LOAD_CONSTs is replaced with a single LOAD_CONST followed by UNPACK_SEQUENCE.

For example, "{'a': 1, 'b': 2, 'c': 3}" is currently compiled to

  1           0 LOAD_CONST               0 (1)
              2 LOAD_CONST               1 (2)
              4 LOAD_CONST               2 (3)
              6 LOAD_CONST               3 (('a', 'b', 'c'))
              8 BUILD_CONST_KEY_MAP      3
             10 POP_TOP
             12 LOAD_CONST               4 (None)
             14 RETURN_VALUE

With this optimization it will be compiled to:

  1           0 LOAD_CONST               5 ((('a', 'b', 'c'), 3, 2, 1))
              2 UNPACK_SEQUENCE          4
              4 BUILD_CONST_KEY_MAP      3
              6 POP_TOP
              8 LOAD_CONST               4 (None)
             10 RETURN_VALUE

2. Optimized building lists and sets of constants. [1, 2, 3, 4, 5] will be compiled to [*(1, 2, 3, 4, 5)], and {1, 2, 3, 4, 5} will be compiled to {*frozenset(1, 2, 3, 4, 5)}, where (1, 2, 3, 4, 5) and frozenset(1, 2, 3, 4, 5) are just constants.

x = [1, 2, 3, 4, 5]
y = {1, 2, 3, 4, 5}

currently is compiled to

  1           0 LOAD_CONST               0 (1)
              2 LOAD_CONST               1 (2)
              4 LOAD_CONST               2 (3)
              6 LOAD_CONST               3 (4)
              8 LOAD_CONST               4 (5)
             10 BUILD_LIST               5
             12 STORE_NAME               0 (x)

  2          14 LOAD_CONST               0 (1)
             16 LOAD_CONST               1 (2)
             18 LOAD_CONST               2 (3)
             20 LOAD_CONST               3 (4)
             22 LOAD_CONST               4 (5)
             24 BUILD_SET                5
             26 STORE_NAME               1 (y)
             28 LOAD_CONST               5 (None)
             30 RETURN_VALUE

With optimization 1 it will be compiled to

  1           0 LOAD_CONST               6 ((5, 4, 3, 2, 1))
              2 UNPACK_SEQUENCE          5
              4 BUILD_LIST               5
              6 STORE_NAME               0 (x)

  2           8 LOAD_CONST               6 ((5, 4, 3, 2, 1))
             10 UNPACK_SEQUENCE          5
             12 BUILD_SET                5
             14 STORE_NAME               1 (y)
             16 LOAD_CONST               5 (None)
             18 RETURN_VALUE

And with optimization 2 it will be compiled to

  1           0 LOAD_CONST               0 ((1, 2, 3, 4, 5))
              2 BUILD_LIST_UNPACK        1
              4 STORE_NAME               0 (x)

  2           6 LOAD_CONST               1 (frozenset({1, 2, 3, 4, 5}))
              8 BUILD_SET_UNPACK         1
             10 STORE_NAME               1 (y)
             12 LOAD_CONST               2 (None)
             14 RETURN_VALUE

3. Remove unused constants.

After folding tuples of constants created at code generation level, eliminating unreachable code, and after the above two optimizations, unused constants are left in the co_consts tuple. The third optimization removes them and reenumerate constants in the order of occurrence. The above example will be compiled to:

  1           0 LOAD_CONST               0 ((1, 2, 3, 4, 5))
              2 BUILD_LIST_UNPACK        1
              4 STORE_NAME               0 (x)

  2           6 LOAD_CONST               1 (frozenset({1, 2, 3, 4, 5}))
              8 BUILD_SET_UNPACK         1
             10 STORE_NAME               1 (y)
             12 LOAD_CONST               2 (None)
             14 RETURN_VALUE

See issue28813 for the implementation of this optimization on the level of raw bytecode (in peephole.c). 

These optimizations are useful not only for initializing collections of constants. Calling function with constant arguments "f(x, a=1, b=2)":

Current:
  1           0 LOAD_NAME                0 (f)
              2 LOAD_NAME                1 (x)
              4 LOAD_CONST               0 (None)
              6 LOAD_CONST               1 (1)
              8 LOAD_CONST               2 (2)
             10 LOAD_CONST               3 (('a', 'b'))
             12 CALL_FUNCTION_KW         4
             14 POP_TOP
             16 LOAD_CONST               0 (None)
             18 RETURN_VALUE

Optimized:
  1           0 LOAD_NAME                0 (f)
              2 LOAD_NAME                1 (x)
              4 LOAD_CONST               0 ((('a', 'b'), 2, 1, None))
              6 UNPACK_SEQUENCE          4
              8 CALL_FUNCTION_KW         4
             10 POP_TOP
             12 LOAD_CONST               1 (None)
             14 RETURN_VALUE

This issue depends on issue33318.
msg315633 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2018-04-22 23:42
There seems to be an implicit assumption here that fewer bytecodes is better. But that isn't always the case.

Do you have evidence that the sequence 
0 LOAD_CONST               5 ((('a', 'b', 'c'), 3, 2, 1))
2 UNPACK_SEQUENCE          4
is actually faster than
0 LOAD_CONST               0 (1)
2 LOAD_CONST               1 (2)
4 LOAD_CONST               2 (3)
6 LOAD_CONST               3 (('a', 'b', 'c'))
?

The second sequence has more bytecodes, but the first has to create a new object.

I think you ought to be careful using the word "optimize" unless the output is incontrovertibly superior.
msg330141 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-11-20 18:31
Good point! Thank you Mark for your comment.

In theory, fewer bytecodes is better (the first example doesn't create a new object, it operates with references to existing objects). But in practice the difference is very small and overwhelmed by random factors. We need hundreds or thousands of constants to get a stable result. This is far from a common use case.
History
Date User Action Args
2019-09-30 16:57:52brandtbuchersetpull_requests: + pull_request16086
2018-11-20 18:31:42serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg330141

stage: patch review -> resolved
2018-04-22 23:42:12Mark.Shannonsetnosy: + Mark.Shannon
messages: + msg315633
2018-04-21 12:14:13serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request6254
2018-04-21 12:09:20serhiy.storchakacreate