Title: A constant folding optimization pass for the AST
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.3
Status: closed Resolution: duplicate
Dependencies: Superseder: AST-level Constant folding
View: 29469
Assigned To: nnorwitz Nosy List: alex, belopolsky, benjamin.peterson, dmalcolm, inada.naoki, jhylton, nnorwitz, pfalcon, rhettinger, sdahlbac, serhiy.storchaka, terry.reedy, thomaslee, titanstar, vstinner
Priority: normal Keywords: patch

Created on 2005-11-02 18:49 by titanstar, last changed 2017-12-14 13:12 by serhiy.storchaka. This issue is now closed.

File name Uploaded Description Edit titanstar, 2005-11-06 20:42 patch for AST constant folding
py3k-ast-optimization-2010-11-05-001.patch dmalcolm, 2010-11-05 23:59
Messages (21)
msg48953 - (view) Author: Rune Holm (titanstar) Date: 2005-11-02 18:49
This patch adds the following: A visitor interface
generalized from the existing ast pass code in order to
make it easy to write ast passes that only care about
specific node types. A constant folding pass that looks
for operations involving number or string literals, and
calculates these at compile time. Example code snippets
that this pass will optimize:

3 + 4 + x => 7 + x

2 ** 2 ** 2 => 16

4 and 5 and x and 6 => x and 6

4 or 5 or x => 4

4 and 5 and ~6 => -7

When combined with patch 1346214, the compiler will
also optimize statements like

if 2**2**2 - 16: expensive_computation() => nothing

The patch adds two new files: Include/optimize.h and
Python.optimize.c. This was done because I anticipate
adding more AST optimizations later using the same
visitor interface, and Python/compile.c is already very
crowded with byte code generation and bytecode
optimization. If new files aren't desired, I could
easily change the pass to add the extra code to compile.c

This patch combined with patch 1346214 passes the unit
tests on all the platforms I've tested it on, namely:
macos 10.3/ppc

valgrind on linux/x86 does not reveal any additional
leaks or uninitialized accesses that aren't already in
the svn head.
msg48954 - (view) Author: Simon Dahlbacka (sdahlbac) Date: 2005-11-06 19:10
Logged In: YES 

the actual patch is missing...
msg48955 - (view) Author: Rune Holm (titanstar) Date: 2005-11-06 20:42
Logged In: YES 

Sorry, I'm new to the sourceforge patch tracker. The patch should be 
attached now.
msg48956 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-02-19 09:41
Logged In: YES 

Neal, what do you think of this?
msg48957 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-02-19 10:09
Logged In: YES 

This should be compared to the constant folding already 
added to Py2.5 via the peepholer:
   dis.dis(compile('x=2+3', '', 'exec'))

Also, make sure it doesn't go over the top consuming 
memory for the likes of:

  '-' * 100

Both of those should not be optimized away at compile-time.

Also, be sure not optimize away -0.0.  Thet is not the 
same as +0.0.  The distinction is important for branch 
cuts in cmath.
msg48958 - (view) Author: Rune Holm (titanstar) Date: 2006-02-19 13:35
Logged In: YES 

It avoids generating constant objects with sizes above 20 (in a similar fashion 
as the bytecode peepholer), and checks whether the operand of unary minus 
is non-zero in order to avoid changing -0.0.

As for the bytecode peephole optimizer, this AST constant folder performs 
quite similar optimizations, but optimizes partially constant and/or and 
comparative expressions in addition. This patch should however not be seen 
as a replacement for the bytecode constant folder, but rather as a 
complement. An optimizing compiler typically contains many forms of 
constant folding in the different phases of compilation, since many later 
optimizations benefit from constant folding (warranting early constant 
folding), and some optimizations might emit code that benefit from constant 
folding again (warranting late constant folding). For an example of the 
former, consider the statement

if 1-1: some_code()

both passes are able to transform this into

if 0: some_code()

but since the AST constant folder is run before the dead code eliminator at 
<>, these two together are able to optimize 
the if statement away altogether.

Note that this patch probably won't apply cleanly anymore, since it was 
written three months ago and the AST code has undergone quite a few 
changes since then. But if there is interest in applying this patch, I'll gladly 
update it for the current trunk.
msg48959 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2006-02-19 17:12
Logged In: YES 

I'm +1 on the idea, but won't have an opportunity to 
review the patch in detail (to check for possible semantic 
changes).  Neal, what do you think?
msg48960 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-05-17 15:54
Logged In: YES 

Candidate for Iceland?
msg48961 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-07-30 17:02
Logged In: YES 

Shoot.  I missed this.  Bumping priority so hopefully I will
remember to take a look at this after 2.5 is out.  We still
need to split up compile.c along the lines of Jeremy's 2006
PyCon presentation (1-2 extra files).  I think one extra
file was for assembly.  I don't remember the details now.
msg64646 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2008-03-28 18:55
Raymond wrote in his recent response on issue2499 (a patch that adds
unary '+' and 'not' folding to peephole optimizer):

 More importantly, we decided that the peepholer is the wrong place to 
do much of this work.  Most of the peepholer is going to be migrated 
up the chain, after the AST is generated, but before the opcodes are 
generated.  That is a faster, more reliable, and more general 
""" (See msg64618.)

This looks like the relevant patch.  I would like to take a look at the
patch, but since it is more than 2 years old, maybe someone has an
updated version.  Please advise.
msg66388 - (view) Author: Thomas Lee (thomaslee) (Python committer) Date: 2008-05-08 02:01
I'm working on the AST optimization code for 2.7 (?) in the
tlee-ast-optimize branch. I've since adopted some of the ideas from this
patch, but I'll take another look when I get a chance. The folding
operation is already mostly in-place.
msg120516 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-11-05 17:34
FWIW, I'm working on fixing up the this patch to work against py3k; I'm assuming there's still interest in the AST visitor + specific optimization passes approach.
msg120541 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-11-05 21:55
David, it would be great if an optional AST optimization pass could do something that we don't already have (perhaps, loop invariant code motion when python is called with -OO or somesuch).  The AST tree makes it possible for the first time to provide some non-trivial optimizations, so aim high.
msg120562 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-11-05 23:59
I've been working on this against the py3k branch.  I'm attaching what I've got so far.

I foolishly didn't check the tlee-ast-optimize branch, instead using file6850 as a base.

Rune Holm/titanstar, I assume you've signed a PSF contributor agreement?

Changes since titanstar's original patch:
  - rework to apply against py3k
  - reformatted tabs to 4-space indentation; tried to reformat to PEP-7 as much as possible
  - added stmt types: With_kind, Nonlocal_kind
  - added expr types: IfExp_kind, Bytes_kind, SetComp_kind, DictComp_kind, Ellipsis_Kind
  - removed Print_kind, Exec_kind, Repr_kind
  - reworked Raise_kind
  - added "col_offset" and "arena" arguments; pass in the PyArena from the compiler as the context of the visitor
  - removal of all "free_expr" and "asdl_seq_free" calls on the assumption that PyArena now handles all of this (am I correct in thinking this?)
  - String -> Bytes in create_ast_from_constant_object
  - added test_optimize selftest suite, though this is based on bytecode disassembly, rather than direct inspection of the AST
  - I've fixed it up so it compiles and passes regrtest, but I suspect I've missed optimization possibilities

I did a little performance testing using the py3k version of the benchmark suite; currently it's a slight regression for some tests, a slight improvement for others; nothing impressive yet.

Thomas Lee's AST optimization branch (branched from r62457) has lots of interesting work:

This appears to not be quite the same starting point; he added a PyCF_NO_OPTIMIZE flag to Include/pythonrun.h (and other places), which seems like a good way to see the effect of the optimization pass.  He also removed the peepholer; maybe it's worth doing that, but it seems worth at least keeping the test suite around to ensure a lack of regressions.

I can look at cherrypicking Thomas' work/porting it to py3k.

Re: "aiming high": I'd love to add new optimizations, but it's not clear to me what's permissable.  In particular, is it permissable for an optimization pass to assume that there are no external modifications to the locals within a frame?

It's possible to write code like this:
    frame = inspect.currentframe()
    inspect.getouterframes(frame)[-depth][0].f_locals[name] = value
to manipulate locals; whether or not this actually affects running code in the current implementation of CPython seems hit-or-miss to me right now, I think depending on exactly when fastlocals get written back to the f_locals dictionary (I could have miswritten the precise code).

By strategically manipulating locals in other frames, we can break pretty-much any typical compiler optimization: locals can appear or change from under us, or change attribute values, or gain side-effects to their __getattr__ (e.g. writing to disk).

If it is permissable for an optimization pass to assume that there are no external modifications to the locals within a frame, then issue 4264 might be one to investigate: this is a patch on top of Tom Lee's work (to do local type-inference to replace list.append with LIST_APPEND).   Ideas for other optimizations would be most welcome.
msg120563 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-11-06 00:10
Another optimization idea: detect local dictionaries that are only ever used in non-mutating ways, and convert them to constants, rather than rebuilding the dict from scratch each time.

See e.g. etc within the bm_html5lib benchmark (though this doesn't seem to be ported to py3k yet)
msg120566 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2010-11-06 01:02
ISTM that you don't need to worry about mutating locals, the locals() function is explicitly documented as not necessarily affecting local variables (not sure if frame objects have the same documentation).

If you want a free optimization opportunity: BINARY_SUBSCR with a list for the first argument who's contents are all constant, this can be optimized to turn it into a tuple which can be load const'd (as we do in the case of an in check), note that this cannot be used in the case of::

    l = [1, 2, 3]

As v.__index__ could theoretically mutate l.
msg131393 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-03-19 05:28
#11549 Rewrite peephole to work on AST
includes constant folding. I have not compared.
msg207130 - (view) Author: Paul Sokolovsky (pfalcon) * Date: 2014-01-01 04:51
8 years after the original patch, there's still no trivial constant folding in bytecode generated (because peephole of course is not a real optimizer to consistently catch all cases):

$ cat 
FOO = 1
BAR = FOO + 2 + 4

$ python --version
Python 2.7.3

$ python -OO -m dis
  1           0 LOAD_CONST               0 (1)
              3 STORE_NAME               0 (FOO)

  2           6 LOAD_NAME                0 (FOO)
              9 LOAD_CONST               1 (2)
             12 BINARY_ADD          
             13 LOAD_CONST               2 (4)
             16 BINARY_ADD          
             17 STORE_NAME               1 (BAR)
             20 LOAD_CONST               3 (None)
             23 RETURN_VALUE        

$ python3.3 --version
Python 3.3.3

$ python3.3 -OO -m dis
  1           0 LOAD_CONST               0 (1) 
              3 STORE_NAME               0 (FOO) 

  2           6 LOAD_NAME                0 (FOO) 
              9 LOAD_CONST               1 (2) 
             12 BINARY_ADD           
             13 LOAD_CONST               2 (4) 
             16 BINARY_ADD           
             17 STORE_NAME               1 (BAR) 
             20 LOAD_CONST               3 (None) 
             23 RETURN_VALUE
msg207202 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-01-03 02:20
My astoptimizer project has an experimental support of constant folding. It works in the same file, or constants can be propagated to other files using config.add_constant('NAME', value).
msg286389 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-01-27 22:27
I applaud any effort to move constant folding out of the peepholer and into AST.
msg308295 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-12-14 13:12
Have been implemented in issue29469.
Date User Action Args
2017-12-14 13:12:04serhiy.storchakasetstatus: open -> closed

superseder: AST-level Constant folding

nosy: + serhiy.storchaka
messages: + msg308295
resolution: duplicate
stage: resolved
2017-01-27 22:27:01rhettingersetmessages: + msg286389
2017-01-27 13:06:04inada.naokisetnosy: + inada.naoki
2016-02-07 18:10:55serhiy.storchakalinkissue26297 superseder
2014-01-03 02:20:13vstinnersetnosy: + vstinner
messages: + msg207202
2014-01-01 04:51:49pfalconsetnosy: + pfalcon
messages: + msg207130
2011-03-19 05:28:54terry.reedysetnosy: + terry.reedy

messages: + msg131393
versions: + Python 3.3, - Python 3.2
2010-11-06 01:02:52alexsetnosy: + alex
messages: + msg120566
2010-11-06 00:10:11dmalcolmsetmessages: + msg120563
2010-11-05 23:59:52dmalcolmsetfiles: + py3k-ast-optimization-2010-11-05-001.patch

messages: + msg120562
2010-11-05 21:55:43rhettingersetmessages: + msg120541
2010-11-05 19:00:21pitrousetnosy: + benjamin.peterson

versions: + Python 3.2, - Python 2.6
2010-11-05 17:34:02dmalcolmsetmessages: + msg120516
2010-10-30 16:19:01georg.brandlsetnosy: - georg.brandl, georg.brandl
2010-10-30 15:12:15dmalcolmsetnosy: + dmalcolm
2009-03-31 16:18:05jhyltonsetpriority: high -> normal
2008-05-08 13:54:15jhyltonsetnosy: + jhylton
2008-05-08 02:01:39thomasleesetnosy: + thomaslee
messages: + msg66388
2008-03-28 18:55:56belopolskysetnosy: + belopolsky
type: performance
messages: + msg64646
2005-11-02 18:49:15titanstarcreate