classification
Title: Patch: optimize code to use LIST_APPEND instead of calling list.append
Type: enhancement Stage:
Components: Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: BreamoreBoy, nnorwitz, novalis_dt, rhettinger, thomaslee
Priority: normal Keywords: patch

Created on 2008-11-05 19:30 by novalis_dt, last changed 2014-07-28 21:05 by BreamoreBoy.

Files
File name Uploaded Description Edit
tlee-ast-optimize-appends.diff novalis_dt, 2008-11-05 19:30 review
tlee-ast-optimize-appends-2.diff novalis_dt, 2008-11-06 22:29 Revised
tlee-ast-optimize-appends-3.diff novalis_dt, 2008-11-06 23:27
Messages (9)
msg75525 - (view) Author: David Turner (novalis_dt) Date: 2008-11-05 19:30
This is a patch to Tom Lee's AST optimization branch.

Python bytecode includes at least one operation which is not directly
accessible from Python code: LIST_APPEND, which pops a value and a list
off the stack and appends the value to the list. This is used in
generating code for list comprehensions:

[x for x in somelist]

generates the following code for the body of the loop:

...
LOAD_FAST 1#a local is generated to hold the new list
LOAD_FAST 2 #the iteration variable, x
LIST_APPEND
...

Whereas if you were to try to write the comprehension directly, you
would get:

LOAD_FAST 1
LOAD_ATTR 0 #an index into the constant table: “append”
LOAD_FAST 2
CALL_FUNCTION 1 #remove x and the append function from the top of the
stack and push the result of the call
POP_TOP

This is much slower. In part, it’s the cost of doing the attribute
lookup each time, which is why it’s common to see code like a = [] ap =
a.append for x in …..: ap(x + …) return a But the function call is
naturally slower than the simpler LIST_APPEND operation. The attached
patch tries to determine statically if a local is all circumstances
holds a list, and if so, generates LIST_APPEND whenever user code would
call local.append.  It’s not perfect — in particular, I could track
local types on a more fine-grained level than per-function. But it’s a
start.
msg75547 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2008-11-06 07:45
Interesting approach.  I was surprised to see the change to the AST, but
I understand why you did it.  I think if the AST optimization approach
works out well, we will want to have some more general mechanism to
communicate these optimization (or other!) hints to the compiler.  It
would suck to have to modify the AST each time we wanted to add a new
optimization.  You and Tom might have better ideas for how best to
achieve that.

I'll make some comments that would need to be addressed, but you might
want to wait making any changes depending on what approach you decide to
take.

The most important missing piece is tests to show that the new code works.

There are various whitespace issues.  Both whitespace only changes that
should be avoided and indentation inconsistencies with the existing code
(or so it appears).  The latter could be the way I'm viewing the file or
existing problems with tabs.

In Python/optimize.c, you need to handle the case where the PyDict_New()
calls fail.  It looks like currently an undetected error can happen in
during construction.  And on destruction it will crash because the
objects will be NULL when calling Py_DECREF.

All calls like PyDict_SetItem(), PyInt_FromLong(), etc need to handle
errors.

I'll need to study the code a lot more to see how well it behaves. 
Tests would help a lot with that.
msg75584 - (view) Author: David Turner (novalis_dt) Date: 2008-11-06 22:29
Neal Norwitz <nnorwitz@gmail.com> added the comment:
> 
> Interesting approach.  I was surprised to see the change to the AST, 
> but I understand why you did it.  I think if the AST optimization 
> approach works out well, we will want to have some more general 
> mechanism to communicate these optimization (or other!) hints to the 
> compiler.  It would suck to have to modify the AST each time we 
> wanted to add a new optimization.  You and Tom might have better 
> ideas for how best to achieve that.

I really didn't want to have to change the AST.  The problem was that
there was a feature of the Python bytecode which was not representable
in Python source code.  I don't anticipate future optimizations
requiring changes to the AST, because I now believe every important
feature of the bytecode is representable in the AST.  The one exception
might be if we have a need for arbitrary jumps.  I don't think that
would be useful, but it might conceivably.  In that case, we would need
Goto and Label AST nodes.  

The thing that struck me as ugly was the idea that there's one object
(the optimizer) that knows everything about all optimization operations.
 This seems like a great case for the visitor pattern.  The optimizer
ought to have a list of functions to call for each type of AST node,
each with some private storage space.  But I didn't want to make that
kind of dramatic change without consulting with Tom.

> I'll make some comments that would need to be addressed, but you might
> want to wait making any changes depending on what approach you decide 
> to take.
> 
> The most important missing piece is tests to show that the new code 
> works.

> There are various whitespace issues.  Both whitespace only changes 
> that should be avoided and indentation inconsistencies with the 
> existing code (or so it appears).  The latter could be the way I'm 
> viewing the file or existing problems with tabs.

Fixed, I think.  The original code appears to be somewhat inconsistent
between files in its uses of spaces/tabs, but I think I have now matched
the style of each file.

> In Python/optimize.c, you need to handle the case where the 
> PyDict_New() calls fail.  It looks like currently an undetected error
> can happen during construction.  And on destruction it will crash 
> because the objects will be NULL when calling Py_DECREF.
 
Fixed.  

> All calls like PyDict_SetItem(), PyInt_FromLong(), etc need to handle
> errors.

I'm not exactly sure which "etc" need to handle errors.  Py*_As*? 
Anyway, I changed the ones you mentioned.

> I'll need to study the code a lot more to see how well it behaves. 
> Tests would help a lot with that.

I've added a few.
msg75585 - (view) Author: David Turner (novalis_dt) Date: 2008-11-06 23:27
Actually, I just noticed a case where the code would do the wrong thing.
 I fixed it and added a test for it.
msg75587 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-11-07 00:28
> I really didn't want to have to change the AST.  The problem was that
> there was a feature of the Python bytecode which was not representable
> in Python source code. 

FWIW, I see exposing bytecodes as an anti-pattern that locks in a
particular implementation in a way that makes it hard to change and hard
to port to other Python implementations.  The current bound method
approach works fine.  Don't really see enough payoff to justify the
extra code and maintenance.
msg75598 - (view) Author: Thomas Lee (thomaslee) (Python committer) Date: 2008-11-07 07:34
Neal said:
> I was surprised to see the change to the AST, but I understand why you
did it.

The problems David ran into here sound like an argument for arbitrary
AST annotations -- an idea that I was toying with around the time
"Const" was introduced. That way the optimizer could provide "hints"
rather than explicitly manipulating the AST in certain cases. The
compiler could then look for those hints and generate code accordingly.

For example, in place of the Const node, we might have a "const"
annotation that's applied to the top-level expression.

I think I went with the Const node because it was less work, but I don't
remember the details. I'll try to dig up the appropriate emails if I
haven't lost them.

David said:
> Fixed, I think.  The original code appears to be somewhat 
> inconsistent between files in its uses of spaces/tabs, ...

Yes, they are inconsistent with tabs/spaces. And it sucks. :)

> The thing that struck me as ugly was the idea that there's one
> object (the optimizer) that knows everything about all
> optimization operations.

That's right, but this is no different from the compiler. Conversely, a
visitor pattern would probably work for both the optimizer and the compiler.

Having said that, I couldn't justify making the AST optimizer a huge
departure from the rest of the code base for the sake of design purity.
I'm not even really sure that it's "design purity" when you do things
inconsistently from one component to the next.

Obviously if there's another sufficiently good argument for the visitor
approach, I'm all ears. But again, if we do that I think we should do it
for the compiler as well. I'm not sure how much support such a change
would have.
msg75605 - (view) Author: David Turner (novalis_dt) Date: 2008-11-07 16:25
> FWIW, I see exposing bytecodes as an anti-pattern that locks in a
> particular implementation in a way that makes it hard to change and 
> hard to port to other Python implementations.  The current bound 
> method approach works fine.  Don't really see enough payoff to 
> justify the extra code and maintenance.

Bytecodes aren't exposed at the language level -- just during
compilation.  As far as I know, other implementations don't use any of
the Python compiler mechanics, so there's no portability problem.  If
another compiler did use the Python AST, they could just compile the
Append node kind to a method call with no loss of speed or clarity.  Or
they could just not use this optimization, if we refactor the ast
optimizer as I propose to use the strategy pattern.

As for the payoff, microbenchmarks show a big win.  Also, appending to a
list is a fairly common pattern.  Our Plone-based codebase shows tens of
thousands of instances of append.
msg75607 - (view) Author: David Turner (novalis_dt) Date: 2008-11-07 17:25
> Obviously if there's another sufficiently good argument for the visitor
> approach, I'm all ears. But again, if we do that I think we should do 
> it for the compiler as well. I'm not sure how much support such a 
> change would have.

The argument is that it would be possible to enable or disable
individual optimizations this way.  For the compiler, there's no need
for this, because there's only one thing to do per node type (although I
suppose we could just pass that set of things into the node walker). 

Another argument against is that it would be harder to combine
optimizations when that's relevant.

I don't think it's worth worrying about until there are a dozen or so
AST-level optimizations.
msg224187 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-28 21:05
Is this ever likely to be implemented or is it more likely to be rejected owing to the reasons given in msg75587 ?
History
Date User Action Args
2014-07-28 21:05:31BreamoreBoysetnosy: + BreamoreBoy

messages: + msg224187
versions: + Python 3.5, - Python 3.3
2010-12-22 03:03:15r.david.murraysetnosy: nnorwitz, rhettinger, thomaslee, novalis_dt
type: enhancement
versions: + Python 3.3
2008-11-07 17:25:28novalis_dtsetmessages: + msg75607
2008-11-07 16:25:10novalis_dtsetmessages: + msg75605
2008-11-07 07:34:52thomasleesetnosy: + thomaslee
messages: + msg75598
2008-11-07 00:28:42rhettingersetnosy: + rhettinger
messages: + msg75587
2008-11-06 23:27:59novalis_dtsetfiles: + tlee-ast-optimize-appends-3.diff
messages: + msg75585
2008-11-06 22:29:42novalis_dtsetfiles: + tlee-ast-optimize-appends-2.diff
messages: + msg75584
2008-11-06 07:45:51nnorwitzsetnosy: + nnorwitz
messages: + msg75547
2008-11-05 19:30:33novalis_dtcreate