classification
Title: Optimize list comprehensions
Type: Stage:
Components: Interpreter Core Versions: Python 2.4
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: arigo, nnorwitz, rhettinger
Priority: normal Keywords: patch

Created on 2004-03-06 13:22 by rhettinger, last changed 2004-03-08 22:48 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
listcomp.diff rhettinger, 2004-03-06 13:22 Py 2.4 patch
Messages (9)
msg45467 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-06 13:22
Save about 35% on the per pass overhead of list
comprehensions.

Adds a new opcode, LIST_APPEND, which is faster than
the current call to CALL_FUNCTION 1 on the bound
method, list.append(), and the subsequent call to
POP_TOP to clear the returned None value.

The resulting disassembled code is suprisingly light
and concise.
msg45468 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-07 07:41
Logged In: YES 
user_id=80475

Applied to:

Python/ceval.c 2.379
Python/compile.c 2.299
Include/opcode.h 2.44
Lib/opcode.py 1.5
msg45469 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-03-08 10:58
Logged In: YES 
user_id=4771

The patch also removes the final DELETE_FAST. Why? I think deleting this old reference to the list is a good idea.

Another faster (but slightly less clear) approach would be to change the behavior of LIST_APPEND so that the '_' variable can be completely avoided. In stack manipulation notation:

LIST_APPEND [list, ignored, value]  ->  [list, ignored]

where 'ignored' is the iterator still on the stack.  Admittedly this is quite an unexpected behavior, so maybe LIST_APPEND should be called LIST_COMPREHEND or somesuch.

An intermediate solution with no change in LIST_APPEND would introduce a DUP_OVER stack manipulation opcode:

DUP_OVER  [a, b]  ->  [a, b, a]

which could replace the LOAD_FAST '_' at the beginning of the loop, getting the list object from over the iterator.
msg45470 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-08 17:31
Logged In: YES 
user_id=80475

The actual checkin kept the final DELETE_FAST to allow all
references to the list to be removable.

Will take a look at the proposed variants.  Off-hand, they
do not have a clean feel to them, but they do save a trip
around the eval loop.
msg45471 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2004-03-08 17:36
Logged In: YES 
user_id=33168

You may have problems with inner list comps:  
[x+y for x in list1 for y in list2]

If you only have a single list comp, and LIST_APPEND didn't
pop the list off the stack, I think you could do:
          0 BUILD_LIST               0
          3 LOAD_FAST                0 (iter)
          6 GET_ITER
    >>    7 FOR_ITER                 7 (to 17)
         10 STORE_FAST               2 (elem)
         13 LIST_APPEND
         14 JUMP_ABSOLUTE            7
    >>   17 RETURN_VALUE
msg45472 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2004-03-08 17:38
Logged In: YES 
user_id=33168

My last comment about "problems" referred to removing the
DELETE_FAST by Armin.
msg45473 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-08 17:51
Logged In: YES 
user_id=80475

Armin, do you have a working patch (which handles nested
list comps) that implements:  LIST_APPEND [list, ignored,
value]  ->  [list, ignored].

If you can get it to work, it would simplify and speedup the
code a bit.
msg45474 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-03-08 20:35
Logged In: YES 
user_id=4771

No, Neal's comments made me realize I didn't think about nested loops. In these, you have several iterators lying around on the stack on top of the list. Definitely not clean.

I guess your patch is the cleanest so far. More subtle optimizations should be left to a bytecode optimizer. (btw I wonder why we don't consider these more.)
msg45475 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-03-08 22:48
Logged In: YES 
user_id=80475

I am currently working on building out the byte code
optimizer.  

Let me know if you would like to participate. 

The three infrastructure components needed for further
buildouts are:
* Check a range to see if it is a basic block
* Allow a tempory NOP code to be eliminated on a final pass
which makes jump target fixups.
* Fixup all of the line number references.

I made the first two but not the last.

For example,
  A basic block containing:
    [BUILD_TUPLE 2 UNPACK_SEQUENCE 2]
  or
    [BUILD_LIST 2 UNPACK_SEQUENCE 2]
  should be directly replaced with
    [ROT_TWO NOP NOP NOP NOP NOP]
  and be compressed on a final pass to
    [ROT_TWO]
  This gives a three fold speedup for: a,b=b,a 

The other candidate transformations are:

  [BUILD_TUPLE 3 UNPACK_SEQUENCE 3] --> [ROT_THREE ROT_TWO]

  [LOAD_CONST 2  BINARY_MULTIPLY] --> [DUP BINARY_ADD]

  [RETURN_VALUE anyLoadInstruction RETURN_vALUE] -->
[RETURN_VALUE]

  [STORE_FAST n LOAD_FAST n] --> [DUP STORE_FAST n]

  [UNARY_NOT  JUMP_IF_FALSE tgt  POP_TOP ... tgt: POP_TOP]
--> [JUMP_IF_TRUE tgt  POP_TOP ... tgt: POP_TOP]

  [UNARY_NOT  JUMP_IF_TRUE tgt  POP_TOP ... tgt: POP_TOP]
--> [JUMP_IF_FALSE tgt  POP_TOP ... tgt: POP_TOP]
History
Date User Action Args
2004-03-06 13:22:45rhettingercreate