Title: Speed up list comprehensions by preallocating the list where possible
Type: performance Stage: needs patch
Components: Interpreter Core Versions: Python 3.3
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Mark.Shannon, alex, brett.cannon, ezio.melotti, haypo, pjenvey, rhettinger, terry.reedy
Priority: low Keywords: patch

Created on 2012-02-25 23:29 by alex, last changed 2012-03-03 23:18 by haypo.

File name Uploaded Description Edit
preallocate.diff alex, 2012-02-25 23:29 review
Messages (6)
msg154288 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2012-02-25 23:29
This adds a new opcode which for certain list comprehensions (ones with no if statements and only a single comprehension), preallocates the list to the appropriate size.

Patch is against 2.7, because it was a bit easier.  On:

def f():
    for i in range(10000):
        [j for j in range(10000)]


Here's the speedup:

alex@alex-gaynor-laptop:/tmp$ # Fresh 2.7 branch
alex@alex-gaynor-laptop:/tmp$ time ~/projects/cpython/python 

real	0m6.418s
user	0m6.408s
sys	0m0.004s
alex@alex-gaynor-laptop:/tmp$ time ~/projects/cpython/python 

real	0m5.670s
user	0m5.648s
sys	0m0.008s
alex@alex-gaynor-laptop:/tmp$ time ~/projects/cpython/python 

real	0m5.688s
user	0m5.672s
sys	0m0.008s
alex@alex-gaynor-laptop:/tmp$ time ~/projects/cpython/python 

real	0m5.688s
user	0m5.676s
sys	0m0.004s
alex@alex-gaynor-laptop:/tmp$ time ~/projects/cpython/python 

real	0m5.690s
user	0m5.684s
sys	0m0.000s
alex@alex-gaynor-laptop:/tmp$ # With patch
alex@alex-gaynor-laptop:/tmp$ time ~/projects/cpython/python 

real	0m6.085s
user	0m6.064s
sys	0m0.008s
alex@alex-gaynor-laptop:/tmp$ time ~/projects/cpython/python 

real	0m5.728s
user	0m5.720s
sys	0m0.004s
alex@alex-gaynor-laptop:/tmp$ time ~/projects/cpython/python 

real	0m5.783s
user	0m5.772s
sys	0m0.004s
alex@alex-gaynor-laptop:/tmp$ time ~/projects/cpython/python 

real	0m4.730s
user	0m4.716s
sys	0m0.008s
alex@alex-gaynor-laptop:/tmp$ time ~/projects/cpython/python 

real	0m4.691s
user	0m4.680s
sys	0m0.004s
msg154799 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2012-03-02 21:18
I think this proposal should be rejected for three reasons.

1. I believe the idea is faulty in that it can only work if the single source iterable *has* a known length. The compiler cannot, in general, determine that and in practice seldom would be able to. For a comprehension within a function, the source typically is or depends on a passed arg. What happens if you replace 'range(10000)' with 'iter(range(10000))' in your patched version and rerun?

As I remember, Guido has rejected the idea of iterators having length information because in general it is not possible.

2. It adds an opcode for an extremely limited case. In 3.x, there are list, set, dict, and generator expression-comprehensions. Many 2.x uses of list comprehensions are (can be, increasingly will be) replaced by one of the others. In particular, lists long enough for there to be a real time saving tend to be replaced by generator expressions if possible.

3. The relative time saving in this limited case is at best 16% (.9/5.7) in a toy 2.7 example. I suspect it would be less in the more complex 3.x implementation and know it would be less in any realistic example with a real, slower target expression (at minimum try '2*j+1' or 'float(j)') and a slower source producer (such as a file object). And to repeat, a source with millions of items will likely be processed an item at a time without creating an explicit list.
msg154802 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-03-02 21:39
This seems like a reasonable optimization to me.
msg154838 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-03-03 13:45
You should try to port the patch to 3.3 and do some benchmark there.
Having some additional tests to make sure that it works fine in all the cases would be nice too (even if listcomps are already used elsewhere in the code/tests).
msg154842 - (view) Author: Philip Jenvey (pjenvey) * (Python committer) Date: 2012-03-03 17:47
iter(range(10000)) should also see a speedup because range's iter supports __length_hint__
msg154854 - (view) Author: Mark Shannon (Mark.Shannon) * Date: 2012-03-03 21:19
Could you run the benchamrks at
and report the results, for 3.3 (rather than 2.7), please?

Adding a new bytecode because it speeds up one 4 line program does not seem such a good idea.
Date User Action Args
2012-03-03 23:18:54hayposetnosy: + haypo
2012-03-03 21:19:48Mark.Shannonsetnosy: + Mark.Shannon
messages: + msg154854
2012-03-03 17:47:12pjenveysetnosy: + pjenvey
messages: + msg154842
2012-03-03 13:45:19ezio.melottisetversions: + Python 3.3, - Python 3.4
nosy: + ezio.melotti

messages: + msg154838

stage: needs patch
2012-03-02 21:39:03rhettingersetassignee: rhettinger ->
messages: + msg154802
2012-03-02 21:18:25terry.reedysetnosy: + terry.reedy
messages: + msg154799
2012-02-26 21:26:31brett.cannonsetnosy: + brett.cannon
2012-02-26 02:16:28rhettingersetpriority: normal -> low
assignee: rhettinger
type: performance
2012-02-25 23:34:47pitrousetnosy: + rhettinger
2012-02-25 23:29:26alexcreate