Message339586
List comprehensions currently create a series of opcodes inside a code object, the first of which is BUILD_LIST with an oparg of 0, effectively creating a zero-length list with a preallocated size of 0.
If you're doing a simple list comprehension on an iterator, e.g.
def foo():
a = iterable
return [x for x in a]
Disassembly of <code object <listcomp> at 0x109db2c40, file "<stdin>", line 3>:
3 0 BUILD_LIST 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (x)
8 LOAD_FAST 1 (x)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE
The list comprehension will do a list_resize on the 4, 8, 16, 25, 35, 46, 58, 72, 88th iterations, etc.
This PR preallocates the list created in a list comprehension to the length of the iterator using PyObject_LengthHint().
It uses a new BUILD_LIST_PREALLOC opcode which builds a list with the allocated size of PyObject_LengthHint(co_varnames[oparg]).
[x for x in iterable] compiles to:
Disassembly of <code object <listcomp> at 0x109db2c40, file "<stdin>", line 3>:
3 0 BUILD_LIST_PREALLOC 0
2 LOAD_FAST 0 (.0)
>> 4 FOR_ITER 8 (to 14)
6 STORE_FAST 1 (x)
8 LOAD_FAST 1 (x)
10 LIST_APPEND 2
12 JUMP_ABSOLUTE 4
>> 14 RETURN_VALUE
If the comprehension has ifs, then it will use the existing BUILD_LIST opcode
Testing using a range length of 10000
./python.exe -m timeit "x=list(range(10000)); [y for y in x]"
Gives 392us on the current 3.8 branch
and 372us with this change (about 8-10% faster)
the longer the iterable, the bigger the impact.
This change also catches the issue that a very large iterator, like a range object :
[a for a in range(2**256)]
Would cause the 3.8< interpreter to consume all memory and crash because there is no check against PY_SSIZE_MAX currently.
With this change (assuming there is no if inside the comprehension) is now caught and thrown as an OverflowError:
>>> [a for a in range(2**256)]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
OverflowError: Python int too large to convert to C ssize_t |
|
Date |
User |
Action |
Args |
2019-04-08 01:28:22 | anthony shaw | set | recipients:
+ anthony shaw, ncoghlan |
2019-04-08 01:28:21 | anthony shaw | set | messageid: <1554686901.99.0.0374079193358.issue36551@roundup.psfhosted.org> |
2019-04-08 01:28:21 | anthony shaw | link | issue36551 messages |
2019-04-08 01:28:21 | anthony shaw | create | |
|