Issue 32945: sorted(generator) is slower than sorted(list-comprehension)

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/77126

classification

Title:	sorted(generator) is slower than sorted(list-comprehension)
Type:	performance	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.8

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:		Nosy List:	Antony.Lee, rhettinger, scoder, steven.daprano
Priority:	normal	Keywords:

Created on 2018-02-25 08:22 by Antony.Lee, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (9)
msg312780 - (view)	Author: Antony Lee (Antony.Lee) *	Date: 2018-02-25 08:22
Consider e.g. In [2]: %timeit sorted([i for i in range(100)]) 4.74 µs ± 24.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In [3]: %timeit sorted(i for i in range(100)) 7.05 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In [4]: %timeit sorted([i for i in range(1000)]) 47.2 µs ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [5]: %timeit sorted(i for i in range(1000)) 78.7 µs ± 288 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) In [6]: %timeit sorted([i for i in range(10000)]) 582 µs ± 8.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [7]: %timeit sorted(i for i in range(10000)) 807 µs ± 5.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) It appears that sorting a generator is slower than sorting the corresponding list comprehension, by a ~constant factor. Given that the former can trivially be converted into the latter (i.e. `sorted` could just check whether its argument is a generator, and, if so, convert it to a list first), it would seem that sorting the generator should not be slower than sorting a list (except perhaps by a small constant).
msg312783 - (view)	Author: Stefan Behnel (scoder) *	Date: 2018-02-25 08:47
sorted() does convert its input to a list first, and only then sorts it. It calls PySequence_List() for that, which in turn uses list_extend(), which then applies the obvious optimisation of copying input lists (and tuples) directly. What you are seeing here is probably mostly this difference: $ python3.7 -m timeit 'list(i for i in range(100))' 50000 loops, best of 5: 5.95 usec per loop $ python3.7 -m timeit '[i for i in range(100)]' 100000 loops, best of 5: 3.26 usec per loop
msg312791 - (view)	Author: Antony Lee (Antony.Lee) *	Date: 2018-02-25 09:30
Feel free to close the issue if that's not the forum for this discussion, but I'm still baffled by what's happening. In the example that you give, the first case needs to look up the `list` global and that callable (which happens to be the `list` type) needs to figure out what to do with generator passed in. Indeed, we can compare In [22]: dis.dis(compile("[i for i in [1, 2]]", "<string>", "single")) 1 0 LOAD_CONST 0 (<code object <listcomp> at 0x7f2d3237ec00, file "<string>", line 1>) 2 LOAD_CONST 1 ('<listcomp>') 4 MAKE_FUNCTION 0 6 LOAD_CONST 5 ((1, 2)) 8 GET_ITER 10 CALL_FUNCTION 1 12 PRINT_EXPR 14 LOAD_CONST 4 (None) 16 RETURN_VALUE In [23]: dis.dis(compile("list(i for i in [1, 2])", "<string>", "single")) 1 0 LOAD_NAME 0 (list) 2 LOAD_CONST 0 (<code object <genexpr> at 0x7f2d32392150, file "<string>", line 1>) 4 LOAD_CONST 1 ('<genexpr>') 6 MAKE_FUNCTION 0 8 LOAD_CONST 5 ((1, 2)) 10 GET_ITER 12 CALL_FUNCTION 1 14 CALL_FUNCTION 1 16 PRINT_EXPR 18 LOAD_CONST 4 (None) 20 RETURN_VALUE Note how the latter has an extra function call (to `list`). In the example I gave, however: In [24]: dis.dis(compile("sorted([i for i in [1, 2]])", "<string>", "single")) 1 0 LOAD_NAME 0 (sorted) 2 LOAD_CONST 0 (<code object <listcomp> at 0x7f2d3231eb70, file "<string>", line 1>) 4 LOAD_CONST 1 ('<listcomp>') 6 MAKE_FUNCTION 0 8 LOAD_CONST 5 ((1, 2)) 10 GET_ITER 12 CALL_FUNCTION 1 14 CALL_FUNCTION 1 16 PRINT_EXPR 18 LOAD_CONST 4 (None) 20 RETURN_VALUE In [25]: dis.dis(compile("sorted(i for i in [1, 2])", "<string>", "single")) 1 0 LOAD_NAME 0 (sorted) 2 LOAD_CONST 0 (<code object <genexpr> at 0x7f2d32328930, file "<string>", line 1>) 4 LOAD_CONST 1 ('<genexpr>') 6 MAKE_FUNCTION 0 8 LOAD_CONST 5 ((1, 2)) 10 GET_ITER 12 CALL_FUNCTION 1 14 CALL_FUNCTION 1 16 PRINT_EXPR 18 LOAD_CONST 4 (None) 20 RETURN_VALUE so both cases are much more similar -- superficially, at least.
msg312793 - (view)	Author: Stefan Behnel (scoder) *	Date: 2018-02-25 09:54
The constant function call overhead doesn't make a big difference: $ /opt/python3.7-opt/bin/python3 -m timeit 'list(i for i in range(1000))' 5000 loops, best of 5: 55 usec per loop $ /opt/python3.7-opt/bin/python3 -m timeit '[i for i in range(1000)]' 10000 loops, best of 5: 30.7 usec per loop The difference is that comprehensions are generally more efficient than generators, simply because they are more specialised. When a generator is created, it does not know whether it will be passed into list() to quickly unpack it into a list, or into some complex machinery that just requests one value per year, or only one value at all and then throws it away. I searched a bit, but couldn't find a ticket about the performance difference above, although I'm sure there must be one. So I'll leave this open for now, assuming that there might still be something to improve here.
msg312795 - (view)	Author: Steven D'Aprano (steven.daprano) *	Date: 2018-02-25 10:09
I think the difficulty here is that your perspective is backwards. It isn't that sorting a generator is slower, it is that sorting a list is faster, because there is more information available with a list and so the interpreter can take a short-cut. With a generator or iterator, the interpreter doesn't know how many items will be in the finished collection, and so it has to build the list in stages as needed, growing it when it runs out of room, and possibly shrinking it if it grows too big. This takes time. But with a list or other sequence with a known length, the interpreter can allocate the right number of items up front, and avoid growing or shrinking the new list. I believe that this is the time saving you are seeing. So I don't think this is a bug, and I don't think there's any room to optimize the generator comprehension case. Unless somebody who knows more about the interpreter internals than I do speaks up to say there is a way to optimize this case, I think there's nothing that can be done.
msg312798 - (view)	Author: Antony Lee (Antony.Lee) *	Date: 2018-02-25 10:30
> But with a list or other sequence with a known length, the interpreter can allocate the right number of items up front, and avoid growing or shrinking the new list. I believe that this is the time saving you are seeing. But certainly when the list comprehension is executed the interpreter also needs to pay that cost? (It cannot know a priori that `[x for x in range(100)]` will have 100 elements, as `range` may have been shadowed at that point.) Yes, the price is not paid when running `sorted` itself, but it should be paid when creating the list from the comprehension?
msg312801 - (view)	Author: Stefan Behnel (scoder) *	Date: 2018-02-25 11:18
> as `range` may have been shadowed at that point No, range() has in fact already been executed at that point. And it returned an iterable that knows its length (search for "LengthHint" in the CPython sources).
msg312826 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-02-25 17:30
[Antony Lee] > so both cases are much more similar -- superficially, at least. Try disassembling the inner code object as well -- that is where the work gets done and the list comprehension can take advantage of the LIST_APPEND opcode (rather than passing data to list_extend through an iterator which has more overhead). [Stefan Behnel] > The difference is that comprehensions are generally more efficient than generators, simply because they are more specialised. Yes, that is most succinct description of why would expect a difference. [Steven D'Aprano] > So I don't think this is a bug, and I don't think there's any room to optimize the generator comprehension case. I concur. [Antony Lee] > Feel free to close the issue if that's not the forum for this discussion, but I'm still baffled by what's happening. Yes, this discussion is more suited to a StackOverflow entry where people commonly ask about why Python behaves as it does. The bug tracker is more suited to known regressions or provable optimizations. (One forum is for "I'm baffled" and the other is for "I have an improvement"). Marking this a closed. If some demonstrable optimization is found, feel free to reopen.
msg312863 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2018-02-25 21:28
FYI, here is the disassembly of the inner code objects. It shows where the difference in speed arises: >>> from dis import dis >>> def f(): lc = [i for i in [1, 2]] ge = (i for i in [1, 2]) return lc, ge >>> for obj in f.__code__.co_consts: if type(obj) == type(f.__code__): print(obj) dis(obj) <code object <listcomp> at 0x113d881e0, file "<pyshell#26>", line 2> 2 0 BUILD_LIST 0 2 LOAD_FAST 0 (.0) >> 4 FOR_ITER 8 (to 14) 6 STORE_FAST 1 (i) 8 LOAD_FAST 1 (i) 10 LIST_APPEND 2 <-- Append directly 12 JUMP_ABSOLUTE 4 >> 14 RETURN_VALUE <code object <genexpr> at 0x1035afe40, file "<pyshell#26>", line 3> 3 0 LOAD_FAST 0 (.0) >> 2 FOR_ITER 10 (to 14) 4 STORE_FAST 1 (i) 6 LOAD_FAST 1 (i) 8 YIELD_VALUE <-- Pass data through iterator 10 POP_TOP <-- Dispose of None from send() 12 JUMP_ABSOLUTE 2 >> 14 LOAD_CONST 0 (None) 16 RETURN_VALUE The list comprehension builds the list directly with the high-speed specialized, LIST_APPEND opcode. The generator runs YIELD_VALUE and POP_TOP but the work isn't done. There needs to be a context switch to list_extend() which then extract the value from the iterator and finally appends it to the list. Executive summary: there is overhead when passing data through the iterator protocol.

History
Date	User	Action	Args
2022-04-11 14:58:58	admin	set	github: 77126
2018-02-25 21:28:09	rhettinger	set	messages: + msg312863
2018-02-25 17:30:35	rhettinger	set	status: open -> closed nosy: + rhettinger messages: + msg312826 resolution: not a bug stage: resolved
2018-02-25 11:18:32	scoder	set	messages: + msg312801
2018-02-25 10:30:27	Antony.Lee	set	messages: + msg312798
2018-02-25 10:09:25	steven.daprano	set	nosy: + steven.daprano messages: + msg312795
2018-02-25 09:54:32	scoder	set	messages: + msg312793
2018-02-25 09:30:03	Antony.Lee	set	messages: + msg312791
2018-02-25 08:47:15	scoder	set	versions: + Python 3.8, - Python 3.6 nosy: + scoder messages: + msg312783 components: + Interpreter Core type: performance
2018-02-25 08:22:50	Antony.Lee	create