This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: sorted(generator) is slower than sorted(list-comprehension)
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: Antony.Lee, rhettinger, scoder, steven.daprano
Priority: normal Keywords:

Created on 2018-02-25 08:22 by Antony.Lee, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (9)
msg312780 - (view) Author: Antony Lee (Antony.Lee) * Date: 2018-02-25 08:22
Consider e.g.

    In [2]: %timeit sorted([i for i in range(100)])
    4.74 µs ± 24.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    In [3]: %timeit sorted(i for i in range(100))
    7.05 µs ± 25.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

    In [4]: %timeit sorted([i for i in range(1000)])
    47.2 µs ± 1.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [5]: %timeit sorted(i for i in range(1000))
    78.7 µs ± 288 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    In [6]: %timeit sorted([i for i in range(10000)])
    582 µs ± 8.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [7]: %timeit sorted(i for i in range(10000))
    807 µs ± 5.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

It appears that sorting a generator is slower than sorting the corresponding list comprehension, by a ~constant factor.  Given that the former can trivially be converted into the latter (i.e. `sorted` could just check whether its argument is a generator, and, if so, convert it to a list first), it would seem that sorting the generator should *not* be slower than sorting a list (except perhaps by a small constant).
msg312783 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-02-25 08:47
sorted() *does* convert its input to a list first, and only then sorts it. It calls PySequence_List() for that, which in turn uses list_extend(), which then applies the obvious optimisation of copying input lists (and tuples) directly.

What you are seeing here is probably mostly this difference:

$ python3.7 -m timeit 'list(i for i in range(100))'
50000 loops, best of 5: 5.95 usec per loop
$ python3.7 -m timeit '[i for i in range(100)]'
100000 loops, best of 5: 3.26 usec per loop
msg312791 - (view) Author: Antony Lee (Antony.Lee) * Date: 2018-02-25 09:30
Feel free to close the issue if that's not the forum for this discussion, but I'm still baffled by what's happening.

In the example that you give, the first case needs to look up the `list` global and that callable (which happens to be the `list` type) needs to figure out what to do with generator passed in.  Indeed, we can compare

    In [22]: dis.dis(compile("[i for i in [1, 2]]", "<string>", "single"))
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f2d3237ec00, file "<string>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               5 ((1, 2))
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 PRINT_EXPR
             14 LOAD_CONST               4 (None)
             16 RETURN_VALUE

In [23]: dis.dis(compile("list(i for i in [1, 2])", "<string>", "single"))
  1           0 LOAD_NAME                0 (list)
              2 LOAD_CONST               0 (<code object <genexpr> at 0x7f2d32392150, file "<string>", line 1>)
              4 LOAD_CONST               1 ('<genexpr>')
              6 MAKE_FUNCTION            0
              8 LOAD_CONST               5 ((1, 2))
             10 GET_ITER
             12 CALL_FUNCTION            1
             14 CALL_FUNCTION            1
             16 PRINT_EXPR
             18 LOAD_CONST               4 (None)
             20 RETURN_VALUE

Note how the latter has an extra function call (to `list`).

In the example I gave, however:

    In [24]: dis.dis(compile("sorted([i for i in [1, 2]])", "<string>", "single"))
    1           0 LOAD_NAME                0 (sorted)
                2 LOAD_CONST               0 (<code object <listcomp> at 0x7f2d3231eb70, file "<string>", line 1>)
                4 LOAD_CONST               1 ('<listcomp>')
                6 MAKE_FUNCTION            0
                8 LOAD_CONST               5 ((1, 2))
                10 GET_ITER
                12 CALL_FUNCTION            1
                14 CALL_FUNCTION            1
                16 PRINT_EXPR
                18 LOAD_CONST               4 (None)
                20 RETURN_VALUE

    In [25]: dis.dis(compile("sorted(i for i in [1, 2])", "<string>", "single"))
    1           0 LOAD_NAME                0 (sorted)
                2 LOAD_CONST               0 (<code object <genexpr> at 0x7f2d32328930, file "<string>", line 1>)
                4 LOAD_CONST               1 ('<genexpr>')
                6 MAKE_FUNCTION            0
                8 LOAD_CONST               5 ((1, 2))
                10 GET_ITER
                12 CALL_FUNCTION            1
                14 CALL_FUNCTION            1
                16 PRINT_EXPR
                18 LOAD_CONST               4 (None)
                20 RETURN_VALUE

so both cases are much more similar -- superficially, at least.
msg312793 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-02-25 09:54
The constant function call overhead doesn't make a big difference:

$ /opt/python3.7-opt/bin/python3 -m timeit 'list(i for i in range(1000))'
5000 loops, best of 5: 55 usec per loop
$ /opt/python3.7-opt/bin/python3 -m timeit '[i for i in range(1000)]'
10000 loops, best of 5: 30.7 usec per loop

The difference is that comprehensions are generally more efficient than generators, simply because they are more specialised. When a generator is created, it does not know whether it will be passed into list() to quickly unpack it into a list, or into some complex machinery that just requests one value per year, or only one value at all and then throws it away.

I searched a bit, but couldn't find a ticket about the performance difference above, although I'm sure there must be one. So I'll leave this open for now, assuming that there might still be something to improve here.
msg312795 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-02-25 10:09
I think the difficulty here is that your perspective is backwards. It isn't that sorting a generator is *slower*, it is that sorting a list is *faster*, because there is more information available with a list and so the interpreter can take a short-cut.

With a generator or iterator, the interpreter doesn't know how many items will be in the finished collection, and so it has to build the list in stages as needed, growing it when it runs out of room, and possibly shrinking it if it grows too big. This takes time.

But with a list or other sequence with a known length, the interpreter can allocate the right number of items up front, and avoid growing or shrinking the new list. I believe that this is the time saving you are seeing.

So I don't think this is a bug, and I don't think there's any room to optimize the generator comprehension case. Unless somebody who knows more about the interpreter internals than I do speaks up to say there is a way to optimize this case, I think there's nothing that can be done.
msg312798 - (view) Author: Antony Lee (Antony.Lee) * Date: 2018-02-25 10:30
> But with a list or other sequence with a known length, the interpreter can allocate the right number of items up front, and avoid growing or shrinking the new list. I believe that this is the time saving you are seeing.

But certainly when the list comprehension is executed the interpreter also needs to pay that cost?  (It cannot know a priori that `[x for x in range(100)]` will have 100 elements, as `range` may have been shadowed at that point.)

Yes, the price is not paid when running `sorted` itself, but it should be paid when creating the list from the comprehension?
msg312801 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2018-02-25 11:18
> as `range` may have been shadowed at that point

No, range() has in fact already been executed at that point. And it returned an iterable that knows its length (search for "LengthHint" in the CPython sources).
msg312826 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-02-25 17:30
[Antony Lee]
> so both cases are much more similar -- superficially, at least.

Try disassembling the inner code object as well -- that is where the work gets done and the list comprehension can take advantage of the LIST_APPEND opcode (rather than passing data to list_extend through an iterator which has more overhead).

[Stefan Behnel]
> The difference is that comprehensions are generally more efficient than generators, simply because they are more specialised.

Yes, that is most succinct description of why would expect a difference.

[Steven D'Aprano]
> So I don't think this is a bug, and I don't think there's any room to optimize the generator comprehension case.

I concur.

[Antony Lee]
> Feel free to close the issue if that's not the forum for this discussion, but I'm still baffled by what's happening.

Yes, this discussion is more suited to a StackOverflow entry where people commonly ask about why Python behaves as it does.  The bug tracker is more suited to known regressions or provable optimizations.  (One forum is for "I'm baffled" and the other is for "I have an improvement").

Marking this a closed.  If some demonstrable optimization is found, feel free to reopen.
msg312863 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-02-25 21:28
FYI, here is the disassembly of the inner code objects.  It shows where the difference in speed arises:

>>> from dis import dis
>>> def f():
	lc = [i for i in [1, 2]]
	ge = (i for i in [1, 2])
	return lc, ge

>>> for obj in f.__code__.co_consts:
	if type(obj) == type(f.__code__):
		print(obj)
		dis(obj)

		
<code object <listcomp> at 0x113d881e0, file "<pyshell#26>", line 2>
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LIST_APPEND              2           <-- Append directly
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
<code object <genexpr> at 0x1035afe40, file "<pyshell#26>", line 3>
  3           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                10 (to 14)
              4 STORE_FAST               1 (i)
              6 LOAD_FAST                1 (i)
              8 YIELD_VALUE                          <-- Pass data through iterator
             10 POP_TOP                              <-- Dispose of None from send()
             12 JUMP_ABSOLUTE            2
        >>   14 LOAD_CONST               0 (None)
             16 RETURN_VALUE

The list comprehension builds the list directly with the high-speed specialized, LIST_APPEND opcode.

The generator runs YIELD_VALUE and POP_TOP but the work isn't done.  There needs to be a context switch to list_extend() which then extract the value from the iterator and finally appends it to the list.

Executive summary: there is overhead when passing data through the iterator protocol.
History
Date User Action Args
2022-04-11 14:58:58adminsetgithub: 77126
2018-02-25 21:28:09rhettingersetmessages: + msg312863
2018-02-25 17:30:35rhettingersetstatus: open -> closed

nosy: + rhettinger
messages: + msg312826

resolution: not a bug
stage: resolved
2018-02-25 11:18:32scodersetmessages: + msg312801
2018-02-25 10:30:27Antony.Leesetmessages: + msg312798
2018-02-25 10:09:25steven.dapranosetnosy: + steven.daprano
messages: + msg312795
2018-02-25 09:54:32scodersetmessages: + msg312793
2018-02-25 09:30:03Antony.Leesetmessages: + msg312791
2018-02-25 08:47:15scodersetversions: + Python 3.8, - Python 3.6
nosy: + scoder

messages: + msg312783

components: + Interpreter Core
type: performance
2018-02-25 08:22:50Antony.Leecreate