Title: Tuple comprehension
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.9
Status: closed Resolution: postponed
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, Batuhan Taskaya, Marco Sulla, steven.daprano, terry.reedy
Priority: normal Keywords:

Created on 2020-02-28 17:44 by Marco Sulla, last changed 2020-02-29 01:31 by terry.reedy. This issue is now closed.

Messages (6)
msg362888 - Author: Marco Sulla (Marco Sulla) * Date: 2020-02-28 17:44
I think a tuple comprehension could be very useful.

Currently, the only efficient way to create a tuple from a comprehension is to write a list comprehension (generator expressions are slower) and convert it with `tuple()`.

A tuple comprehension will do exactly the same thing, but without the creation of the intermediate list.
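For reference, a minimal sketch of the two spellings available today. The variable names are illustrative assumptions, not taken from the original report.

```python
# Source iterable; any iterable would do here.
data = range(10)

# Spelling 1: list comprehension, then convert.
# This builds an intermediate list that is discarded after tuple() copies it.
squares_via_list = tuple([x * x for x in data])

# Spelling 2: generator expression passed straight to tuple().
# No intermediate list, but items are produced one at a time by the generator.
squares_via_genexp = tuple(x * x for x in data)

# Both spellings produce the same tuple.
assert squares_via_list == squares_via_genexp
print(squares_via_list)
```

The proposed syntax would aim to give the result of the first spelling without the temporary list.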

IMHO a tuple comprehension can be very useful, because:

1. there are many cases in which you create a list with a comprehension, but you'll never change it later. You could simply convert it with `tuple()`, but it will require more time
2. tuples use less memory than lists
3. tuples can be interned
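
The memory claim in point 2 can be checked with a small sketch like the one below. It assumes CPython, where `sys.getsizeof` reports the size of the container itself and not of the elements it refers to; the exact byte counts depend on the interpreter version and platform.

```python
import sys

# Build the same 1000 items as a list and as a tuple.
as_list = [x for x in range(1000)]
as_tuple = tuple(as_list)

# getsizeof measures only the container, not the integers it points to.
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)

print(f"list:  {list_size} bytes")
print(f"tuple: {tuple_size} bytes")
print(f"saved: {list_size - tuple_size} bytes")
```

On CPython the difference comes from the smaller tuple header and from any spare capacity the list keeps for future appends.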

As syntax, I propose 

(* expr for x in iterable *)

with absolutely no blank character between the character ( and the *, and the same for ).

Well, I know, it's a bit of a strange syntax... but () is already taken by generator expressions. Furthermore, the * resembles a snowflake, and tuples are a sort of "frozenlists".
msg362899 - Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2020-02-28 18:37
This change needs to be discussed first on Python-ideas, and ideally needs a PEP (and a sponsor). So you should post it on Python-ideas mailing list or Ideas section.
msg362945 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-02-29 00:03
This was discussed on Python-Ideas:

and on Discuss:

In both cases the consensus was mostly negative: tuple comprehensions aren't very useful, and performance of current solutions is adequate.
msg362950 - Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2020-02-29 00:34
Regarding performance, on my computer, the overhead of calling tuple() on a list comp ranges from about 30% for tiny sequences down to about 5% for largish sequences.

Tiny sequences are fast either way:

[steve@ando cpython]$ ./python -m timeit "[i for i in (1,2,3)]"
50000 loops, best of 5: 5.26 usec per loop

[steve@ando cpython]$ ./python -m timeit "tuple([i for i in (1,2,3)])"
50000 loops, best of 5: 6.95 usec per loop

and for large sequences the time is dominated by the comprehension, not the call to tuple:

[steve@ando cpython]$ ./python -m timeit "[i for i in range(1000000)]"
1 loop, best of 5: 1.04 sec per loop

[steve@ando cpython]$ ./python -m timeit "tuple([i for i in range(1000000)])"
1 loop, best of 5: 1.1 sec per loop

(As the size of the list increases, the proportion of the time spent in calling tuple() approaches zero.)
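
The same comparison can be scripted with the `timeit` module instead of the command line. This is only a sketch: the helper `measure`, the sequence length and the loop counts are assumptions chosen to keep the run short, and the numbers will differ from machine to machine.

```python
import timeit


def measure(stmt, number=200, repeat=5):
    """Return the best per-loop time in seconds for stmt (hypothetical helper)."""
    times = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(times) / number


n = 1000  # sequence length; vary this to see the overhead shrink or grow

list_only = measure(f"[i for i in range({n})]")
list_then_tuple = measure(f"tuple([i for i in range({n})])")
genexp_tuple = measure(f"tuple(i for i in range({n}))")

# Relative cost of the extra tuple() call compared with the bare list comp.
overhead = (list_then_tuple - list_only) / list_only

print(f"list comp:          {list_only * 1e6:8.2f} usec")
print(f"tuple(list comp):   {list_then_tuple * 1e6:8.2f} usec")
print(f"tuple(genexp):      {genexp_tuple * 1e6:8.2f} usec")
print(f"tuple() overhead:   {overhead:+.1%}")
```

Taking the minimum of several repeats follows the "best of 5" convention that the command-line tool reports.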

So it is true that there is an opportunity to optimize the creation of a tuple. But we should all be aware of the dangers of premature optimization and of wasting our efforts on optimizing something that doesn't matter.

Marco, can you demonstrate an actual real piece of code, not a made-up contrived example, where the overhead of calling tuple is a bottleneck, or even a significant slow-down?

In real code, I would expect the processing inside the comprehension to be significant, which would decrease the proportional cost of calling tuple even more.

[steve@ando cpython]$ ./python -m timeit "[i**3 + 7*i**2 - 45*i + 11 
    for i in range(500) if (i%7 in (2, 3, 5))]"
100 loops, best of 5: 3.02 msec per loop

[steve@ando cpython]$ ./python -m timeit "tuple(
    [i**3 + 7*i**2 - 45*i + 11 
    for i in range(500) if (i%7 in (2, 3, 5))])"
100 loops, best of 5: 3.03 msec per loop

Remember too that timings of Python code on real computers are subject to a significant amount of variability and noise due to all the other processes running at the same time, from the OS down to other applications. In my tests, I found no fewer than five pairs of measurements where the call to tuple was faster than NOT calling tuple. This is of course impossible, but it demonstrates that the overhead of calling tuple is small enough that it is within the range of random variation in time.
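
One way to see that noise directly is to repeat a single measurement several times and look at the spread. The sketch below is an illustration under assumed loop counts, not the procedure used for the figures above.

```python
import statistics
import timeit

stmt = "tuple([i for i in range(1000)])"
number = 200   # loops per sample (assumed value, kept small for a quick run)
repeat = 7     # how many independent samples to take

# Per-loop time in seconds for each sample.
samples = [t / number for t in timeit.repeat(stmt, number=number, repeat=repeat)]

best = min(samples)
worst = max(samples)
median = statistics.median(samples)

# Spread of the samples relative to the best one.
spread = (worst - best) / best

print(f"best:   {best * 1e6:.2f} usec")
print(f"median: {median * 1e6:.2f} usec")
print(f"worst:  {worst * 1e6:.2f} usec")
print(f"spread: {spread:.1%} of the best time")
```

If the spread is of the same order as the difference being measured, a single pair of timings cannot show which spelling is faster.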

The bottom line is that while this would be a genuine micro-optimization, it is doubtful that it would make a significant difference to performance of programs apart from contrived benchmarks.

(On the other hand, C is so fast overall because everything in C is micro-optimized.)

If adding tuple comprehensions were free of any other cost, I would say "Sure, why not? It can't hurt." but they are not. Inventing new, ugly, fragile syntax is a tax on the programmer's performance and the ability of newcomers to learn the language. Readability matters.

So I am a very strong -1 vote on the proposed syntax, even if it would micro-optimize the creation of a tuple.

Marco, if you still want to argue for this, you will need to

(1) take it back to Python-Ideas, hopefully to get consensus on syntax or at least a couple of options to choose between;

(2) find a core developer willing to sponsor a PEP;

(3) write a PEP;

(4) and get the PEP accepted.

I'm closing this as Pending/Postponed. If you get a PEP accepted, you can re-open it.
msg362957 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-02-29 01:29
-1 also, not worth the cost, so I would not bother with python-ideas.
msg362958 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-02-29 01:31
'pending' is a worthless state because any subsequent post changes back to 'open'.
Date                 User             Action  Args
2020-02-29 01:31:12  terry.reedy      set     status: open -> closed; type: enhancement; messages: + msg362958; stage: resolved
2020-02-29 01:29:46  terry.reedy      set     status: pending -> open; nosy: + terry.reedy; messages: + msg362957
2020-02-29 00:34:37  steven.daprano   set     status: open -> pending; resolution: postponed; messages: + msg362950
2020-02-29 00:03:30  steven.daprano   set     nosy: + steven.daprano; messages: + msg362945
2020-02-28 18:37:48  BTaskaya         set     nosy: + BTaskaya; messages: + msg362899
2020-02-28 18:37:40  BTaskaya         set     messages: - msg362898
2020-02-28 18:37:00  Batuhan Taskaya  set     nosy: + Batuhan Taskaya; messages: + msg362898
2020-02-28 17:44:14  Marco Sulla      create