classification
Title: Optimize the `for y in [x]` idiom in comprehensions
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, brett.cannon, ncoghlan, scoder, serhiy.storchaka, steven.daprano, yselivanov
Priority: low Keywords: patch, patch

Created on 2018-02-16 08:41 by serhiy.storchaka, last changed 2019-03-07 13:39 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5695 closed serhiy.storchaka, 2018-02-16 08:47
PR 5695 closed serhiy.storchaka, 2018-02-16 08:47
Messages (6)
msg312228 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-16 08:41
There were a number of discussions about adding new syntax for temporary variables in comprehensions. The last was started yesterday on Python-Ideas (https://mail.python.org/pipermail/python-ideas/2018-February/048971.html). The problem is that all syntaxes proposed before are ugly. There are common solutions of this problem (calculating common subexpression only once): using internal comprehension or generator, or refactoring the inner expression as a local function where local variables can be used. For example [f(x) + g(f(x)) for x in range(10)] can be rewritten as

    f_samples = (f(x) for x in range(10))
    [y+g(y) for y in f_samples]

or

    def func(x):
        y = f(x)
        return y + g(y)
    [func(x) for x in range(10)]

Stephan Houben suggested other idea (https://mail.python.org/pipermail/python-ideas/2018-February/048971.html): perform an assignment by iterating a one-element list.

    [y + g(y) for x in range(10) for y in [f(x)]]

I never seen this idiom before, but seems it is well known for some other developers, and it looks less clumsy than other solutions with current syntax. Its advantage over hypothetical syntax ideas is that it is an existing syntax. Its disadvantage over hypothetical syntax ideas is that iterating a one-element list is slightly slower that a simple assignment.

The proposed PR makes `for y in [f(x)]` in comprehensions as fast as just an assignment `y = f(x)`. This will make this idiom more preferable for performance reasons. Other existing solutions, iterating an inner generator and calling a local function in a loop, have an overhead.
msg312229 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2018-02-16 10:23
+1

Please also support using a one-element tuple:

`for y in (f(x),)`
msg312230 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-16 10:43
It is.
msg312246 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-16 17:31
Here are some microbenchmarks. But since this code always is a part of complex expression it may be not make sense to talk about its pure speed.

$ ./python -m timeit -s 'a = list(range(1000))' -- '[y for x in a for y in [x]]'
Unpatched:  5000 loops, best of 5: 81.4 usec per loop
Patched:   10000 loops, best of 5: 19.8 usec per loop

For comparison the variant without temporary variable:

$ ./python -m timeit -s 'a = list(range(1000))' -- '[x for x in a]'
20000 loops, best of 5: 15.6 usec per loop

The overhead of using temporary variable is decreased from 66 usec to 4.2 usec (by 16 times).

In more realistic examples the subexpression assigned to temporary variable is slow. Otherwise it would be not worth to use a temporary variable. Therefore the relative speed up of the whole comprehension expression caused by this optimization is much smaller.
msg312247 - (view) Author: Yury Selivanov (yselivanov) * (Python committer) Date: 2018-02-16 17:37
I'm still not sure whether we should enable this optimization or not.

I haven't ever seen this pattern used in any Python code I worked with, so I suspect it's quite a rare hack.  Giving it a fast-path would give this pattern extra visibility and might encourage people to use it.

And the pattern itself... Well, it's quite ugly and barely readable.  IMO it's way better to simply rewrite such comprehension to an equivalent 'for' statement.

So I guess I'm -0 on this change.  I suggest to ask Guido on the mailing list if he agrees that this language patten is worth optimizing/promoting.
msg337393 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-03-07 13:30
Closed in favor of PEP 572.
History
Date User Action Args
2019-03-07 13:39:11serhiy.storchakasetkeywords: patch, patch
resolution: fixed -> rejected
2019-03-07 13:38:38serhiy.storchakasetkeywords: patch, patch
status: open -> closed
resolution: fixed
stage: patch review -> resolved
2019-03-07 13:30:30serhiy.storchakasetkeywords: patch, patch

messages: + msg337393
2018-02-25 17:40:42serhiy.storchakasetkeywords: patch, patch
priority: normal -> low
2018-02-19 20:52:18scodersetnosy: + scoder
2018-02-16 17:37:29yselivanovsetkeywords: patch, patch

messages: + msg312247
2018-02-16 17:31:57serhiy.storchakasetkeywords: patch, patch

messages: + msg312246
2018-02-16 10:43:57serhiy.storchakasetkeywords: patch, patch

messages: + msg312230
2018-02-16 10:23:20steven.dapranosetkeywords: patch, patch
nosy: + steven.daprano
messages: + msg312229

2018-02-16 08:47:48serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request5486
2018-02-16 08:47:48serhiy.storchakasetkeywords: + patch
stage: (no value)
pull_requests: + pull_request5487
2018-02-16 08:41:23serhiy.storchakacreate