Optimize the `for y in [x]` idiom in comprehensions #77037

serhiy-storchaka · 2018-02-16T08:41:23Z

BPO	32856
Nosy	@rhettinger, @ncoghlan, @scoder, @benjaminp, @stevendaprano, @serhiy-storchaka, @MojoVampire
PRs	bpo-32856: Optimize the idiom for assignment in comprehensions. #5695 bpo-32856: Optimize the idiom for assignment in comprehensions. #5695 bpo-32856: Optimize the assignment idiom in comprehensions. #16814

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2020-02-12.10:22:26.022>
created_at = <Date 2018-02-16.08:41:23.170>
labels = ['interpreter-core', '3.9', 'performance']
title = 'Optimize the `for y in [x]` idiom in comprehensions'
updated_at = <Date 2020-02-12.10:22:26.021>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2020-02-12.10:22:26.021>
actor = 'serhiy.storchaka'
assignee = 'none'
closed = True
closed_date = <Date 2020-02-12.10:22:26.022>
closer = 'serhiy.storchaka'
components = ['Interpreter Core']
creation = <Date 2018-02-16.08:41:23.170>
creator = 'serhiy.storchaka'
dependencies = []
files = []
hgrepos = []
issue_num = 32856
keywords = ['patch', 'patch']
message_count = 13.0
messages = ['312228', '312229', '312230', '312246', '312247', '337393', '354753', '354764', '354996', '355237', '356758', '359346', '361874']
nosy_count = 7.0
nosy_names = ['rhettinger', 'ncoghlan', 'scoder', 'benjamin.peterson', 'steven.daprano', 'serhiy.storchaka', 'josh.r']
pr_nums = ['5695', '5695', '16814']
priority = 'low'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue32856'
versions = ['Python 3.9']

serhiy-storchaka · 2018-02-16T08:41:22Z

There were a number of discussions about adding new syntax for temporary variables in comprehensions. The last was started yesterday on Python-Ideas (https://mail.python.org/pipermail/python-ideas/2018-February/048971.html). The problem is that all syntaxes proposed before are ugly. There are common solutions of this problem (calculating common subexpression only once): using internal comprehension or generator, or refactoring the inner expression as a local function where local variables can be used. For example [f(x) + g(f(x)) for x in range(10)] can be rewritten as

    f_samples = (f(x) for x in range(10))
    [y+g(y) for y in f_samples]

or

    def func(x):
        y = f(x)
        return y + g(y)
    [func(x) for x in range(10)]

Stephan Houben suggested other idea (https://mail.python.org/pipermail/python-ideas/2018-February/048971.html): perform an assignment by iterating a one-element list.

[y + g(y) for x in range(10) for y in [f(x)]]

I never seen this idiom before, but seems it is well known for some other developers, and it looks less clumsy than other solutions with current syntax. Its advantage over hypothetical syntax ideas is that it is an existing syntax. Its disadvantage over hypothetical syntax ideas is that iterating a one-element list is slightly slower that a simple assignment.

The proposed PR makes for y in [f(x)] in comprehensions as fast as just an assignment y = f(x). This will make this idiom more preferable for performance reasons. Other existing solutions, iterating an inner generator and calling a local function in a loop, have an overhead.

stevendaprano · 2018-02-16T10:23:20Z

+1

Please also support using a one-element tuple:

for y in (f(x),)

serhiy-storchaka · 2018-02-16T10:43:57Z

It is.

serhiy-storchaka · 2018-02-16T17:31:58Z

Here are some microbenchmarks. But since this code always is a part of complex expression it may be not make sense to talk about its pure speed.

$ ./python -m timeit -s 'a = list(range(1000))' -- '[y for x in a for y in [x]]'
Unpatched:  5000 loops, best of 5: 81.4 usec per loop
Patched:   10000 loops, best of 5: 19.8 usec per loop

For comparison the variant without temporary variable:

$ ./python -m timeit -s 'a = list(range(1000))' -- '[x for x in a]'
20000 loops, best of 5: 15.6 usec per loop

The overhead of using temporary variable is decreased from 66 usec to 4.2 usec (by 16 times).

In more realistic examples the subexpression assigned to temporary variable is slow. Otherwise it would be not worth to use a temporary variable. Therefore the relative speed up of the whole comprehension expression caused by this optimization is much smaller.

1st1 · 2018-02-16T17:37:29Z

I'm still not sure whether we should enable this optimization or not.

I haven't ever seen this pattern used in any Python code I worked with, so I suspect it's quite a rare hack. Giving it a fast-path would give this pattern extra visibility and might encourage people to use it.

And the pattern itself... Well, it's quite ugly and barely readable. IMO it's way better to simply rewrite such comprehension to an equivalent 'for' statement.

So I guess I'm -0 on this change. I suggest to ask Guido on the mailing list if he agrees that this language patten is worth optimizing/promoting.

serhiy-storchaka · 2019-03-07T13:30:30Z

Closed in favor of PEP-572.

serhiy-storchaka · 2019-10-15T21:39:29Z

I just discovered that the assignment operator leaks variables out of comprehensions.

>>> [(j:=i*i)+1/j for i in range(1, 3)]
[2.0, 4.25]
>>> j
4
>>> g = ((j:=i*i*i)+1/j for i in range(1, 3))
>>> list(g)
[2.0, 8.125]
>>> j
8

So it does not supersedes this optimization.

rhettinger · 2019-10-16T00:53:33Z

I just discovered that the assignment operator leaks variables
out of comprehensions.
...
So it does not supersedes this optimization.

That's a real bummer. IIRC, it was discussion of this proposal that motivated the creation of the walrus operator.

ncoghlan · 2019-10-20T12:45:49Z

The benefit offered by the parent local scoping was that it made assignment expressions usable as a straightforward way to implement comprehension-based accumulators where you actually do want access to the final value after the comprehension completes (for example, pulling the example or counter-example out of a call to any() or all()).

The downside is that you need an explicit "del j" after the comprehension to ensure prompt cleanup in those cases where you *don't* need the temporary variable after the comprehension has finished running:

>>> [(j:=i*i)+1/j for i in range(1, 3)]; del j # Clean up temp

However, that's still going to be clearer to most readers than writing:

[j+1/j for i in range(1, 3) for j in [i*i]]

So even with the parent local scoping semantics, PEP-572 is still enough to make Yury's comment above still hold (i.e. the use case is too obscure to justify the extra code needed to optimise it)

MojoVampire · 2019-10-23T16:22:19Z

OOC, rather than optimizing a fairly ugly use case, might another approach be to make walrus less leaky? Even if observable leakage is considered desirable, it strikes me that use cases for walrus in genexprs and comprehensions likely break up into:

90%: Cases where variable is never used outside genexpr/comprehension (because functional programming constructs shouldn't have side-effects, gosh darn it!)
5%: Cases where variable is used outside genexpr/comprehension and expects leakage
5%: Cases where variable is used outside genexpr/comprehension, but never in a way that actually relies on the value set in the genexpr/comprehension (same name chosen by happenstance)

If the walrus behavior in genexpr/comprehensions were tweaked to say that it only leaks if:

It's running at global scope (unavoidable, since there's no way to tell if it's an intended part of the module's interface)

or

A global or nonlocal statement within the function made it clear the name was considered stateful (again, like running at global scope, there is no way to know for sure if the name will be used somewhere else)

or

At some point in the function, outside the genexpr/comprehension, the value of the walrus-assigned name was read.

Case #3 could be even more narrow if the Python AST optimizer was fancier, potentially something like "if the value was read *after* the genexpr/comprehension, but *before* any following *unconditional* writes to the same name" (so [leaked := x for x in it] wouldn't bother to leak "leaked" if the next line was "leaked = 1" even if "leaked" were read three lines later, or the only reads from leaked occurred before the genexpr/comprehension), but I don't think the optimizer is up to that; following simple rules similar to those the compiler already follows to identify local names should cover 90% of cases anyway.

Aside from the dict returned by locals, and the possibility of earlier finalizer invocation (which you couldn't rely on outside CPython anyway), there's not much difference in behavior between a leaking and non-leaking walrus when the value is never referred to again, and it seems like the 90% case for cases where unwanted leakage occurs would be covered by this. Sure, if my WAG on use case percentages is correct, 5% of use cases would continue to leak even though they didn't benefit from it, but it seems like optimizing the 90% case would do a lot more good than optimizing what's already a micro-optimization that 99% of Python programmers would never use (and shouldn't really be encouraged, since it would rely on CPython implementation details, and produce uglier code).

I was also inspired by this to look at replacing BUILD_LIST with BUILD_TUPLE when followed by GET_ITER (so "[y for x in it for y in [derived(x)]]" would at least get the performance benefit of looping over a one-element tuple rather than a one-element list), thinking it might reduce the overhead of [y for x in a for y in [x]] in your unpatched benchmark by making it equivalent to [y for x in a for y in (x,)] while reading more prettily, but it turns out you beat me to it with bpo-32925, so good show there! :-)

You should probably rerun your benchmarks though; with bpo-32925 committed (a month after you posted the benchmarks here), the performance discrepancy should be somewhat less (estimate based on local benchmarking says maybe 20% faster with BUILD_LIST being optimized to BUILD_TUPLE). Still much faster with the proposed optimization than without, but I suspect even optimized, few folks will think to write their comprehensions to take advantage of it, which is why I was suggesting tweaks to the more obvious walrus operator.

serhiy-storchaka · 2019-11-16T17:20:06Z

However, that's still going to be clearer to most readers than writing

It is subjective. To me, j+1/j looks clearer than (j:=i*i)+1/j. In addition, the for-as-assignment idiom is more powerful in context of comprehensions, it allows to set an initial value. In any case I want to have a choice.

OOC, rather than optimizing a fairly ugly use case, might another approach be to make walrus less leaky?

I think this ship is sailed. The semantic of the walrus operator is complex enough to make it even more complex by adding more special cases. Also, while the function-wide optimization of variables is possible, it much more complex problem than the proposed simple optimization.

You should probably rerun your benchmarks though

$ ./python -m timeit -s 'a = list(range(1000))' -- '[y for x in a for y in [x]]'
Unpatched:  5000 loops, best of 5: 66.8 usec per loop
Patched:   10000 loops, best of 5: 21.5 usec per loop

$ ./python -m timeit -s 'a = list(range(1000))' -- '[x for x in a]'
20000 loops, best of 5: 17.8 usec per loop

bpo-32925 reduce the difference, but it is still large (~12).

serhiy-storchaka · 2020-01-05T14:43:23Z

I want to merge PR 16814 if there are no objections.

serhiy-storchaka · 2020-02-12T10:19:07Z

New changeset 8c579b1 by Serhiy Storchaka in branch 'master':
bpo-32856: Optimize the assignment idiom in comprehensions. (GH-16814)
8c579b1

serhiy-storchaka added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Feb 16, 2018

serhiy-storchaka closed this as completed Mar 7, 2019

serhiy-storchaka added 3.9 only security fixes and removed 3.8 only security fixes labels Oct 15, 2019

serhiy-storchaka reopened this Oct 15, 2019

serhiy-storchaka closed this as completed Feb 12, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize the `for y in [x]` idiom in comprehensions #77037

Optimize the `for y in [x]` idiom in comprehensions #77037

serhiy-storchaka commented Feb 16, 2018

serhiy-storchaka commented Feb 16, 2018

stevendaprano commented Feb 16, 2018

serhiy-storchaka commented Feb 16, 2018

serhiy-storchaka commented Feb 16, 2018

1st1 commented Feb 16, 2018

serhiy-storchaka commented Mar 7, 2019

serhiy-storchaka commented Oct 15, 2019

rhettinger commented Oct 16, 2019

ncoghlan commented Oct 20, 2019

MojoVampire mannequin commented Oct 23, 2019

serhiy-storchaka commented Nov 16, 2019

serhiy-storchaka commented Jan 5, 2020

serhiy-storchaka commented Feb 12, 2020

Optimize the for y in [x] idiom in comprehensions #77037

Optimize the for y in [x] idiom in comprehensions #77037

Comments

serhiy-storchaka commented Feb 16, 2018

serhiy-storchaka commented Feb 16, 2018

stevendaprano commented Feb 16, 2018

serhiy-storchaka commented Feb 16, 2018

serhiy-storchaka commented Feb 16, 2018

1st1 commented Feb 16, 2018

serhiy-storchaka commented Mar 7, 2019

serhiy-storchaka commented Oct 15, 2019

rhettinger commented Oct 16, 2019

ncoghlan commented Oct 20, 2019

MojoVampire mannequin commented Oct 23, 2019

serhiy-storchaka commented Nov 16, 2019

serhiy-storchaka commented Jan 5, 2020

serhiy-storchaka commented Feb 12, 2020

Optimize the `for y in [x]` idiom in comprehensions #77037

Optimize the `for y in [x]` idiom in comprehensions #77037