Author josh.r
Recipients benjamin.peterson, brett.cannon, josh.r, ncoghlan, rhettinger, scoder, serhiy.storchaka, steven.daprano
Date 2019-10-23.16:22:19
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1571847740.77.0.617207295851.issue32856@roundup.psfhosted.org>
In-reply-to
Content
OOC, rather than optimizing a fairly ugly use case, might another approach be to make walrus less leaky? Even if observable leakage is considered desirable, it strikes me that use cases for walrus in genexprs and comprehensions likely break up into:

1. 90%: Cases where variable is never used outside genexpr/comprehension (because functional programming constructs shouldn't have side-effects, gosh darn it!)
2. 5%: Cases where variable is used outside genexpr/comprehension and expects leakage
3. 5%: Cases where variable is used outside genexpr/comprehension, but never in a way that actually relies on the value set in the genexpr/comprehension (same name chosen by happenstance)

If the walrus behavior in genexpr/comprehensions were tweaked to say that it only leaks if:

1. It's running at global scope (unavoidable, since there's no way to tell if it's an intended part of the module's interface)

or

2. A global or nonlocal statement within the function made it clear the name was considered stateful (again, like running at global scope, there is no way to know for sure if the name will be used somewhere else)

or

3. At some point in the function, outside the genexpr/comprehension, the value of the walrus-assigned name was read.

Case #3 could be even more narrow if the Python AST optimizer was fancier, potentially something like "if the value was read *after* the genexpr/comprehension, but *before* any following *unconditional* writes to the same name" (so [leaked := x for x in it] wouldn't bother to leak "leaked" if the next line was "leaked = 1" even if "leaked" were read three lines later, or the only reads from leaked occurred before the genexpr/comprehension), but I don't think the optimizer is up to that; following simple rules similar to those the compiler already follows to identify local names should cover 90% of cases anyway.

Aside from the dict returned by locals, and the possibility of earlier finalizer invocation (which you couldn't rely on outside CPython anyway), there's not much difference in behavior between a leaking and non-leaking walrus when the value is never referred to again, and it seems like the 90% case for cases where unwanted leakage occurs would be covered by this. Sure, if my WAG on use case percentages is correct, 5% of use cases would continue to leak even though they didn't benefit from it, but it seems like optimizing the 90% case would do a lot more good than optimizing what's already a micro-optimization that 99% of Python programmers would never use (and shouldn't really be encouraged, since it would rely on CPython implementation details, and produce uglier code).

I was also inspired by this to look at replacing BUILD_LIST with BUILD_TUPLE when followed by GET_ITER (so "[y for x in it for y in [derived(x)]]" would at least get the performance benefit of looping over a one-element tuple rather than a one-element list), thinking it might reduce the overhead of [y for x in a for y in [x]] in your unpatched benchmark by making it equivalent to [y for x in a for y in (x,)] while reading more prettily, but it turns out you beat me to it with issue32925, so good show there! :-)

You should probably rerun your benchmarks though; with issue32925 committed (a month after you posted the benchmarks here), the performance discrepancy should be somewhat less (estimate based on local benchmarking says maybe 20% faster with BUILD_LIST being optimized to BUILD_TUPLE). Still much faster with the proposed optimization than without, but I suspect even optimized, few folks will think to write their comprehensions to take advantage of it, which is why I was suggesting tweaks to the more obvious walrus operator.
History
Date User Action Args
2019-10-23 16:22:20josh.rsetrecipients: + josh.r, brett.cannon, rhettinger, ncoghlan, scoder, benjamin.peterson, steven.daprano, serhiy.storchaka
2019-10-23 16:22:20josh.rsetmessageid: <1571847740.77.0.617207295851.issue32856@roundup.psfhosted.org>
2019-10-23 16:22:20josh.rlinkissue32856 messages
2019-10-23 16:22:19josh.rcreate