Message 355237 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	josh.r
Recipients	benjamin.peterson, brett.cannon, josh.r, ncoghlan, rhettinger, scoder, serhiy.storchaka, steven.daprano
Date	2019-10-23.16:22:19
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1571847740.77.0.617207295851.issue32856@roundup.psfhosted.org>
In-reply-to

Content
OOC, rather than optimizing a fairly ugly use case, might another approach be to make walrus less leaky? Even if observable leakage is considered desirable, it strikes me that use cases for walrus in genexprs and comprehensions likely break up into: 1. 90%: Cases where variable is never used outside genexpr/comprehension (because functional programming constructs shouldn't have side-effects, gosh darn it!) 2. 5%: Cases where variable is used outside genexpr/comprehension and expects leakage 3. 5%: Cases where variable is used outside genexpr/comprehension, but never in a way that actually relies on the value set in the genexpr/comprehension (same name chosen by happenstance) If the walrus behavior in genexpr/comprehensions were tweaked to say that it only leaks if: 1. It's running at global scope (unavoidable, since there's no way to tell if it's an intended part of the module's interface) or 2. A global or nonlocal statement within the function made it clear the name was considered stateful (again, like running at global scope, there is no way to know for sure if the name will be used somewhere else) or 3. At some point in the function, outside the genexpr/comprehension, the value of the walrus-assigned name was read. Case #3 could be even more narrow if the Python AST optimizer was fancier, potentially something like "if the value was read after the genexpr/comprehension, but before any following unconditional writes to the same name" (so [leaked := x for x in it] wouldn't bother to leak "leaked" if the next line was "leaked = 1" even if "leaked" were read three lines later, or the only reads from leaked occurred before the genexpr/comprehension), but I don't think the optimizer is up to that; following simple rules similar to those the compiler already follows to identify local names should cover 90% of cases anyway. Aside from the dict returned by locals, and the possibility of earlier finalizer invocation (which you couldn't rely on outside CPython anyway), there's not much difference in behavior between a leaking and non-leaking walrus when the value is never referred to again, and it seems like the 90% case for cases where unwanted leakage occurs would be covered by this. Sure, if my WAG on use case percentages is correct, 5% of use cases would continue to leak even though they didn't benefit from it, but it seems like optimizing the 90% case would do a lot more good than optimizing what's already a micro-optimization that 99% of Python programmers would never use (and shouldn't really be encouraged, since it would rely on CPython implementation details, and produce uglier code). I was also inspired by this to look at replacing BUILD_LIST with BUILD_TUPLE when followed by GET_ITER (so "[y for x in it for y in [derived(x)]]" would at least get the performance benefit of looping over a one-element tuple rather than a one-element list), thinking it might reduce the overhead of [y for x in a for y in [x]] in your unpatched benchmark by making it equivalent to [y for x in a for y in (x,)] while reading more prettily, but it turns out you beat me to it with issue32925, so good show there! :-) You should probably rerun your benchmarks though; with issue32925 committed (a month after you posted the benchmarks here), the performance discrepancy should be somewhat less (estimate based on local benchmarking says maybe 20% faster with BUILD_LIST being optimized to BUILD_TUPLE). Still much faster with the proposed optimization than without, but I suspect even optimized, few folks will think to write their comprehensions to take advantage of it, which is why I was suggesting tweaks to the more obvious walrus operator.

OOC, rather than optimizing a fairly ugly use case, might another approach be to make walrus less leaky? Even if observable leakage is considered desirable, it strikes me that use cases for walrus in genexprs and comprehensions likely break up into:

1. 90%: Cases where variable is never used outside genexpr/comprehension (because functional programming constructs shouldn't have side-effects, gosh darn it!)
2. 5%: Cases where variable is used outside genexpr/comprehension and expects leakage
3. 5%: Cases where variable is used outside genexpr/comprehension, but never in a way that actually relies on the value set in the genexpr/comprehension (same name chosen by happenstance)

If the walrus behavior in genexpr/comprehensions were tweaked to say that it only leaks if:

1. It's running at global scope (unavoidable, since there's no way to tell if it's an intended part of the module's interface)

or

2. A global or nonlocal statement within the function made it clear the name was considered stateful (again, like running at global scope, there is no way to know for sure if the name will be used somewhere else)

or

3. At some point in the function, outside the genexpr/comprehension, the value of the walrus-assigned name was read.

Case #3 could be even more narrow if the Python AST optimizer was fancier, potentially something like "if the value was read *after* the genexpr/comprehension, but *before* any following *unconditional* writes to the same name" (so [leaked := x for x in it] wouldn't bother to leak "leaked" if the next line was "leaked = 1" even if "leaked" were read three lines later, or the only reads from leaked occurred before the genexpr/comprehension), but I don't think the optimizer is up to that; following simple rules similar to those the compiler already follows to identify local names should cover 90% of cases anyway.

Aside from the dict returned by locals, and the possibility of earlier finalizer invocation (which you couldn't rely on outside CPython anyway), there's not much difference in behavior between a leaking and non-leaking walrus when the value is never referred to again, and it seems like the 90% case for cases where unwanted leakage occurs would be covered by this. Sure, if my WAG on use case percentages is correct, 5% of use cases would continue to leak even though they didn't benefit from it, but it seems like optimizing the 90% case would do a lot more good than optimizing what's already a micro-optimization that 99% of Python programmers would never use (and shouldn't really be encouraged, since it would rely on CPython implementation details, and produce uglier code).

I was also inspired by this to look at replacing BUILD_LIST with BUILD_TUPLE when followed by GET_ITER (so "[y for x in it for y in [derived(x)]]" would at least get the performance benefit of looping over a one-element tuple rather than a one-element list), thinking it might reduce the overhead of [y for x in a for y in [x]] in your unpatched benchmark by making it equivalent to [y for x in a for y in (x,)] while reading more prettily, but it turns out you beat me to it with issue32925, so good show there! :-)

You should probably rerun your benchmarks though; with issue32925 committed (a month after you posted the benchmarks here), the performance discrepancy should be somewhat less (estimate based on local benchmarking says maybe 20% faster with BUILD_LIST being optimized to BUILD_TUPLE). Still much faster with the proposed optimization than without, but I suspect even optimized, few folks will think to write their comprehensions to take advantage of it, which is why I was suggesting tweaks to the more obvious walrus operator.

History
Date	User	Action	Args
2019-10-23 16:22:20	josh.r	set	recipients: + josh.r, brett.cannon, rhettinger, ncoghlan, scoder, benjamin.peterson, steven.daprano, serhiy.storchaka
2019-10-23 16:22:20	josh.r	set	messageid: <1571847740.77.0.617207295851.issue32856@roundup.psfhosted.org>
2019-10-23 16:22:20	josh.r	link	issue32856 messages
2019-10-23 16:22:19	josh.r	create