msg337442
Author: Brandt Bucher (brandtbucher)
Date: 2019-03-07 21:39
Binary operations on collections are, in general, of quadratic complexity. However, we can sometimes operate in-place if we know that we hold the only reference to the object. This allows us to avoid making many intermediate copies when summing many lists (or dicts ;), taking the union of many sets, or working with literals.
The attached PR adds a simple 2-line refcount check which delegates to the corresponding in-place operation for: list_concat, list_repeat, set_and, set_or, set_xor, set_sub, bytearray_concat, and bytearray_repeat.
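A rough Python-level illustration of the discriminator the patch relies on (a sketch, not the patch itself — the real check lives in C and compares Py_REFCNT against 1; sys.getrefcount's exact values vary by CPython version, hence the loose bounds):

```python
import sys

x = [1, 2, 3]
# A name binding alone keeps the refcount above 1, so an expression like
# x + [4] could never trigger the proposed in-place path for x.
assert sys.getrefcount(x) >= 2

# A fresh temporary reaches the callee with almost no references; inside
# list_concat, the intermediate result of a chained sum holds exactly one.
assert sys.getrefcount([1, 2, 3]) <= 2
```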

msg337727
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2019-03-12 08:46
This is an interesting idea. But I have two concerns.
1. It is hard to implement refcount-based optimization on Python implementations which do not use reference counting (e.g. PyPy). If the effect of this optimization is significant, people will start writing code that depends on it, and this will cause problems on other implementations.
2. Currently list1 + list2 + list3 returns a list which allocates exactly the amount of memory needed to hold its content. But with the proposed change, the result list could preallocate more memory. If the result is a long-lived object, this can waste memory.
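The overallocation difference is observable from Python (a sketch; exact sizes are CPython implementation details and vary by version, so the comparison is deliberately loose):

```python
import sys

a, b = [0] * 1000, [0] * 1000

exact = a + b      # list_concat allocates exactly len(a) + len(b) slots
grown = list(a)
grown += b         # in-place extend goes through list_resize, which may overallocate

assert exact == grown
# The concatenated result should never be larger than the in-place-grown one.
assert sys.getsizeof(exact) <= sys.getsizeof(grown)
```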

msg337780
Author: Brandt Bucher (brandtbucher)
Date: 2019-03-12 17:21
Thanks for the input, Serhiy. On point (1):
It's a valid concern... but I don't think it's necessarily a good enough reason to withhold a simple, yet *significant* performance increase for a common use case.
This reminds me of past CPython implementation details, such as ordered dictionaries in 3.6. Some users may errantly adopt problematic idioms that aren't strictly portable between implementations, but if you ask me, the performance improvement (again, from quadratic to linear) for three common built-in types (I admit I'm being a bit generous to bytearray here ;) is well worth it.
Regarding (2):
I believe that CPython lists only overallocate by something like 1/8, which in my opinion is a fairly conservative value. If the lists are large enough for this small preallocation to be a real memory burden, I would imagine that making the extra copies would be even worse.
And, if you ask me, list overallocation is a feature, not a bug!

msg337803
Author: Raymond Hettinger (rhettinger)
Date: 2019-03-12 20:39
I intend to make a python-dev post about this so we can decide whether to move forward or not. For now, here are a few thoughts:
* Something like this was implemented for strings in an effort to mitigate the harm from a common mistake: using chained addition to build up a string rather than using str.join(). IIRC, there was a decision at that time not to do something similar for other containers.
* FWIW, the non-operator versions of these operations already support passing in multiple arguments and are the preferred way to do it.
* I think this PR would help some code in the wild. But typically, people know that building a series of new containers will tend to make copies. It would be surprising if it didn't.
* We really don't want people to rely on this optimization. It is very fragile. If someone ever adds an extra reference, the chained operations become quadratic again.
Right now, I'm -0 on the change.
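The non-operator spellings Raymond mentions already avoid the chain of temporaries — for sets, at least (lists and bytearrays lack an out-of-place equivalent, as noted later in the thread):

```python
s1, s2, s3 = {1, 2}, {2, 3}, {3, 4}

# One call, one result set -- no intermediate s1 | s2 temporary
# as there would be in s1 | s2 | s3.
combined = s1.union(s2, s3)
assert combined == {1, 2, 3, 4}

# intersection / difference accept multiple arguments the same way.
common = {1, 2, 3}.intersection({2, 3}, {3, 4})
assert common == {3}
```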

msg337804
Author: Raymond Hettinger (rhettinger)
Date: 2019-03-12 20:45
> Binary operations on collections are, in general, of quadratic complexity.
BTW, this seems like a drastic mischaracterization, making it seem like something is wrong with the containers themselves. It is more accurate to say that an uncommon usage pattern of repeated copy-and-add operations on containers can give quadratic behavior.
Likewise, it is a mischaracterization to say this PR "fixes" the containers. Instead, it uses what is arguably a hack to recognize potential cases where the copy step can be skipped, even though a copy is exactly what the user's code specified.
Armin, do you remember the outcome of the discussions when you added the string optimization for repeated concatenation? I do remember that we told users not to rely on it and that str.join() was still the recommended best practice.

msg337810
Author: Tim Peters (tim.peters)
Date: 2019-03-12 21:17
I'm at best -0 on this, and on the edge of -1. We already supply mutate-in-place operations for when this is wanted, and, e.g., for lists, it's dead easy to spell as
L1 += L2
Half the point of the hack for string catenation was that the similar-looking
S1 += S2
did _not_ mutate in place. But that spelling for lists already does.
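That difference is easy to check (a sketch; object identity makes the in-place mutation visible):

```python
L1 = [1, 2]
alias = L1
L1 += [3]          # list.__iadd__ extends the existing list in place
assert L1 is alias and alias == [1, 2, 3]

S1 = "ab"
salias = S1
S1 += "c"          # str is immutable; with another live reference this rebinds S1
assert S1 is not salias and salias == "ab"
```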
Beyond that, is it actually safe for set operations? Those need to compare elements, so they can call into Python code via custom __eq__ implementations, which in turn can do anything, including boosting the number of ways to reach the original inputs (i.e., increasing their refcounts). For that reason I believe it can be visible if, e.g.,
Set1 = Set1 & Set2
mutates the original set bound to Set1. The binding via name `Set1` may be the only way to reach the original set before that statement executes, but the process of computing the RHS _may_ create additional references to the original set object.
Even if it doesn't, when element __eq__s are running other threads may run too, and pick up ever-changing values for the set object `Set1` refers to if it's mutated in place.
None of those were potential issues for hacking string catenation (which never calls back into Python code - and holds the GIL - for the duration).
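The re-entrancy Tim describes is easy to demonstrate (a sketch with a hypothetical Spy element type; it only shows that arbitrary Python runs mid-operation, not an actual exploit of the patch):

```python
calls = []

class Spy:
    def __init__(self, v):
        self.v = v
    def __hash__(self):
        calls.append("hash")
        return hash(self.v)
    def __eq__(self, other):
        # Arbitrary Python executes here, in the middle of the set operation;
        # it could stash new references to either operand, and while it runs,
        # other threads may run too.
        calls.append("eq")
        return isinstance(other, Spy) and self.v == other.v

s1 = {Spy(1), Spy(2)}
s2 = {Spy(1)}
calls.clear()
result = s1 & s2
assert len(result) == 1 and "eq" in calls
```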

msg337857
Author: Brandt Bucher (brandtbucher)
Date: 2019-03-13 16:12
I'm sorry, Raymond. I didn't mean to imply that this is a problem that needs fixing. I just saw an opportunity for a cheap, effective performance boost for a potentially expensive operation. Thanks for clarifying.
And on users "expecting" copies: many advanced Python programmers I know thought until recently that "a < b < c" was evaluated as "(a < b) < c". It's nice to be pleasantly surprised with "better" behavior... especially when this pretty solidly falls under the category of "implementation detail", since it's not outwardly visible.
And speaking of visibility:
> Beyond that, is it actually safe for set operations?
I'm not sure how it wouldn't be. None of the sets in your example can be mutated, since the operation you described doesn't hit the new branch at all.
I've also spent parts of yesterday and today trying to craft malicious objects/threads that do what you described, and haven't been able to. The main roadblock I've found is that the named references still aren't mutated, and the first (and only) intermediate copy is invisible to everything except PyNumber_Add. I also haven't been able to get a handle to the left-hand set in a subsequent "+" operation from a contained object. Maybe I misunderstood you, though.
However, both of you are much better engineers than I am, so I'll defer to your judgement here. If the set portion of the PR seems vulnerable, I'm eager to modify or remove it.

msg337858
Author: Brandt Bucher (brandtbucher)
Date: 2019-03-13 16:16
> ...in a subsequent "+" operation...
Sorry, I meant "&|-" here.

msg337859
Author: Tim Peters (tim.peters)
Date: 2019-03-13 16:31
Brandt, it's quite possible I have no real idea what this patch is aimed at. I didn't reverse-engineer the code, the title is extremely broad, and there are no Python examples I can find of what it _is_ aimed at. If, e.g., it's impossible that this patch has any effect on what happens for
Set1 = Set1 & Set2
then the concerns I raised may well be irrelevant.

msg337862
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2019-03-13 16:50
It could be aimed at list1 + list2 + list3. Or foo() + list2, where foo() returns a new list. Or other cases where the left operand is a new object.
But it is not correct to say that it is a problem that these operations currently have quadratic complexity. They have O(N^2) complexity where N is the number of additions. Unless you have expressions with many thousands of additions, this is not a problem (in fact, you can crash CPython just trying to compile such an expression).

msg337864
Author: Brandt Bucher (brandtbucher)
Date: 2019-03-13 16:57
I apologize - I should have been clearer about what this accomplishes. I'll use list addition as an example, but this works for any binary operation on these containers.
If the left operand has a refcount of exactly one, it will simply mutate in-place rather than making an unnecessary copy. The simplest example of this is with literals:
[0] + list1
This just modifies the literal [0] in-place rather than creating an unnecessary copy. However, the more common use case is adding named lists. Currently, when adding list1 + list2 + list3, two copies (with one reference each) are made:
- The intermediate result of list1 + list2.
- The sum of this intermediate result and list3.
Only the second of these is actually returned - the first is used once and thrown away. The effect of this patch is to create *at most one* copy (instead of n-1) for an arbitrarily long list summation, since the intermediate result is simply mutated in place for lists 3 through n.
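The copying pattern Brandt describes can be made visible with a hypothetical counting subclass (a sketch; the real list.__add__ is implemented in C, this just mimics its copy-both-operands behavior):

```python
copied = []

class CountingList(list):
    def __add__(self, other):
        copied.append(len(self))   # elements copied from the left operand
        return CountingList(list(self) + list(other))

a = CountingList([0] * 100)
b = CountingList([1])
c = CountingList([2])

result = a + b + c
# Two additions: the first copies a's 100 elements; the second copies the
# 101-element intermediate -- which nothing else references -- all over again.
assert copied == [100, 101]
assert len(result) == 102
```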

msg337869
Author: Tim Peters (tim.peters)
Date: 2019-03-13 18:51
Serhiy, if I understand this, it _could_ pay big with even just a few operations. For example,
[0] * 1000000 + [1] + [2] + [3]
As-is, we'll copy the million zeroes three times. With the patch, more likely the original vector of a million zeroes would be extended in place, three times with a single element each time.
But I don't know that people do that in real code. It would be great to find real examples in real code that would benefit.

msg337882
Author: Armin Rigo (arigo)
Date: 2019-03-13 22:57
This patch is based on the following reasoning: if 'a' is a list and the reference count of 'a' is equal to 1, then we can mutate 'a' in place in a call to 'a->ob_type->tp_as_sequence->sq_concat' (i.e. list_concat). Typically that is called from 'PyNumber_Add(a, b)'. The patch is only correct for the case where PyNumber_Add() is called from Python/ceval.c. It is clearly wrong if you consider calls to PyNumber_Add() from random C extension modules. Some extension module authors would be very surprised if the following code started giving nonsense:
PyObject *a = PyList_New(0);
PyObject *b = PyNumber_Add(a, some_other_list);
/* here, OF COURSE a must still be an empty list and b != a */
By comparison, if you consider the hack that I'm guilty for doing long ago to improve string concatenation, you'll see that it is done entirely inside ceval.c, and not in stringobject.c or unicodeobject.c.
For this reason I consider the whole patch, as written now, as bogus.

msg337883
Author: Armin Rigo (arigo)
Date: 2019-03-13 23:02
...or PySequence_Concat() instead of PyNumber_Add(); same reasoning.

msg337884
Author: Josh Rosenberg (josh.r)
Date: 2019-03-13 23:25
Raymond said: "FWIW, the non-operator versions of these operations already support passing in multiple arguments and is the preferred way to do it."
True for sets with union/intersection/difference (and their in-place equivalents), but not for lists and bytearrays, which don't have a non-operator out-of-place version, and whose in-place version, extend, only takes a single iterable argument.
Perhaps extend could take multiple iterables as positional varargs, not just a single iterable, to make the multi-extend case work efficiently?
Not sure how often this actually comes up, though; maybe I've just internalized good practices, but I only rarely find myself combining containers where the left-most operand is an unreferenced temporary, or combining many containers at once in a single line (I use itertools.chain wrapped in a constructor, or literals with the unpacking generalizations, for most of those cases).
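The chain-in-a-constructor idiom Josh mentions builds the combined container in one linear pass, with no intermediate temporaries:

```python
from itertools import chain

list1, list2, list3 = [1, 2], [3, 4], [5, 6]

# One pass over all inputs -- no list1 + list2 temporary is ever built.
combined = list(chain(list1, list2, list3))
assert combined == [1, 2, 3, 4, 5, 6]

# The unpacking-generalization spelling of the same thing:
assert [*list1, *list2, *list3] == combined
```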

msg337886
Author: Josh Rosenberg (josh.r)
Date: 2019-03-13 23:44
I believe Tim Peters's concern with:
Set1 = Set1 & Set2
is misplaced, because he's drawn an analogy with the string concatenation optimization, which *does* handle:
Str1 = Str1 + Str2
because that optimization lives in the ceval loop and can check the subsequent instruction for a "simple store", clearing the target's reference if the target and the left operand are the same object. This change doesn't try to handle that case, so the optimization only takes place when the sole reference to the left-hand operand is on the stack (not in any named variable at all).
Tim's concern *would* make it unsafe to extend this patch to apply the target-clearing optimization, because at global scope, clearing the target would be observable from __hash__/__eq__ (and from other threads); str is safe because it holds the GIL the whole time and can't invoke arbitrary Python-level code that might release it, while set has no such guarantee.

msg337888
Author: Tim Peters (tim.peters)
Date: 2019-03-14 00:00
Josh, I agree - but do read Armin's response. The string hack was done inside ceval.c, where we have extraordinary knowledge of context. Adding similar hacks inside Python C API functions seems to be a non-starter - they can't magically start mutating arguments based on refcounts alone. Armin's dead-simple example is compelling.
So if this is desirable at all, it would need to get hairier.

msg337894
Author: Raymond Hettinger (rhettinger)
Date: 2019-03-14 05:04
Everyone, thanks for chiming in. I will mark this as closed/rejected. For the reasons listed in this thread, it isn't a sensible path for us (hairy at best, flat-out wrong at worst). To the extent people ever have a quadratic-behavior problem, we already have explicit and clean ways to solve it with the existing API.
Date | User | Action | Args
2022-04-11 14:59:12 | admin | set | github: 80410
2019-10-11 18:46:05 | arigo | set | nosy: - arigo
2019-10-11 18:37:08 | brandtbucher | set | pull_requests: + pull_request16298
2019-10-10 19:21:35 | brandtbucher | set | pull_requests: - pull_request16290
2019-10-10 19:16:24 | brandtbucher | set | pull_requests: + pull_request16290
2019-03-14 05:04:31 | rhettinger | set | status: open -> closed, resolution: rejected, messages: + msg337894, stage: patch review -> resolved
2019-03-14 00:00:22 | tim.peters | set | messages: + msg337888
2019-03-13 23:44:52 | josh.r | set | messages: + msg337886
2019-03-13 23:25:40 | josh.r | set | nosy: + josh.r, messages: + msg337884
2019-03-13 23:02:28 | arigo | set | messages: + msg337883
2019-03-13 22:57:54 | arigo | set | messages: + msg337882
2019-03-13 18:51:19 | tim.peters | set | messages: + msg337869
2019-03-13 17:13:16 | brandtbucher | set | title: Linear-time list, set, and bytearray ops. -> Avoid unnecessary copies for list, set, and bytearray ops.
2019-03-13 16:57:42 | brandtbucher | set | messages: + msg337864
2019-03-13 16:50:49 | serhiy.storchaka | set | messages: + msg337862
2019-03-13 16:31:27 | tim.peters | set | messages: + msg337859
2019-03-13 16:16:24 | brandtbucher | set | messages: + msg337858
2019-03-13 16:12:26 | brandtbucher | set | messages: + msg337857
2019-03-12 21:17:04 | tim.peters | set | nosy: + tim.peters, messages: + msg337810
2019-03-12 20:45:26 | rhettinger | set | nosy: + arigo, messages: + msg337804
2019-03-12 20:39:53 | rhettinger | set | messages: + msg337803
2019-03-12 17:21:00 | brandtbucher | set | messages: + msg337780
2019-03-12 08:46:18 | serhiy.storchaka | set | nosy: + serhiy.storchaka, messages: + msg337727
2019-03-12 02:41:42 | pablogsal | set | nosy: + pablogsal
2019-03-11 23:20:59 | brandtbucher | set | title: Linear-time ops for some mutable collections. -> Linear-time list, set, and bytearray ops.
2019-03-09 05:42:55 | terry.reedy | set | nosy: + rhettinger
2019-03-07 21:40:08 | brandtbucher | set | keywords: + patch, stage: patch review, pull_requests: + pull_request12216
2019-03-07 21:39:51 | brandtbucher | create |