Issue 30907: speed up comparisons to self for built-in containers

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/75090

classification

Title:	speed up comparisons to self for built-in containers
Type:	performance	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.7

process

Status:	closed	Resolution:	rejected
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	mark.dickinson, rhettinger, serhiy.storchaka, tim.peters, vstinner, wbolster
Priority:	normal	Keywords:

Created on 2017-07-12 13:15 by wbolster, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 2679	closed	wbolster, 2017-07-12 13:29

Messages (13)
msg298213 - (view)	Author: wouter bolsterlee (wbolster) *	Date: 2017-07-12 13:15
when comparing instances of the built-in container types (dict, list, and others) python assumes that "identity implies equality". this is documented (and assumed) behaviour: "In enforcing reflexivity of elements, the comparison of collections assumes that for a collection element x, x == x is always true. Based on that assumption, element identity is compared first, and element comparison is performed only for distinct elements." https://docs.python.org/3/reference/expressions.html#value-comparisons because of this, various operations such as rich comparison of lists can be sped up by checking for identity first, and only falling back to (possibly) much slower rich comparisons on container elements when two elements are not identical (i.e. id(a) == id(b)). because identity implies equality, comparing a container instance to itself is guaranteed to be true. currently, comparing a list to itself takes O(n) time, which can be measured easily: >>> x = list(range(1000)) >>> %timeit x == x 2.95 µs ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) >>> y = list(range(100000)) >>> %timeit y == y 293 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) the second example is 100 times as slow. i will shortly submit a few pull request to make "compare to self" run in O(1) time instead for a few of the built-in containers.
msg298257 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-07-13 04:58
This seems like it optimizes an irrelevant use case, comparing a container to itself. I don't think this is likely to benefit any existing code, so it would be better not to add more code clutter/complexity with another special case code path unless we're pretty sure that special case actually arises in practice. FWIW, the note about identity-implies-equality is a statement about how containers implement element comparison rather than about how two containers are compared to one another (as implemented in PyObject_RichCompareBool()). Also, it isn't just an optimization, it is necessary to help us reason about containers (i.e. preserving the invariant: all(x in c for x in c)).
msg298259 - (view)	Author: Tim Peters (tim.peters) *	Date: 2017-07-13 05:15
Just noting that every special code path "at the start" doesn't just speed the case it's aiming at, it also _slows_ every case that doesn't pass the new tests. It takes time to fail the new tests. So it usually makes things slower overall, unless the new thing being tested for is in fact common enough to more than make up for marginally slowing everything else. It's hard to believe that "container1 is container2" happens in container comparison often enough that special-casing it wouldn't be an overall loss. I'm pretty sure it never occurs in any of my code ;-)
msg298286 - (view)	Author: wouter bolsterlee (wbolster) *	Date: 2017-07-13 14:15
fwiw, "if (PyDict_CheckExact(a) && a == b)" amounts to two pointer comparisons, the overhead of which is too small to measure. (i tried.)
msg298315 - (view)	Author: Tim Peters (tim.peters) *	Date: 2017-07-13 21:15
Measuring in isolation (like with, e.g., timeit) isn't really interesting. Then everything is in L1 cache, branches are 100% predictable (because they take the same branch over & over for the duration of the test), and second-order effects on _other_ code are invisible. For example, the 100% branch prediction comes at a hidden cost: finite hardware branch-prediction tables had to kick out info about _other_ branches to make room for the branches in the code being tested. Likewise other code is kicked out of instruction caches to make room for the new tests, and similarly for whatever data the test uses. Those limited hardware resources are consumed by the new code whether or not it's run in the context of a micro-benchmark. I'm not claiming the new special cases would slow code down "a lot", or even significantly, or even measurably. That's not the point. The point is that we avoid adding special cases unless there's _clear_ expected actual benefit, because when we multiply hundreds of code paths with "this could theoretically speed things up in some cases!" special tests, slowdowns _do_ become significant. They all fight each other, and with expected-case code, for limited hardware acceleration gimmicks. For a very simple example, we don't special-case integer division by 1. `n // m`, when m=1, takes time proportional to the number of (internal) digits in `n`, despite that we _could_ simply return a pointer to `n` in constant time. For the same reasons: it's rare for code to divide by 1, and checking for anything isn't free (even if a micro-benchmark can't demonstrate it). However, if someone had a way to divide by _any_ odd integer 5x faster than what we do now, _that_ may be worth the extra test. That's why Raymond and I are looking for a reason to even suspect that special-casing container identity would pay off in _some_ plausibly realistic use case(s).
msg298338 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-07-14 05:00
First, few comments about the code. 1) The condition "a == b" is enough. No need to test "PyDict_CheckExact(a)" and like. 2) This fast path can be used only for Py_EQ and Py_NE. For other operations it can change the behavior. This is common approach of implementing operator== on C++. Unless comparing objects is trivial the implementation of operator== almost always starts with comparing pointers. In Python the overhead of calling methods and comparing objects is larger than in C++ and I expect that the relative overhead of one additional pointer comparison is smaller.
msg298340 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-07-14 05:55
The main drawback of this optimization perhaps is not tiny runtime overhead in common case, but the complication of the code. It adds at least 6 or 8 lines to every rich compare implementation. On other hand, this optimization already is implemented for integers, strings, bytes and slices.
msg298357 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2017-07-14 15:20
> On other hand, this optimization already is implemented for integers, strings, bytes and slices. Probably for good reasons, though, that don't necessarily apply to general containers / other objects. I can see the possible value for strings and bytes, where filtering on a particular value might involve a good number of `s == s` comparisons (especially given string interning). Similarly, I could believe that small integer caching means that `0 == 0` or `1 == 1` comparisons happen often enough to make the 'is' optimisation worth it. I'm a bit bemused by the slice case, though.
msg298360 - (view)	Author: Tim Peters (tim.peters) *	Date: 2017-07-14 16:44
Ya, prior to now ;-) there's generally been some cost-benefit thought given to these things. Strings and ints are immutable and some values of each are uniquely interned, so the case of identity is more than just plausible. It happens often. I'd guess that the bytes object inherited the identity check when old string code was edited to create the bytes type. The case there is dubious: as I recall, the only bytes object that's interned is the empty bytes object. The `slice` object is a hoot. They may never be compared outside of unit tests ;-) The code itself is baffled by what it's doing: """ static PyObject * slice_richcompare(PyObject v, PyObject w, int op) { ... if (v == w) { /* XXX Do we really need this shortcut? There's a unit test for it, but is that fair? */ ... """ It doesn't really care about speed: if v != w, the code goes on to allocate two temporary 3-tuples, copy the slices' data into them, call PyObject_RichCompare() to compare the tuples, then free the temporary 3-tuples. Slow. Its advantage is that it's "obviously correct". A related example: for lists, there's an early special case for EQ and NE when the lengths of the lists differ (if lists aren't the same length, they can't be equal). However, tuple comparison deliberately omits that special case: while tuple comparison is quite common in some programs (e.g., lots of programs sort large lists of tuples, or index large mappings by tuples), I never found real code that compared tuples of different lengths. So long as that's so, checking for that case is a pure waste of code, time, and HW resources.
msg298806 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2017-07-21 15:21
Any objections to closing this?
msg298808 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2017-07-21 15:53
I don't have any objections to closing this. The net benefit is contradictory, the maintenance cost is not zero. In many performance important cases (dictionary lookup, searching in sequences) the identity already is checked before calling the comparison method. Maybe it is worth to remove the fast patch from slice_richcompare() or speed up the general case, but this is a different issue.
msg298811 - (view)	Author: STINNER Victor (vstinner) *	Date: 2017-07-21 16:02
I never understood how container comparison works. >>> nan = float("nan") >>> [nan] == [nan] True >>> (nan,) == (nan,) True >>> nan == nan False >>> nan is nan True I picked the float NaN because it's one of the weirdest object in Python: it's not equal to itself. I don't know what is the impact of the proposed change on comparison. Can it break an application? It's unclear to me. -- It also recalls me an optimization I proposed on string: begin with comparison on the hash, if the two hashs are already known. At the end, we decided that it wasn't worth it. -- Raymond, Tim and Serhiy don't seem to be convince, so I will follow them and agree to reject this optimization :-)
msg298813 - (view)	Author: Tim Peters (tim.peters) *	Date: 2017-07-21 16:14
Victor, this part of the docs explains what you're seeing; scroll down to the """ In enforcing reflexivity of elements, the comparison of collections assumes that for a collection element x, x == x is always true ... """ part. https://docs.python.org/3/reference/expressions.html#value-comparisons In passing, note that dicts indexed by strings (well, indexed by anything) _do_ compare hashes first. In that specific case it's a major win - but in that specific case we also know in advance that "EQ or NE?" is the only question needing an answer.

History
Date	User	Action	Args
2022-04-11 14:58:48	admin	set	github: 75090
2017-07-21 16:54:09	wbolster	set	status: open -> closed resolution: rejected stage: resolved
2017-07-21 16:14:24	tim.peters	set	messages: + msg298813
2017-07-21 16:02:02	vstinner	set	messages: + msg298811
2017-07-21 15:53:41	serhiy.storchaka	set	messages: + msg298808
2017-07-21 15:21:27	rhettinger	set	messages: + msg298806
2017-07-14 16:44:48	tim.peters	set	messages: + msg298360
2017-07-14 15:20:59	mark.dickinson	set	nosy: + mark.dickinson messages: + msg298357
2017-07-14 05:56:00	serhiy.storchaka	set	nosy: + vstinner messages: + msg298340
2017-07-14 05:00:57	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg298338
2017-07-13 21:15:48	tim.peters	set	messages: + msg298315
2017-07-13 14:15:21	wbolster	set	messages: + msg298286
2017-07-13 05:15:50	tim.peters	set	nosy: + tim.peters messages: + msg298259
2017-07-13 04:58:33	rhettinger	set	type: performance messages: + msg298257
2017-07-13 04:44:59	rhettinger	set	assignee: rhettinger
2017-07-12 13:29:47	wbolster	set	pull_requests: + pull_request2745
2017-07-12 13:21:42	serhiy.storchaka	set	nosy: + rhettinger
2017-07-12 13:15:31	wbolster	create