Title: speed up comparisons to self for built-in containers
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.7
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: mark.dickinson, rhettinger, serhiy.storchaka, tim.peters, vstinner, wbolster
Priority: normal Keywords:

Created on 2017-07-12 13:15 by wbolster, last changed 2017-07-21 16:54 by wbolster. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 2679 closed wbolster, 2017-07-12 13:29
Messages (13)
msg298213 - (view) Author: wouter bolsterlee (wbolster) * Date: 2017-07-12 13:15
when comparing instances of the built-in container types (dict, list, and others) python assumes that "identity implies equality". this is documented (and assumed) behaviour:

"In enforcing reflexivity of elements, the comparison of collections assumes that for a collection element x, x == x is always true. Based on that assumption, element identity is compared first, and element comparison is performed only for distinct elements."

because of this, various operations such as rich comparison of lists can be sped up by checking for identity first, and only falling back to (possibly) much slower rich comparisons on container elements when two elements are not identical (i.e. id(a) == id(b)).

because identity implies equality, comparing a container instance to itself is guaranteed to be true.

currently, comparing a list to itself takes O(n) time, which can be measured easily:

>>> x = list(range(1000))
>>> %timeit x == x
2.95 µs ± 19.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> y = list(range(100000))
>>> %timeit y == y
293 µs ± 3.35 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

the second example is 100 times as slow.

i will shortly submit a few pull request to make "compare to self" run in O(1) time instead for a few of the built-in containers.
msg298257 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-07-13 04:58
This seems like it optimizes an irrelevant use case, comparing a container to itself.  

I don't think this is likely to benefit any existing code, so it would be better not to add more code clutter/complexity with another special case code path unless we're pretty sure that special case actually arises in practice.

FWIW, the note about identity-implies-equality is a statement about how containers implement element comparison rather than about how two containers are compared to one another (as implemented in PyObject_RichCompareBool()).  Also, it isn't just an optimization, it is necessary to help us reason about containers (i.e. preserving the invariant: all(x in c for x in c)).
msg298259 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2017-07-13 05:15
Just noting that every special code path "at the start" doesn't just speed the case it's aiming at, it also _slows_ every case that doesn't pass the new tests.  It takes time to fail the new tests.  So it usually makes things slower overall, unless the new thing being tested for is in fact common enough to more than make up for marginally slowing everything else.

It's hard to believe that "container1 is container2" happens in container comparison often enough that special-casing it wouldn't be an overall loss.  I'm pretty sure it never occurs in any of my code ;-)
msg298286 - (view) Author: wouter bolsterlee (wbolster) * Date: 2017-07-13 14:15
fwiw, "if (PyDict_CheckExact(a) && a == b)" amounts to two pointer comparisons, the overhead of which is too small to measure. (i tried.)
msg298315 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2017-07-13 21:15
Measuring in isolation (like with, e.g., timeit) isn't really interesting.  Then everything is in L1 cache, branches are 100% predictable (because they take the same branch over & over for the duration of the test), and second-order effects on _other_ code are invisible.  For example, the 100% branch prediction comes at a hidden cost:  finite hardware branch-prediction tables had to kick out info about _other_ branches to make room for the branches in the code being tested.  Likewise other code is kicked out of instruction caches to make room for the new tests, and similarly for whatever data the test uses.  Those limited hardware resources are consumed by the new code whether or not it's run in the context of a micro-benchmark.

I'm not claiming the new special cases would slow code down "a lot", or even significantly, or even measurably.  That's not the point.  The point is that we avoid adding special cases unless there's _clear_ expected actual benefit, because when we multiply hundreds of code paths with "this could theoretically speed things up in some cases!" special tests, slowdowns _do_ become significant.  They all fight each other, and with expected-case code, for limited hardware acceleration gimmicks.

For a very simple example, we don't special-case integer division by 1.  `n // m`, when m=1, takes time proportional to the number of (internal) digits in `n`, despite that we _could_ simply return a pointer to `n` in constant time.

For the same reasons:  it's rare for code to divide by 1, and checking for anything isn't free (even if a micro-benchmark can't demonstrate it).  However, if someone had a way to divide by _any_ odd integer 5x faster than what we do now, _that_ may be worth the extra test.

That's why Raymond and I are looking for a reason to even suspect that special-casing container identity would pay off in _some_ plausibly realistic use case(s).
msg298338 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-14 05:00
First, few comments about the code.

1) The condition "a == b" is enough. No need to test "PyDict_CheckExact(a)" and like.

2) This fast path can be used only for Py_EQ and Py_NE. For other operations it can change the behavior.

This is common approach of implementing operator== on C++. Unless comparing objects is trivial the implementation of operator== almost always starts with comparing pointers. In Python the overhead of calling methods and comparing objects is larger than in C++ and I expect that the relative overhead of one additional pointer comparison is smaller.
msg298340 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-14 05:55
The main drawback of this optimization perhaps is not tiny runtime overhead in common case, but the complication of the code. It adds at least 6 or 8 lines to every rich compare implementation.

On other hand, this optimization already is implemented for integers, strings, bytes and slices.
msg298357 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2017-07-14 15:20
> On other hand, this optimization already is implemented for integers, strings, bytes and slices.

Probably for good reasons, though, that don't necessarily apply to general containers / other objects.

I can see the possible value for strings and bytes, where filtering on a particular value might involve a good number of `s == s` comparisons (especially given string interning). Similarly, I could believe that small integer caching means that `0 == 0` or `1 == 1` comparisons happen often enough to make the 'is' optimisation worth it. I'm a bit bemused by the slice case, though.
msg298360 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2017-07-14 16:44
Ya, prior to now ;-) there's generally been some cost-benefit thought given to these things.  Strings and ints are immutable and some values of each are uniquely interned, so the case of identity is more than just plausible.  It happens often.

I'd guess that the bytes object inherited the identity check when old string code was edited to create the bytes type.  The case there is dubious:  as I recall, the only bytes object that's interned is the empty bytes object.

The `slice` object is a hoot.  They may never be compared outside of unit tests ;-)  The code itself is baffled by what it's doing:

static PyObject *
slice_richcompare(PyObject *v, PyObject *w, int op)
    if (v == w) {
        /* XXX Do we really need this shortcut?
           There's a unit test for it, but is that fair? */

It doesn't really care about speed:  if v != w, the code goes on to allocate two temporary 3-tuples, copy the slices' data into them, call PyObject_RichCompare() to compare the tuples, then free the temporary 3-tuples.  Slow.  Its advantage is that it's "obviously correct".

A related example:  for lists, there's an early special case for EQ and NE when the lengths of the lists differ (if lists aren't the same length, they can't be equal).  However, tuple comparison deliberately omits that special case:  while tuple comparison is quite common in some programs (e.g., lots of programs sort large lists of tuples, or index large mappings by tuples), I never found real code that compared tuples of different lengths.  So long as that's so, checking for that case is a pure waste of code, time, and HW resources.
msg298806 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-07-21 15:21
Any objections to closing this?
msg298808 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-21 15:53
I don't have any objections to closing this. The net benefit is contradictory, the maintenance cost is not zero. In many performance important cases (dictionary lookup, searching in sequences) the identity already is checked before calling the comparison method.

Maybe it is worth to remove the fast patch from slice_richcompare() or speed up the general case, but this is a different issue.
msg298811 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-21 16:02
I never understood how container comparison works.

>>> nan = float("nan")
>>> [nan] == [nan]
>>> (nan,) == (nan,)
>>> nan == nan
>>> nan is nan

I picked the float NaN because it's one of the weirdest object in Python: it's not equal to itself.

I don't know what is the impact of the proposed change on comparison. Can it break an application? It's unclear to me.


It also recalls me an optimization I proposed on string: begin with comparison on the hash, if the two hashs are already known. At the end, we decided that it wasn't worth it.


Raymond, Tim and Serhiy don't seem to be convince, so I will follow them and agree to reject this optimization :-)
msg298813 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2017-07-21 16:14
Victor, this part of the docs explains what you're seeing; scroll down to the

In enforcing reflexivity of elements, the comparison of collections assumes that for a collection element x, x == x is always true ...


In passing, note that dicts indexed by strings (well, indexed by anything) _do_ compare hashes first.  In that specific case it's a major win - but in that specific case we also know in advance that "EQ or NE?" is the only question needing an answer.
Date User Action Args
2017-07-21 16:54:09wbolstersetstatus: open -> closed
resolution: rejected
stage: resolved
2017-07-21 16:14:24tim.peterssetmessages: + msg298813
2017-07-21 16:02:02vstinnersetmessages: + msg298811
2017-07-21 15:53:41serhiy.storchakasetmessages: + msg298808
2017-07-21 15:21:27rhettingersetmessages: + msg298806
2017-07-14 16:44:48tim.peterssetmessages: + msg298360
2017-07-14 15:20:59mark.dickinsonsetnosy: + mark.dickinson
messages: + msg298357
2017-07-14 05:56:00serhiy.storchakasetnosy: + vstinner
messages: + msg298340
2017-07-14 05:00:57serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg298338
2017-07-13 21:15:48tim.peterssetmessages: + msg298315
2017-07-13 14:15:21wbolstersetmessages: + msg298286
2017-07-13 05:15:50tim.peterssetnosy: + tim.peters
messages: + msg298259
2017-07-13 04:58:33rhettingersettype: performance
messages: + msg298257
2017-07-13 04:44:59rhettingersetassignee: rhettinger
2017-07-12 13:29:47wbolstersetpull_requests: + pull_request2745
2017-07-12 13:21:42serhiy.storchakasetnosy: + rhettinger
2017-07-12 13:15:31wbolstercreate