classification
Title: List sorting makes duplicate comparisons
Type: performance Stage: resolved
Components: Versions: Python 3.8
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: dwyde, rhettinger, tim.peters
Priority: normal Keywords: patch

Created on 2018-12-01 03:33 by dwyde, last changed 2018-12-02 17:56 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
sort-fix.diff dwyde, 2018-12-01 03:33 A proposed fix
sort.py dwyde, 2018-12-01 03:34 Test script
sort-fix-2.diff dwyde, 2018-12-01 08:01
Messages (5)
msg330839 - (view) Author: David Wyde (dwyde) Date: 2018-12-01 03:33
Python's Timsort sometimes makes the same comparison twice. This leads to extra compares, which can hurt performance.

Python sorts several length-3 permutations in 4 steps, and the problem accumulates with bigger data. There are ~9,800 duplicate less-than checks when sorting a million randomly ordered numbers, and 24,386 wasted comparisons when sorting all permutations of length 8.

I've attached a patch to fix this issue. Feedback and improvements are appreciated. Speed seems roughly comparable, and should be much improved for expensive comparison functions.

The same problem occurs in the Chromium Browser, and probably other ports of Python's Timsort implementation.
msg330843 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2018-12-01 05:57
Yup, it can do some redundant comparisons; more on that here:

https://mail.python.org/pipermail/python-dev/2018-August/155020.html

I'm not inclined to make this already-difficult code even harder to understand for something that's quite likely not to matter - or even hurt.  For example, if we're sorting a million randomly ordered objects, we can expect about 20 million comparisons.  9,800 is so close to zero compared to that it wouldn't be a measurable difference even if the new code came for free - which it doesn't.

It adds whole new piles of tests/branches, which will almost certainly be mis-predicted on the first iteration of the outer loop because they always fail the "initial" part on iterations after the first.  At best, they're cheap but pure do-nothing overhead on outer-loop iterations after the first.

I also dislike that binarysort grows a pile of undocumented assumptions about the _context_ it's called in.  As is, it can be used to sort any slice meeting the requirements spelled out in the comments.  "Leaking" requirements across function boundaries is fragile.

Note that this part:

    else IFLT(pivot, *p) {
        r = p;
    }
    else {
        l = p + 1;
    }

doesn't work at all the way it "looks like" it works.  It expands to:

    else if ((k = ISLT(pivot, *p)) < 0) goto fail;
    if (k) {
        r = p:
    }
    else {
        l = p + 1;
    }

The

    r = p;

and

    l = p + 1;

lines you added above in your new code are useless, because this final "if (k)" block executes regardless of whether `initial` is true or false.  Indeed, if you hadn't _also_ added new code to set `k` to 0 or 1, the code would be wrong.  But that's impossible to see without expanding the macro.

So if you pursue this, at least rewrite that block as:

    else {
        IFLT(pivot, *p) {
            r = p;
        else {
            l = p + 1;
        }
    }
    
and remove the new "k = 1;" and "k = 0;" mystery lines.

But unless there are real-world cases that show significant speedups, I'm - as I said in the message I linked to above - happy with the tradeoffs already in place.
msg330844 - (view) Author: David Wyde (dwyde) Date: 2018-12-01 08:01
Thanks for the speedy and helpful response.

Keeping complexity down is fair. The wasted if-checks on subsequent iterations are certainly a negative trade-off. I saw that binarysort() is only called in one place, but I understand wanting to keep it generic.

I think that slow comparison functions, especially when repeatedly sorting short lists, are the main use case.

I don't know if that's common in performance-critical code. I've heard of using human choices for comparisons, when fewer decisions could provide a notable speedup. The patched code seems a bit slower in some situations, but is faster in others.

Do you think it's worth posting to python-ideas to see what people's use cases are?
msg330860 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-12-01 20:53
> I've heard of using human choices for comparisons, when fewer decisions could provide a notable speedup.

Memoization is the usual solution when expensive computations are being repeated.  That technique preserves the work that was done and it avoids adding unnecessary complexity to the consumer code.

> Do you think it's worth posting to python-ideas to see 
> what people's use cases are?

No, that would be a waste of time and it wouldn't change the validity of Tim's insights.
msg330862 - (view) Author: David Wyde (dwyde) Date: 2018-12-01 22:44
Okay. Thanks!
History
Date User Action Args
2018-12-02 17:56:27rhettingersetstatus: open -> closed
resolution: wont fix
stage: resolved
2018-12-01 22:44:56dwydesetmessages: + msg330862
2018-12-01 20:53:18rhettingersetnosy: + rhettinger
messages: + msg330860
2018-12-01 08:01:14dwydesetfiles: + sort-fix-2.diff

messages: + msg330844
2018-12-01 05:57:29tim.peterssetmessages: + msg330843
2018-12-01 03:34:54dwydesetfiles: + sort.py
2018-12-01 03:33:53dwydecreate