classification
Title: Adaptive stable mergesort
Type: Stage:
Components: Interpreter Core Versions: Python 2.3
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: tim.peters Nosy List: aimacintyre, anthonybaxter, gvanrossum, jacobs99, mwh, nascheme, skip.montanaro, tim.peters
Priority: normal Keywords: patch

Created on 2002-07-26 15:51 by tim.peters, last changed 2002-08-01 03:11 by tim.peters. This issue is now closed.

Files
File name Uploaded Description Edit
merge.patch tim.peters, 2002-07-26 15:51 adds .mosrt() and .hsort()
merge4.patch tim.peters, 2002-07-30 15:42 Version 4 of the listobject.c patch
timsort.txt tim.peters, 2002-07-31 20:37 Plain text doc file
Messages (26)
msg40650 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-26 15:51
This adds method list.msort([compare]).

Lib/test/sortperf.py is already a sort performance 
test.  To run it on exactly the same data I used, run it 
via

python -O sortperf.py 15 20 1

That will time the current samplesort (even after this 
patch).  After getting stable numbers for that, change 
sortperf's doit() to say L.msort() instead of L.sort(), 
and you'll time the mergesort instead.

CAUTION:  To save time across many runs, sortperf 
saves the random floats it generates, into temp files.  
If those temp files already exist when sortperf starts, 
it reads them up instead of generating new numbers.  
As a result, it's important in the above to pass "1" as 
the last argument the *first* time you run sortperf -- 
that forces the random # generator into the same 
state it was when I used it.

This patch also gives lists a new list.hsort() method, 
which is a weak heapsort I gave up on.  Time it if you 
want to see how bad an excellent sort can get <wink>.
msg40651 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2002-07-26 16:23
Logged In: YES 
user_id=35752

AMD 1.4 Ghz Athon CPU
  L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
  L2 Cache: 256K (64 bytes/line)
Linux 2.4.19-pre10-ac1
gcc 2.95.4

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort
 !sort
15   32768   0.06   0.01   0.01   0.07   0.01   0.03   0.01
  0.07
16   65536   0.16   0.02   0.02   0.15   0.02   0.07   0.02
  0.17
17  131072   0.37   0.03   0.03   0.39   0.04   0.16   0.04
  0.41
18  262144   0.84   0.07   0.08   0.87   0.10   0.34   0.07
  0.93
19  524288   1.89   0.16   0.16   1.97   0.21   0.70   0.16
  2.08
20 1048576   4.20   0.33   0.34   4.55   0.41   1.45   0.34
  4.61

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort
 !sort
15   32768   0.06   0.00   0.01   0.01   0.01   0.03   0.00
  0.01
16   65536   0.14   0.02   0.02   0.02   0.02   0.06   0.02
  0.04
17  131072   0.35   0.04   0.04   0.04   0.04   0.12   0.04
  0.08
18  262144   0.79   0.08   0.08   0.09   0.09   0.27   0.09
  0.16
19  524288   1.79   0.17   0.17   0.18   0.17   0.54   0.17
  0.33
20 1048576   3.96   0.35   0.34   0.34   0.36   1.12   0.34
  0.70
msg40652 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-26 16:30
Logged In: YES 
user_id=31435

Wow!  Thanks, Neil!  That's impressive, even if I say so 
myself <wink>.
msg40653 - (view) Author: Kevin Jacobs (jacobs99) Date: 2002-07-26 16:54
Logged In: YES 
user_id=459565

Intel 1266 MHz Penguin III x2 (Dual processor)
512KB cache
Linux 2.4.19-pre1-ac2
gcc  3.1 20020205

samplesort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.06   0.01   0.02   0.00   0.07
16   65536   0.16   0.02   0.01   0.15   0.01   0.06   0.02   0.17
17  131072   0.37   0.04   0.04   0.35   0.04   0.15   0.03   0.38
18  262144   0.84   0.07   0.08   0.80   0.09   0.31   0.07   0.86
19  524288   1.89   0.16   0.15   1.78   0.19   0.66   0.15   1.92
20 1048576   4.12   0.33   0.31   4.07   0.37   1.34   0.31   
4.22

timsort:
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.01   0.00   0.01   0.01   0.03   0.01   0.01
16   65536   0.17   0.01   0.02   0.01   0.02   0.06   0.02   0.04
17  131072   0.37   0.04   0.03   0.04   0.04   0.13   0.04   0.08
18  262144   0.84   0.07   0.07   0.08   0.08   0.27   0.07   0.16
19  524288   1.89   0.16   0.15   0.15   0.17   0.55   0.15   0.33
20 1048576   4.16   0.32   0.31   0.31   0.32   1.14   0.31   
0.66
msg40654 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-26 17:52
Logged In: YES 
user_id=31435

Numbers from Marc-Andre Lemburg, "AMD Athlon 
1.2GHz/Linux/gcc".

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.07   0.00   0.01   0.09   0.01   0.03   
0.01   0.08
16   65536   0.18   0.02   0.02   0.19   0.03   0.07   
0.02   0.20
17  131072   0.43   0.05   0.04   0.46   0.05   0.18   
0.05   0.48
18  262144   0.99   0.09   0.10   1.04   0.13   0.40   
0.09   1.11
19  524288   2.23   0.19   0.21   2.32   0.24   0.83   
0.20   2.46
20 1048576   4.96   0.40   0.40   5.41   0.47   1.72   
0.40   5.46

samplesort again (run twice by mistake)

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.09   0.01   0.03   
0.00   0.09
16   65536   0.20   0.02   0.01   0.20   0.03   0.07   
0.02   0.20
17  131072   0.46   0.06   0.02   0.45   0.05   0.20   
0.04   0.49
18  262144   0.99   0.09   0.10   1.09   0.11   0.40   
0.12   1.12
19  524288   2.33   0.20   0.20   2.30   0.24   0.83   
0.19   2.47
20 1048576   4.89   0.40   0.41   5.37   0.48   1.71   
0.38   6.22

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.08   0.01   0.01   0.01   0.01   0.03   
0.00   0.02
16   65536   0.17   0.02   0.02   0.02   0.02   0.07   
0.02   0.06
17  131072   0.41   0.05   0.04   0.05   0.04   0.16   
0.04   0.09
18  262144   0.95   0.10   0.10   0.10   0.10   0.33   
0.10   0.20
19  524288   2.17   0.20   0.21   0.20   0.21   0.66   
0.20   0.44
20 1048576   4.85   0.42   0.40   0.41   0.41   1.37   
0.41   0.84
msg40655 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-26 18:54
Logged In: YES 
user_id=31435

Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I-
cache, 256KB L2 cache, Win98SE, MSVC 6

samplesort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.17   0.01   0.01   0.17   0.01   0.05   
0.01   0.11
16   65536   0.24   0.02   0.02   0.25   0.02   0.08   
0.02   0.24
17  131072   0.53   0.05   0.04   0.49   0.05   0.18   
0.04   0.52
18  262144   1.16   0.09   0.09   1.06   0.12   0.37   
0.09   1.14
19  524288   2.53   0.18   0.17   2.30   0.24   0.75   
0.17   2.47
20 1048576   5.48   0.37   0.35   5.17   0.45   1.51   
0.35   5.34

timsort
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.15   0.03   0.02   0.02   0.01   0.04   
0.01   0.02
16   65536   0.23   0.02   0.02   0.02   0.02   0.09   
0.02   0.04
17  131072   0.53   0.04   0.04   0.05   0.04   0.19   
0.04   0.09
18  262144   1.16   0.09   0.09   0.10   0.09   0.38   
0.09   0.19
19  524288   2.54   0.18   0.17   0.18   0.18   0.78   
0.17   0.36
20 1048576   5.50   0.36   0.35   0.36   0.37   1.60   
0.35   0.73
msg40656 - (view) Author: Skip Montanaro (skip.montanaro) * (Python committer) Date: 2002-07-26 19:50
Logged In: YES 
user_id=44345

Pentium III, 450MHz,  256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

L.sort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.30   0.03   0.09   0.03   0.32
16   65536   0.73   0.06   0.05   0.66   0.06   0.20   0.05   0.71
17  131072   1.53   0.11   0.12   1.42   0.13   0.44   0.11   1.51
18  262144   3.28   0.21   0.21   3.09   0.28   0.89   0.21   3.26
19  524288   7.05   0.44   0.42   6.60   0.59   1.81   0.42   7.03
20 1048576  15.30   0.90   0.86  14.10   1.13   3.62   0.86  14.96

L.msort():

 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   32768   0.32   0.02   0.03   0.03   0.02   0.13   0.02   0.05
16   65536   0.70   0.05   0.06   0.05   0.06   0.27   0.07   0.10
17  131072   1.53   0.09   0.11   0.10   0.11   0.59   0.10   0.21
18  262144   3.27   0.22   0.21   0.23   0.21   1.13   0.21   0.43
19  524288   7.10   0.43   0.45   0.44   0.45   2.27   0.43   0.88
20 1048576  15.03   0.86   0.87   0.87   0.89   4.70   0.89   1.74
msg40657 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-26 20:38
Logged In: YES 
user_id=31435

Intrigued by a comment of McIlroy, I tried catenating all 
the .c files in Objects and Modules, into one giant file, and 
sorted that.  msort got a 22% speedup there, suggesting 
there's *some* kind of significant pre-existing lexicographic 
order (and/or reverse order) in C source files that msort is 
able to exploit.

Trying it again on about 1.33 million lines of Python-Dev 
archive (including assorted uuencoded attachmets). msort 
got a 32% speedup.

I'm not sure what to make of that, but we needed some real 
life data here <wink>.
msg40658 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-27 01:24
Logged In: YES 
user_id=31435

I attached timsort.txt, a plain-text detailed description of 
the algorithm.  After I dies, it's the only clue that will remain 
<wink>.
msg40659 - (view) Author: Anthony Baxter (anthonybaxter) Date: 2002-07-27 08:20
Logged In: YES 
user_id=29957

Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

(sort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort
 !sort
15   32768   0.29   0.03   0.02   0.29   0.03   0.09   0.02
  0.31
16   65536   0.66   0.05   0.05   0.68   0.05   0.20   0.05
  0.71
17  131072   1.50   0.11   0.11   1.51   0.12   0.47   0.11
  1.60
18  262144   3.25   0.23   0.22   3.37   0.25   1.18   0.22
  3.52
19  524288   6.88   0.45   0.43   7.30   0.51   1.91   0.43
  7.43
20 1048576  14.90   0.92   0.88  15.49   1.05   3.89   0.90
 16.04
 
(timsort)
imperial% ./python -O Lib/test/sortperf.py 15 20 1
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort
 !sort
15   32768   0.28   0.02   0.02   0.03   0.02   0.13   0.02
  0.05
16   65536   0.59   0.05   0.05   0.06   0.05   0.26   0.05
  0.11
17  131072   1.33   0.10   0.09   0.11   0.11   0.54   0.10
  0.21
18  262144   2.92   0.22   0.20   0.22   0.21   1.10   0.20
  0.44
19  524288   6.33   0.44   0.42   0.43   0.43   2.21   0.41
  0.90
20 1048576  13.56   0.89   0.85   0.84   0.87   4.51   0.87
  1.82
 
msg40660 - (view) Author: Anthony Baxter (anthonybaxter) Date: 2002-07-27 11:23
Logged In: YES 
user_id=29957

PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort
 !sort
15   32768   0.08   0.01   0.01   0.07   0.01   0.03   0.01
  0.08
16   65536   0.18   0.02   0.02   0.17   0.02   0.06   0.01
  0.19
17  131072   0.41   0.04   0.04   0.41   0.04   0.16   0.04
  0.44
18  262144   0.93   0.09   0.08   0.90   0.10   0.33   0.08
  0.97
19  524288   2.04   0.18   0.16   1.98   0.23   0.69   0.17
  2.13
20 1048576   4.49   0.36   0.34   4.52   0.43   1.44   0.33
  4.65

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort
 !sort
15   32768   0.08   0.01   0.01   0.00   0.01   0.04   0.00
  0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02
  0.04
17  131072   0.42   0.03   0.04   0.04   0.04   0.14   0.03
  0.08
18  262144   0.95   0.08   0.08   0.09   0.08   0.30   0.07
  0.17
19  524288   2.08   0.17   0.16   0.17   0.17   0.63   0.17
  0.34
20 1048576   4.56   0.33   0.33   0.33   0.35   1.29   0.33
  0.71


PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

(samplesort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort
 !sort
15   32768   0.08   0.01   0.01   0.08   0.00   0.02   0.01
  0.08
16   65536   0.18   0.01   0.02   0.18   0.01   0.06   0.02
  0.19
17  131072   0.41   0.04   0.04   0.39   0.04   0.16   0.04
  0.44
18  262144   0.94   0.08   0.08   0.91   0.10   0.33   0.07
  0.95
19  524288   2.05   0.17   0.16   2.07   0.20   0.70   0.16
  2.11
20 1048576   4.50   0.34   0.32   4.30   0.42   1.41   0.32
  4.61

(timsort)
 i    2**i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort
 !sort
15   32768   0.09   0.01   0.00   0.01   0.01   0.04   0.01
  0.01
16   65536   0.18   0.02   0.02   0.02   0.01   0.07   0.02
  0.04
17  131072   0.41   0.04   0.04   0.04   0.03   0.14   0.03
  0.08
18  262144   0.93   0.08   0.07   0.08   0.08   0.31   0.08
  0.16
19  524288   2.07   0.15   0.15   0.16   0.16   0.63   0.16
  0.34
20 1048576   4.54   0.33   0.31   0.32   0.33   1.28   0.32
  0.67

msg40661 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-28 18:08
Logged In: YES 
user_id=31435

Deleting old doc file.
msg40662 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-28 18:09
Logged In: YES 
user_id=31435

Adding new doc file.
msg40663 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-28 18:14
Logged In: YES 
user_id=31435

Adding new patch, merge2.patch.  Most of this is 
semantically neutral compared to the last version -- more 
asserts, better comments, minor code fiddling for clarity, 
got rid of the weak heapsort.

There is one useful change, extracting more info out of the 
pre-merge "find the endpoints" searches.  This helps "in 
theory" most of the time, but probably not enough to 
measure.  In some odd cases it can help a lot, though.  See 
Python-Dev for discussion.  There's no strong reason to 
time this stuff again, if you already did it once (and thanks 
to those who did!).
msg40664 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-28 19:00
Logged In: YES 
user_id=31435

Just van Rossum
400Mhz G4 PowerPC running MacOSX 10.1.5.
original patch
From an email report; I chopped the "n" column and 
removed some whitespace so it's easier to read on SF.

L.sort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32

L.msort()
 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04
msg40665 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-28 19:28
Logged In: YES 
user_id=31435

Dang!  That little optimization introduced a subtle 
assumption that the comparison function is consistent.  We 
can't assume that in Python (user-supplied functions can 
be arbitrarily goofy).  Deleted merge2.patch and added 
merge3.patch to repair that.
msg40666 - (view) Author: Andrew I MacIntyre (aimacintyre) * Date: 2002-07-29 11:53
Logged In: YES 
user_id=250749

The following results are from your original patch (the n
column dropped for better SF display).

System 1:
Athlon 1.4Ghz, 256MB PC2100 RAM, OS2 v4 FixPack 12, EMX 0.9d
Fix 4

gcc 2.8.1 -O2
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.01   0.07   0.01   0.03   0.02   0.08
16   0.18   0.02   0.01   0.18   0.02   0.08   0.01   0.20
17   0.41   0.04   0.04   0.43   0.05   0.18   0.04   0.46
18   0.93   0.09   0.10   1.00   0.10   0.39   0.10   1.05
19   2.08   0.18   0.20   2.34   0.23   0.81   0.20   2.36
20   4.69   0.37   0.40   5.02   0.47   1.68   0.40   5.28

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.06   0.01   0.01   0.01   0.01   0.03   0.01   0.02
16   0.15   0.03   0.01   0.02   0.02   0.06   0.02   0.04
17   0.37   0.04   0.05   0.04   0.05   0.13   0.05   0.10
18   0.88   0.10   0.09   0.10   0.10   0.28   0.10   0.19
19   1.97   0.20   0.18   0.21   0.21   0.58   0.20   0.39
20   4.40   0.41   0.40   0.42   0.40   1.21   0.40   0.81

gcc 2.95.2 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.07   0.01   0.00   0.07   0.01   0.03   0.00   0.08
16   0.17   0.01   0.03   0.17   0.02   0.09   0.02   0.19
17   0.42   0.05   0.04   0.46   0.06   0.18   0.05   0.45
18   0.99   0.09   0.09   1.05   0.12   0.40   0.09   1.05
19   2.09   0.18   0.21   2.18   0.23   0.84   0.20   2.45
20   4.73   0.39   0.41   5.13   0.47   1.70   0.40   5.38

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.10   0.01   0.01   0.01   0.01   0.04   0.01   0.01
16   0.18   0.02   0.01   0.03   0.02   0.07   0.03   0.03
17   0.37   0.06   0.05   0.04   0.05   0.14   0.04   0.09
18   0.91   0.10   0.10   0.10   0.10   0.27   0.09   0.20
19   1.97   0.21   0.21   0.20   0.20   0.59   0.19   0.40
20   4.31   0.44   0.40   0.44   0.40   1.21   0.40   0.82


System 2:
P5-166 SMP (2 CPU), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
with a 
  patch to re-enable CPU L1 caches (SMP BIOS issue)
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.73   0.06   0.05   0.74   0.07   0.23   0.05   0.77
16   1.60   0.12   0.12   1.66   0.13   0.48   0.12   1.71
17   3.54   0.26   0.24   3.55   0.27   1.05   0.25   3.74
18   7.63   0.52   0.51   7.73   0.58   2.12   0.50   8.05
19  16.38   1.04   1.01  17.03   1.15   4.28   1.01  17.17
20  34.94   2.09   2.02  35.04   2.37   8.62   2.02  36.58

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   0.74   0.05   0.06   0.06   0.06   0.32   0.06   0.12
16   1.64   0.12   0.12   0.12   0.12   0.65   0.12   0.26
17   3.62   0.25   0.25   0.27   0.26   1.32   0.25   0.52
18   7.78   0.51   0.50   0.53   0.52   2.69   0.50   1.06
19  16.76   1.03   1.01   1.09   1.04   5.46   1.01   2.12
20  35.93   2.09   2.02   2.14   2.09  11.05   2.04   4.38


System 3:
486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE
gcc 2.95.3 -O3
samplesort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.62   0.21   0.21   2.61   0.24   0.83   0.21   2.71
16   5.73   0.45   0.44   5.75   0.48   1.71   0.44   5.94
17  12.46   0.90   0.88  12.34   1.00   3.70   0.89  13.00
18  27.15   1.82   1.80  27.12   2.17   7.59   1.80  28.10
19  57.22   3.77   3.68  59.52   4.41  15.40   3.66  59.62
20 126.80   7.96   7.80 127.63   9.58  32.72   7.46 134.45

timsort
 i  *sort  \sort  /sort  3sort  +sort  ~sort  =sort  !sort
15   2.52   0.21   0.20   0.20   0.20   1.05   0.20   0.42
16   5.49   0.45   0.41   0.43   0.44   2.13   0.43   0.90
17  12.15   0.88   0.84   0.85   0.88   4.34   0.88   1.83
18  26.11   1.82   1.74   1.84   1.81   8.70   1.74   3.67
19  56.34   3.67   3.55   3.80   3.67  17.84   3.53   7.48
20 121.95   7.89   7.37   8.24   7.98  39.38   7.44  16.83


NOTES:

System 2 is just starting to swap in the i=20 case.

System 3 starts to swap at i=18; at i=19, process:resident
size is 2:1; at i=20, process:resident size is a bit over 4:1.
msg40667 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2002-07-29 13:12
Logged In: YES 
user_id=6656

On my iBook (600 MHz G3 with 384 megs of RAM, OS X
10.1.5):

L.sort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.19  0.01  0.00  0.20  0.02  0.07  0.01  0.21
16  0.45  0.05  0.04  0.43  0.04  0.15  0.05  0.47
17  1.00  0.09  0.09  1.01  0.09  0.37  0.09  1.08
18  2.16  0.16  0.16  2.26  0.22  0.75  0.18  2.35
19  4.80  0.38  0.36  5.08  0.46  1.45  0.35  5.31
20 10.65  0.79  0.79 11.83  0.89  3.33  0.78 11.88

L.msort():

 i *sort \sort /sort 3sort +sort ~sort =sort !sort
15  0.18  0.02  0.03  0.02  0.03  0.08  0.02  0.04
16  0.43  0.03  0.03  0.04  0.04  0.17  0.04  0.08
17  0.95  0.08  0.09  0.09  0.08  0.34  0.08  0.18
18  2.08  0.18  0.18  0.19  0.18  0.72  0.18  0.37
19  4.59  0.37  0.38  0.39  0.38  1.47  0.36  0.76
20 10.22  0.83  0.76  0.79  0.78  3.04  0.79  1.66

I've run this often enough to believe they're typical
(inc. .msort() beating .sort() on *sort and ~sort by
a small margin).

Looks like an unequivocal win on this box.
msg40668 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-30 15:41
Logged In: YES 
user_id=31435

Deleting old doc file and merge3.patch; adding new doc file.
msg40669 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-30 15:42
Logged In: YES 
user_id=31435

Adding merge4.patch; explanation to follow.
msg40670 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-30 16:14
Logged In: YES 
user_id=31435

In Kevin's company database (see Python-Dev), there were 
8 fields we could sort on.

Two fields were strongly correlated with the order of the 
records as given, and msort() was >6x faster on those.

Two fields were weakly correlated, and msort was a major 
win on those (>25% speedup).

One field had many duplicates, with a highly skewed 
distribution.  msort was better than 2x faster on that.

But the rest (phone#, #employees, address) were 
essentially randomly ordered, and msort was 
systematically a few percent slower on those.  That 
wouldn't have been remarkable, except that the percentage 
slowdown was a few times larger than the percentage by 
which msort did more comparisons than sort().

I eventually figured out the obvious:  the # of records 
wasn't an exact power of 2, and on random data msort then 
systematically arranged for the final merge to be between a 
run with a large power-of-2 size, and a run with the little bit 
left over.  That adds a bunch of compares over perfectly 
balanced merges, plus O(N) pointer copies, just to get that 
little bit in place.

The new merge4.patch repairs that as best as (I think) non-
heroically possible, quickly picking a minimum run length in 
advance that should almost never lead to a "bad" final 
merge when the data is randomly ordered.

In each of Kevin's 3 "problem sorts", msort() does fewer 
compares than sort() now, and the runtime is generally 
within a fraction of a percent.  These all-in-cache cases still 
seem to favor sort(), though, and it appears to be because 
msort() does a lot more data movement (note that 
quicksorts do no more than one swap per compare, and 
often none, while mergesorts do a copy on every 
compare).  The other 5 major-to-killer wins msort got on 
this data remain intact.

The code changes needed were tiny, but the doc file 
changed a lot more.

Note that this change has no effect on arrays with power-of-
2 sizes, so sortperf.py timings shouldn't change (and don't 
on my box).  The code change is solely to compute a good 
minimum run length before the main loop begins, and it 
happens to return the same value as was hard-coded 
before when the array has a power-of-2 size.

More testing on real data would be most welcome!  Kevin's 
data was very helpful to me.
msg40671 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-31 01:12
Logged In: YES 
user_id=31435

New doc file, with an intro at the start and a program at the 
end.  Turns out that merge4.patch actually reversed the 
random-array #-of-comparisons advantage samplesort had 
enjoyed:  it's now timsort that does 1-2% fewer 
comparisons on random arrays of random lengths.

See the end of the file for why samplesort does 50% more 
comparisons on average for random arrays of length two 
<wink>.

Near the end of the new Intro section at the start, I suggest 
a couple experiments people might try on boxes where 
~sort is much slower under timsort.  That remains baffling, 
but the algorithm doesn't *do* much in that case, so 
someone on a box where it flounders could surely figure out 
why.
msg40672 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-31 06:02
Logged In: YES 
user_id=31435

~sort gets more mysterious all the time:  the mystery now 
is why it's *not* much slower everywhere!  Here are the 
exact # of compares ~sort does:

i        n     sort    msort    %ch    lg(n!)
--  ------  -------  -------  -----  --------
15   32768   130484   188720  44.63    444255
16   65536   260019   377634  45.23    954037
17  131072   555035   755476  36.11   2039137
18  262144  1107826  1511174  36.41   4340409
19  524288  2218562  3022584  36.24   9205096
20 1048576  4430616  6045418  36.45  19458756

The last column is the information-theoretic lower bound for 
sorting random arrays of this size (no comparison-based 
algorithm can do better than than on average), showing 
that sort() and msort() are both getting a lot of good out of 
the duplicates.  But sort()'s special case for equal 
elements is extremely effective on ~sort's specific data 
pattern, and msort just isn't going to get close to that (it 
does better than sort() on skewed distributions with lots of 
duplicates, though).

The only thing I can think of that could transform 
what "should be" highly significant slowdowns into highly 
significant speedups on some boxes are catastrophic 
cache effects in samplesort.  But knowing something about 
how both algorithms work <wink>, that's not screaming "oh, 
of course".
msg40673 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-07-31 20:37
Logged In: YES 
user_id=31435

Replaced the doc file.  The new one contains more info 
comparing msort to sort.  There's nothing more I want to do 
here, and it looks like everyone who might time this already 
did.

Assigned to Guido for pronouncement.  I recommend 
replacing list.sort() with this.  The only real downside is the 
potential for requiring 2*N temp bytes; that (and everything 
else <wink>) is discussed in the doc file.

If this is accepted, another issue is whether to *advertise* 
that this sort is stable.  Some people really want that, but 
requiring stability constrains implementations.  Another 
possibility is to give lists two sort methods, one 
guaranteed stable and the other not, where in 2.3 CPython 
both map to this code.

In no case do I want to keep both the samplesort and 
timsort implementations in the core -- one brain-busting 
sort implementation is quite enough.  This one has many 
wonderful properties the samplesort hybrid lacks.
msg40674 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-07-31 20:45
Logged In: YES 
user_id=6380

1. Go for it.

2. Advertise it as an implementation feature.
msg40675 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2002-08-01 03:11
Logged In: YES 
user_id=31435

This is checked in now, so closing this patch.
History
Date User Action Args
2002-07-26 15:51:08tim.peterscreate