
Author tim.peters
Recipients koos.zevenhoven, tim.peters
Date 2018-09-05.05:55:08
Content
A new version of the file also models a variant of the `powersort` merge ordering.  It clearly dominates timsort and 2-merge in all cases tried, for this notion of "cost".

Against that, its code is much more complex, and the algorithm is very far from obvious.  The paper "motivates" it but doesn't really explain it in enough detail to make most of it clear (though it refers to other papers where pieces of it are developed).

Curious:  unless a run is the last in an array, powersort never merges it immediately upon finding it.  Instead its start and length are used to compute a "power", in turn used to decide whether to merge some previous number of runs.  A dollar to anyone who can figure out what the heck the "power" computation is really doing in less than a full day without reading the paper ;-)  Then two dollars if you can prove that the "never equal!" assert holds.
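A minimal sketch of what the power computation does, following the paper's definition rather than the code in the attached file:  map each run to its midpoint, normalized to [0, 1).  The power of the boundary between two adjacent runs is then the position of the first bit at which the binary expansions of the two midpoints differ.  Keeping numerators over the common denominator 2*n makes the arithmetic exact:

```python
def power(s1, n1, n2, n):
    """Power of the boundary between two adjacent runs:  run 1 starts
    at index s1 with length n1, run 2 follows immediately with length
    n2, and n is the total length of the array being sorted.
    """
    assert s1 >= 0 and n1 >= 1 and n2 >= 1 and s1 + n1 + n2 <= n
    # Run 1's normalized midpoint is (s1 + n1/2)/n = (2*s1 + n1)/(2*n);
    # track numerators over the common denominator d = 2*n.
    a = 2 * s1 + n1        # 2*n times run 1's normalized midpoint
    b = a + n1 + n2        # 2*n times run 2's normalized midpoint
    d = 2 * n
    p = 0
    while True:            # a/d < b/d strictly, so this terminates
        p += 1
        a *= 2
        b *= 2
        if a // d != b // d:   # binary expansions differ at bit p
            return p
        a %= d             # keep only the fractional parts
        b %= d
```

For example, with n = 16, length-4 runs starting at indices 0 and 4 have normalized midpoints 2/16 = 0.0010 and 6/16 = 0.0110 in binary.  Those first differ at bit 2, so `power(0, 4, 4, 16) == 2`.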

It would require timing lots of C code on various cases to see whether the possible delay in merging new runs really matters.  It's unclear to me a priori, because the delay buys something too (less memory copying overall).
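To make the deferred merging concrete, here's a sketch of the stack rule in the same cost-model spirit (again an illustration with made-up names, reusing `power` from above):  each stack entry remembers the power of the boundary between it and the run above it, and finding a new run only ever triggers merges among previous runs:

```python
def found_new_run(stack, s2, n2, n):
    """Update the run stack for a newly found run (start s2, length n2)
    in an array of total length n.  Each stack entry is
    [start, length, power], where power is the power of the boundary
    between this run and the run above it (assigned when that next run
    arrives).  The new run itself is never merged here -- only pushed.
    """
    if stack:
        s1, n1, _ = stack[-1]
        p = power(s1, n1, n2, n)  # boundary between top run and new run
        # Merge previous runs while the boundary between the top two is
        # "deeper" (higher power) than the boundary next to the new run.
        while len(stack) > 1 and stack[-2][2] > p:
            _, length, _ = stack.pop()
            stack[-1][1] += length    # model a merge: lengths just add
        stack[-1][2] = p
    stack.append([s2, n2, None])      # power filled in on the next call
```

When the array is exhausted, whatever remains on the stack is merged from the top down regardless of powers -- that's where the deferred merges finally happen, and why the last run is the one exception noted above.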

In any case, just switching to 2-merge would be easy, and that alone is enough to squash the contrived "bad cases" for the current approach.  `powersort` would too, but may also actually help a bit in ordinary cases.  Or not.