Message 219437 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	steven.daprano, terry.reedy, thomasahle, tim.peters, vajrasky
Date	2014-05-31.02:53:08
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1401504789.72.0.713917805361.issue21592@psf.upfronthosting.co.za>
In-reply-to

Content
I suggest this needs a measurable goal, like "minimize expected-case time" or "minimize worst-case time". "Use a state of the art implementation" isn't a measurable goal - it's an abstract wish with no measurable merit on its own ;-) Note that I wrote the "median-of-k" code referenced in the first post here (it was in reply to David Eppstein). Even then it was hard to beat sort() + index. It's harder now, and for an important reason: Python's _current_ sort can exploit many kinds of partial order, and can often complete in linear time. This makes picking "typical" input for timing crucial. If you want to skew it to put sort() + index at maximum disadvantage, use only shuffled (random order) pairwise-unequal inputs. But most streams of data do have some kinds of partial order (which is why the newer sort() implementation has been so successful), and "typical" is a much tougher thing to capture than "shuffled".

I suggest this needs a measurable goal, like "minimize expected-case time" or "minimize worst-case time".  "Use a state of the art implementation" isn't a measurable goal - it's an abstract wish with no measurable merit on its own ;-)

Note that I wrote the "median-of-k" code referenced in the first post here (it was in reply to David Eppstein).  Even then it was hard to beat sort() + index.

It's harder now, and for an important reason:  Python's _current_ sort can exploit many kinds of partial order, and can often complete in linear time.

This makes picking "typical" input for timing crucial.  If you want to skew it to put sort() + index at maximum disadvantage, use only shuffled (random order) pairwise-unequal inputs.  But most streams of data do have some kinds of partial order (which is why the newer sort() implementation has been so successful), and "typical" is a much tougher thing to capture than "shuffled".

History
Date	User	Action	Args
2014-05-31 02:53:09	tim.peters	set	recipients: + tim.peters, terry.reedy, steven.daprano, thomasahle, vajrasky
2014-05-31 02:53:09	tim.peters	set	messageid: <1401504789.72.0.713917805361.issue21592@psf.upfronthosting.co.za>
2014-05-31 02:53:09	tim.peters	link	issue21592 messages
2014-05-31 02:53:08	tim.peters	create