Message 236603 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients	DragonFireCK, alex, cvrebert, python-dev, serhiy.storchaka, tim.peters
Date	2015-02-25.17:09:36
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1424884176.96.0.541809488226.issue23515@psf.upfronthosting.co.za>
In-reply-to

Content
Since it's impossible to trigger the error on any current machine anyway (no machine has enough memory), increasing the size of the stack would be absurd. If you read the paper, they note that this is what the Java folks first did (they changed this part of timsort in a way that _did_ make it possible to provoke the stack overflow on current hardware). And they got it wrong, not increasing it enough. Without the _intended_ invariant, it's difficult to figure out how much is enough. It's not worth measuring performance. If there are a total of R runs in the input array, a total of R-1 merges will be performed (with or without the change). As explained in listsort.txt, per-merge overheads don't matter at all unless they're outrageous, because there are at most ceiling(len(array)/32) runs. So the total number of merges is a small fraction of the number of array elements (call that `N`), in an N*log(N) process. Adding a few native C integer comparisons per merge (as the change essentially does) is trivial. It may be possible to contrive inputs which run significantly slower (or faster!) with the change, because the order in which runs are merged may change. But then you'd just be chasing patterns of HW and OS layers-of-caching behavior specific to the box the test is running on.

Since it's impossible to trigger the error on any current machine anyway (no machine has enough memory), increasing the size of the stack would be absurd.  If you read the paper, they note that this is what the Java folks first did (they changed this part of timsort in a way that _did_ make it possible to provoke the stack overflow on current hardware).  And they got it wrong, not increasing it enough.  Without the _intended_ invariant, it's difficult to figure out how much is enough.

It's not worth measuring performance.  If there are a total of R runs in the input array, a total of R-1 merges will be performed (with or without the change).  As explained in listsort.txt, per-merge overheads don't matter at all unless they're outrageous, because there are at most ceiling(len(array)/32) runs.  So the total number of merges is a small fraction of the number of array elements (call that `N`), in an N*log(N) process.  Adding a few native C integer comparisons per merge (as the change essentially does) is trivial.

It may be possible to contrive inputs which run significantly slower (or faster!) with the change, because the order in which runs are merged may change.  But then you'd just be chasing patterns of HW and OS layers-of-caching behavior specific to the box the test is running on.

History
Date	User	Action	Args
2015-02-25 17:09:37	tim.peters	set	recipients: + tim.peters, alex, cvrebert, python-dev, serhiy.storchaka, DragonFireCK
2015-02-25 17:09:36	tim.peters	set	messageid: <1424884176.96.0.541809488226.issue23515@psf.upfronthosting.co.za>
2015-02-25 17:09:36	tim.peters	link	issue23515 messages
2015-02-25 17:09:36	tim.peters	create