This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author brandtbucher
Recipients brandtbucher
Date 2019-02-23.19:39:17
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1550950757.55.0.405574866511.issue36095@roundup.psfhosted.org>
In-reply-to
Content
Sorting sequences containing NaN values produces an incompletely sorted result. Further, because of the complexity of the timsort, this incomplete sort often silently produces unintuitive, unstable-seeming results that are extremely sensitive to the ordering of the inputs:

>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2.0, 3, nan, 2]
>>> sorted(reversed([3, 1, 2, float('nan'), 2.0, 2, 2.0]))
[1, 2.0, 2, 2.0, nan, 2, 3]

The patch I have provided addresses these issues, including for lists containing nested lists/tuples with NaN values. Specifically, it stably sorts NaNs to the end of the list with no changes to the timsort itself (just the element-wise comparison functions):

>>> sorted([3, 1, 2, float('nan'), 2.0, 2, 2.0])
[1, 2, 2.0, 2, 2.0, 3, nan]
>>> sorted([[3], [1], [2], [float('nan')], [2.0], [2], [2.0]])
[[1], [2], [2.0], [2], [2.0], [3], [nan]]

It also includes a new regression test for this behavior.

Some other benefits to this patch:

* These changes generally result in a sorting performance improvement across data types. The largest increases here are for nested lists, since we add a new unsafe_list_compare function. Other speed increases are due to safe_object_compare's delegation to unsafe comparison functions for objects of the same type. Specifically, the speed impact (positive is faster, negative is slower) is between:

    * -3% and +3% (10 elements, no PGO)
    * 0% and +4% (10 elements, PGO)
    * 0% and +9% (1000 elements, no PGO)
    * -1% and +9% (1000 elements, PGO)

* The current weird NaN-sorting behavior is not documented, so this is not a breaking change.

* IEEE754 compliance is maintained. The result is still a stable (arguably, more stable), nondecreasing ordering of the original list.
History
Date User Action Args
2019-02-23 19:39:17brandtbuchersetrecipients: + brandtbucher
2019-02-23 19:39:17brandtbuchersetmessageid: <1550950757.55.0.405574866511.issue36095@roundup.psfhosted.org>
2019-02-23 19:39:17brandtbucherlinkissue36095 messages
2019-02-23 19:39:17brandtbuchercreate