This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Stefan Pochmann
Recipients Stefan Pochmann, rhettinger, tim.peters
Date 2021-10-22.20:27:34
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1634934455.04.0.877462139535.issue45530@roundup.psfhosted.org>
In-reply-to
Content
> It's not surprising the 2nd positions are rarely equal _given_ that the universe has been reduced to the 100 tuples with the same first element.

Yes, exactly.

> In any case, I don't see the point to this exercise ;-)

Just to illustrate why I disagree with what I responded to there. I'm not convinced that many matches at the first tuple position are really an indication that there'll be many matches at the second position as well, so that one should "admit defeat" and that it's "arrogance" to hope for only few matches at the second position. I'd wager it's rather typical that the matching rate at the second position is significantly lower. If you sort people by last and then first name, I think you'd get the same effect as with my above random example, for the same reason. If you sort a set of 2D coordinates, you'll even *never* get matches at the second position. Even with that SO question's data and its massive duplication (especially at the second position) *and* its correlation between first and second position (since both "sorted(x)" and "x[::2]" are derived from the same x) and its 43% matching rate at the first position, there's still only 27% matching rate at the second position (lower than the 1/3 break-even point).

> BTW, don't overlook that tuple_elem_compare can _only_ be used on the very first elements of the tuples.

Yes, that's why I only talked about PyObject_RichCompareBool there.

> > 1) Sort the list only by first position (comparing with only
> >    one tuple_elem_compare).

> Not sure what that means, exactly. (the "with only one tuple_elem_compare" part;

Just wanted to make clear it wouldn't used two tuple_elem_compare or a PyObject_RichCompareBool.

> > 2) Identify equal-first-position-runs (with tuple_elem_compare)
> > and sort each run independently (comparing only positions 1+).

> What are you aiming at? There was no motivation given here.

Much fewer comparisons, especially PyObject_RichCompareBool comparisons. For example for sorting a million pairs with first/second element chosen from a population of 100,000, I get these numbers of comparisons (per element):

            current  groupsort
------------------------------
== first     18.60      none        (PyObject_RichCompareBool)
 < first     16.19     19.60        (tuple_elem_compare)
== second     2.41      2.36        (PyObject_RichCompareBool)
 < second     2.41      2.36        (PyObject_RichCompareBool)
------------------------------
total slow   23.42      4.72        (PyObject_RichCompareBool)
total fast   16.19     19.60        (tuple_elem_compare)
------------------------------
total        39.61     24.32

(With "current" I mean before your optimizations, which I can't test yet.)

> Will, this issue report is about unsafe_tuple_compare(), so, ya, if the changes you envision aren't limited to that, I'd say you want to open a different issue report and a different PR :-)

Ok, I might try that.
History
Date User Action Args
2021-10-22 20:27:35Stefan Pochmannsetrecipients: + Stefan Pochmann, tim.peters, rhettinger
2021-10-22 20:27:35Stefan Pochmannsetmessageid: <1634934455.04.0.877462139535.issue45530@roundup.psfhosted.org>
2021-10-22 20:27:35Stefan Pochmannlinkissue45530 messages
2021-10-22 20:27:34Stefan Pochmanncreate