This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author spiv
Recipients rhettinger, spiv
Date 2010-05-11.10:13:56
SpamBayes Score 0.0083094
Marked as misclassified No
Message-id <1273572840.63.0.285566688411.issue8685@psf.upfronthosting.co.za>
In-reply-to
Content
Ok, this time test_set* passes :)

Currently, if you have a large set and a small set, the code will do len(large) lookups in the small set.  When large is much bigger than small, it is cheaper to copy the large set and do len(small) lookups in it instead.  On my laptop, a size difference of 4x is a clear win for copy+difference_update over the status quo, even for sets with millions of entries.  For more similarly sized sets (even only a factor-of-2 size difference), the cost of allocating a large set that is likely to be shrunk significantly outweighs the benefit.  So my patch only switches behaviour when len(x)/4 > len(y).
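For readers following along, the heuristic can be sketched in Python (the actual patch is C code in Objects/setobject.c; the function name here is made up for illustration):

```python
def difference_sketch(x, y):
    """Sketch of the patched strategy for computing x - y.

    Hypothetical pure-Python illustration only; the real change lives in
    the C implementation of set difference.
    """
    if len(x) > 4 * len(y):
        # x is much larger than y: copy x and discard y's members,
        # costing only len(y) lookups in the (large) copy.
        result = set(x)
        result.difference_update(y)
    else:
        # Sizes are comparable: build the result directly with
        # len(x) lookups in y, avoiding allocation of a large set
        # that would mostly be shrunk away.
        result = {item for item in x if item not in y}
    return result
```

Either branch returns the same mathematical result; only the constant factors differ, which is why the 4x threshold is a tuning choice rather than a correctness one.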

This patch is complementary to the patch in issue8425, I think.
History
Date User Action Args
2010-05-11 10:14:00  spiv  set  recipients: + spiv, rhettinger
2010-05-11 10:14:00  spiv  set  messageid: <1273572840.63.0.285566688411.issue8685@psf.upfronthosting.co.za>
2010-05-11 10:13:59  spiv  link  issue8685 messages
2010-05-11 10:13:57  spiv  create