This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author spiv
Recipients rhettinger, spiv
Date 2010-05-11.10:13:56
SpamBayes Score 0.0083094
Marked as misclassified No
Message-id <1273572840.63.0.285566688411.issue8685@psf.upfronthosting.co.za>
In-reply-to
Content
Ok, this time test_set* passes :)

Currently, if you have a large set and a small set, the code will do len(large) lookups in the small set.  When large is much bigger than small, it is cheaper to copy the large set and do len(small) lookups in it instead.  On my laptop, a size difference of 4x is a clear win for copy+difference_update over the status quo, even for sets with millions of entries.  For more similarly sized sets (even only a factor-of-2 size difference), the cost of allocating a large set that is likely to be shrunk significantly outweighs the benefit.  So my patch only switches behaviour when len(x)/4 > len(y).
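For readers following along, the heuristic can be sketched in Python (the actual patch is C code in Objects/setobject.c; the function name here is made up for illustration):

```python
def difference_sketch(x, y):
    """Sketch of the patched strategy for computing x - y.

    Hypothetical pure-Python illustration only; the real change lives in
    the C implementation of set difference.
    """
    if len(x) > 4 * len(y):
        # x is much larger than y: copy x and discard y's members,
        # costing only len(y) lookups in the (large) copy.
        result = set(x)
        result.difference_update(y)
    else:
        # Sizes are comparable: build the result directly with
        # len(x) lookups in y, avoiding allocation of a large set
        # that would mostly be shrunk away.
        result = {item for item in x if item not in y}
    return result
```

Either branch returns the same mathematical result; only the constant factors differ, which is why the 4x threshold is a tuning choice rather than a correctness one.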

This patch is complementary to the patch in issue8425, I think.
History
Date User Action Args
2010-05-11 10:14:00  spiv  set  recipients: + spiv, rhettinger
2010-05-11 10:14:00  spiv  set  messageid: <1273572840.63.0.285566688411.issue8685@psf.upfronthosting.co.za>
2010-05-11 10:13:59  spiv  link  issue8685 messages
2010-05-11 10:13:57  spiv  create