
Author spiv
Recipients belopolsky, rhettinger, spiv
Date 2010-05-12.13:14:54
Message-id <1273670097.33.0.157851393088.issue8685@psf.upfronthosting.co.za>
Content
Regarding memory, good question... but this patch turns out to be an improvement there too.

This optimisation only applies when len(x) > len(y) * 4, so the result is guaranteed to contain at least 3/4 of the elements of x (and may well end up being a full copy of x anyway).

So, if you like, this optimisation simply takes advantage of the fact that we are going to be copying almost all of these elements anyway.  We could make it less aggressive, but large sets are already tuned to be between 1/2 and 1/3 empty internally, so at most 1/4 of wasted entries seems a reasonable overhead.  See the sketch below.
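
In Python terms the idea is roughly the following. This is only a minimal sketch, assuming x is the large set and y the small one; the actual patch is C code in Objects/setobject.c, and the function name here is illustrative, not anything from the patch.

    def difference_sketch(x, y):
        if len(x) > len(y) * 4:
            # Copy the large set in one step so the result is allocated
            # at roughly the right size, then discard y's few elements.
            result = set(x)
            result.difference_update(y)
            return result
        # Otherwise build the result one element at a time, as before.
        return {item for item in x if item not in y}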

Also, because this code immediately sizes the result set to be about right, rather than growing it one element at a time, memory consumption is actually *better*.  I'll attach a script that demonstrates this; for me it shows that large_set.difference(small_set) [where large_set has 4M elements and small_set has 100] peaks at 50MB of memory consumption without my patch, but only 18MB with it (after discounting the memory required for large_set itself, etc.).
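
A measurement along these lines could be done roughly as follows. This is not the attached script, just a hedged sketch; it uses tracemalloc, which only exists in later Python versions, and the numbers 4_000_000 and 100 match the sizes quoted above.

    import tracemalloc

    def peak_difference_memory(large_size=4_000_000, small_size=100):
        large_set = set(range(large_size))
        small_set = set(range(small_size))

        # Start tracing only after large_set exists, so its own storage is
        # discounted and only the cost of building the result is counted.
        tracemalloc.start()
        result = large_set.difference(small_set)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return peak

    if __name__ == "__main__":
        print("peak bytes while building the result:", peak_difference_memory())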