Message 105450 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	belopolsky
Recipients	abacabadabacaba, belopolsky, eric.smith, ezio.melotti, rhettinger
Date	2010-05-10.17:44:35
SpamBayes Score	0.0017728062
Marked as misclassified	No
Message-id	<AANLkTilSTXfXChxQwwd9Ko5mQUwApt5_p_SB3rvVHg3P@mail.gmail.com>
In-reply-to	<1273432523.79.0.0658499297918.issue8425@psf.upfronthosting.co.za>

Content
The problem is apparently due to the fact that small_set -= large_set iterates over the large set removing entries from small_set while more efficient small_set - large_set builds a new set iterating over a small_set and adding items that are not in the large set. Since lookups are O(1), it is more efficient to iterate over a small set while looking up a large set than the other way around. I am attaching a minimalist patch which gives up in-place update for s1 -= s2 if len(s1) < len(s2) and effectively replaces it with s1 = s1 - s2. The code can be further optimized by replicating set_difference logic in set_isub instead of relying on fall back behavior, but the big question here with whether it is acceptable to give up preserving set identity in in-place subtract to gain performance.

The problem is apparently due to the fact that small_set -= large_set
iterates over the large set removing entries from small_set while more
efficient small_set - large_set builds a new set iterating over a
small_set and adding items that are not in the large set.  Since
lookups are O(1), it is more efficient to iterate over a small set
while looking up a large set than the other way around.  I am
attaching a minimalist patch which gives up in-place update for s1 -=
s2 if len(s1) < len(s2) and effectively replaces it with s1 = s1 - s2.
 The code can be further optimized by replicating set_difference logic
in set_isub instead of relying on fall back behavior, but the big
question here  with whether it is acceptable to give up preserving set
identity in in-place subtract to gain performance.

Files
File name	Uploaded
issue8425.diff	belopolsky, 2010-05-10.17:44:34

History
Date	User	Action	Args
2010-05-10 17:44:37	belopolsky	set	recipients: + belopolsky, rhettinger, eric.smith, ezio.melotti, abacabadabacaba
2010-05-10 17:44:35	belopolsky	link	issue8425 messages
2010-05-10 17:44:35	belopolsky	create