Message 412982 - Python tracker


Author	serhiy.storchaka
Recipients	panda1200, rhettinger, serhiy.storchaka
Date	2022-02-10.08:35:24
Message-id	<1644482124.77.0.201138473982.issue46705@roundup.psfhosted.org>
In-reply-to

Content
Would not testing len(self.difference(other)) == 0 be more efficient? Making a copy of a set and removing elements one by one may be faster than adding elements one by one, because we only need to allocate a single chunk of memory for the set.
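For illustration, here is a minimal sketch of the two approaches being compared. The function names are hypothetical and written in pure Python, so they only mirror the idea and say nothing about how the C implementation behaves or performs.

```python
def issubset_via_temp_set(s, other):
    # Build a temporary set from `other` by adding its elements one by one,
    # then compare. The temporary set may need to grow several times.
    return s <= set(other)


def issubset_via_difference(s, other):
    # Copy `s` once (a single allocation of known size), then remove
    # every element that also occurs in `other`.
    return len(s.difference(other)) == 0
```

Both functions accept any iterable for `other`, including one-shot iterators, and should agree on the result; they differ only in which set is built and how it is filled.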

It depends on the relative values of len(self), len(other) and len(set(other)). For example, the following code may be optimal in some cases:

    tmp = set()
    it = iter(other)
    for item in it:
        # if item in self: ?
        tmp.add(item)
        if len(tmp) >= len(self):
            self = self.difference(tmp, it)
            if not self:
                return True
            self.difference_update(other)
            return not self
    else:
        return False  # len(self) > len(set(other))
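The fragment above is the body of a method, so it cannot run on its own. A standalone sketch of the same idea might look like the following. The name `issubset_hybrid` is hypothetical, and two details are assumptions rather than part of the original message: an explicit check for an empty `s` (otherwise an empty `other` would fall through to `return False`), and the omission of the final `difference_update(other)` call, since `difference(tmp, it)` has already consumed the rest of the iterator.

```python
def issubset_hybrid(s, other):
    # Assumption: an empty set is a subset of anything, including an
    # empty iterable, so handle it before the loop.
    if not s:
        return True

    tmp = set()
    it = iter(other)
    for item in it:
        tmp.add(item)
        if len(tmp) >= len(s):
            # `other` has at least as many distinct items as `s`, so the
            # answer is no longer trivially False. Switch strategy: copy
            # `s` and remove everything seen so far plus the rest of `it`.
            remaining = s.difference(tmp, it)
            return not remaining

    # The iterator ran out while len(tmp) < len(s), that is,
    # len(s) > len(set(other)), so `s` cannot be a subset.
    return False
```

The temporary set never grows beyond len(s) elements, which is the point of the hybrid: memory use is bounded by the size of `s` even when `other` is very large.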

The disadvantage of such optimizations is that they make the code much bigger. The current code is small and simple, and good enough in most cases.

History
Date	User	Action	Args
2022-02-10 08:35:24	serhiy.storchaka	set	recipients: + serhiy.storchaka, rhettinger, panda1200
2022-02-10 08:35:24	serhiy.storchaka	set	messageid: <1644482124.77.0.201138473982.issue46705@roundup.psfhosted.org>
2022-02-10 08:35:24	serhiy.storchaka	link	issue46705 messages
2022-02-10 08:35:24	serhiy.storchaka	create