This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author dalke
Recipients dalke, rhettinger
Date 2011-12-23.12:58:12
SpamBayes Score 5.4949933e-12
Marked as misclassified No
Message-id <>
My belief is that the people who use set.intersection with more than two terms are 1) going to pass in a list of sets, and 2) don't care about the specific order.

To check the validity of my belief, I did a Google Code Search to find cases of people using set intersection in Python. I searched for "set\.intersection\(\*" and "\.intersection\(.*\, lang:^python$", among others.

I am sad to report that the most common way to compute set.intersection(*list) is by using reduce, like:

    possible = (set(index[c]) for c in set(otp))
    possible = reduce(lambda a, b: a.intersection(b), possible)

That comes from:
  git:// /yubico/
and similar uses are in:
  git:// /redcap/

As well as in the Rosetta Code example for a simple inverted index, at:

This was also implemented more verbosely in:
    intersected_set = sets[0]
    for i in range(1, len(sets)):
        intersected_set = intersected_set.intersection(sets[i])

        s = set (range (len (data[0])))
        for d in zip(*data):
            s = s.intersection(set(find_outliers(d, zoffset=zoffset)))
        return sorted(s)

In other words, 7 codebases use manual pairwise reduction rather than use the equivalent code in Python. (I have not checked for which are due to backwards compatibility requirements.)

On the other hand, if someone really wants to have a specific intersection order, this shows that it's very easy to write.

I found 4 other code bases where set intersection was used for something other than binary intersection, and used the built-in intersection().

    def get_visible_node_ids(self, filterId):
        if filterId in self.idMask:
            visibleNodeIdSets = [f.get_visible_node_ids(filterId) for f in self._filters]
            return set.intersection(*[v for v in visibleNodeIdSets if v is not None])
        return None
                if threads[ct].intersection(*threads.itervalues()):
                    raise ValueError('All threads failed')
(here, threads' values contain sets)

    header_sets = [set(x) for x in header_list]
    header_intersection = set.intersection(*header_sets)
            to_exclude = set()
            for ii in xrange(0, len(self.sets)):
                if (i & (2**ii)):
                    sets_to_intersect.append(sets_by_power_of_two[i & (2**ii)])
                    to_exclude = to_exclude.union(sets_by_power_of_two[(2**ii)])
            final = set.intersection(*sets_to_intersect) - to_exclude

These all find the intersection of sets (not iterators), and the order of evaluation does not appear like it will affect the result.

I do not know though if there will be a performance advantage in these cases to reordering. I do know that in my code, and any inverted index, there is an advantage.

And I do know that the current CPython implementation has bad worst-case performance.
Date User Action Args
2011-12-23 12:58:14dalkesetrecipients: + dalke, rhettinger
2011-12-23 12:58:14dalkesetmessageid: <>
2011-12-23 12:58:13dalkelinkissue13653 messages
2011-12-23 12:58:12dalkecreate