
Author rhettinger
Recipients anacrolix, eric.araujo, gvanrossum, petri.lehtinen, rhettinger
Date 2012-03-15.18:32:21
Message-id <1331836343.17.0.937952463673.issue14320@psf.upfronthosting.co.za>
In-reply-to
Content
The part of the patch for PySet_Add() is a reasonable improvement to the C API if it doesn't conflict with Martin's stable ABI effort.

The question of whether to change the Python API requires much more thought and I'll do some research and evaluate it more thoroughly over the next few weeks.  Here are some of the considerations:

* The set API currently has a near zero learning curve.  We want to keep it that way.  I'm teaching classes over the next few weeks and will try out the proposal on my students.

* For collections that are commonplace in other languages, I look to their experience and design for inspiration.  I'll look at what was done in Smalltalk, Java, and Objective-C (with dynamic languages being a better model than statically compiled languages).  In particular, I look to SETL when evaluating the utility of proposed changes to the set API (a little like looking to Matlab when thinking about designing a matrix API).

* I'm concerned about the intuitiveness of the polarity of the proposed method and will try it out on other programmers to see whether "if s.add(e): ..." gets interpreted as "true if e was already present" or "true if a new item was added".  The proposed return value of set.add() has the opposite sense of set.__contains__(), so we should be careful about making an API change with an ambiguous or error-prone interpretation.

* As written, the proposal seems to be about efficiency rather than clarity.  I'll run my own timings to see if they make any difference in typical applications of set.add().  In addition, I'll consult the Jython folks to see if it makes a difference in their world (I suspect it won't -- they use native Java objects and the Java JIT handily optimizes away the traditional calling pattern).  Also, I'll consult the PyPy folks to see whether they can provide the optimization automatically rather than via an API change.

* The suggested API also needs to be viewed in the context of what other Python APIs do.  For the most part, the language has an aversion to combining tests and assignments.  For example, Python doesn't do "while (buf = f.read(bufsize)): ..." even though that is traditionally supported in statically compiled languages.  There is a precedent with dict.setdefault(); however, that is often regarded as one of the least beautiful parts of the API in Python's basic collection objects.

* I also want to look back at previous discussions on this topic.  The set API had a slow and careful evolution, starting with a PEP, then being exposed as a pure Python module, and finally being coded in C as a builtin type.  The API was built by Alex Martelli, Guido, Tim Peters, Greg Wilson and myself with substantial input from the community.  None of the designers sought to include this functionality, and it wasn't because it hadn't occurred to them or because they were unaware of typical use cases.  In addition, having set.add() return a boolean was discussed and rejected on python-dev (I've forgotten whether it was last year or the year before).  Some care should be taken before dismissing the judgment of the designers who've previously spent time thinking this out.

* Lastly, we need to look at code examples to see whether they read better or whether clarity is being lost in the name of efficiency.  We should look at both sophisticated examples (i.e. sets are part of multistep logic) and minimal examples (i.e. where the set logic is dominant).  Here is a before-and-after for the minimal case:

    def dedup_before(iterable):
        '''Order preserving elimination of duplicates'''
        seen = set()
        for i in iterable:
            if i not in seen:
                seen.add(i)
                yield i

    def dedup_after(iterable):
        '''Order preserving elimination of duplicates'''
        seen = set()
        for i in iterable:
            if seen.add(i):
                yield i
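
The "after" version cannot run against today's set, because set.add() returns None.  To experiment with the proposed polarity, here is a minimal sketch that assumes a hypothetical subclass, ReportingSet (not part of any library or of the patch), whose add() returns True only when the element was newly inserted:

```python
class ReportingSet(set):
    """Hypothetical subclass: add() reports whether elem was newly inserted."""

    def add(self, elem):
        # Compare sizes around the insertion; a change means elem was new.
        before = len(self)
        super().add(elem)
        return len(self) != before


def dedup_after(iterable):
    """Order-preserving elimination of duplicates (proposed polarity)."""
    seen = ReportingSet()
    for i in iterable:
        if seen.add(i):  # True only the first time i is seen
            yield i


s = ReportingSet()
first = s.add('a')    # True: 'a' was newly added
second = s.add('a')   # False: 'a' was already present
result = list(dedup_after('abracadabra'))  # ['a', 'b', 'r', 'c', 'd']
```

Note that "if seen.add(i)" is true exactly when "i in seen" was false beforehand, which is the polarity question raised above.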


As you can see, there is more to API design than just spotting an opportunity to fold two steps into one.
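
For comparison, here is a sketch of how the read loop mentioned above is commonly spelled in Python without combining the test and the assignment.  The helper name read_chunks and the in-memory file are illustrative assumptions only:

```python
import io
from functools import partial


def read_chunks(f, bufsize):
    """Return an iterator over successive chunks of a binary file."""
    # iter(callable, sentinel) calls f.read(bufsize) repeatedly and
    # stops as soon as it returns the sentinel b'' (end of file).
    return iter(partial(f.read, bufsize), b'')


f = io.BytesIO(b'abcdefghij')          # stand-in for a real binary file
chunks = list(read_chunks(f, 4))       # [b'abcd', b'efgh', b'ij']
```

The test (end of data) and the assignment (the loop variable) stay separate, at the cost of a slightly less obvious idiom.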