classification
Title: Documenting set comparisons and operations
Type: Stage:
Components: Documentation Versions: Python 3.0, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: georg.brandl, mark.dickinson, rhettinger, terry.reedy
Priority: low Keywords: patch

Created on 2008-10-09 16:17 by terry.reedy, last changed 2008-11-17 22:55 by rhettinger. This issue is now closed.

Files
File name Uploaded Description Edit
expr.diff rhettinger, 2008-11-16 06:57 Doc patch for the reference manual
Messages (7)
msg74585 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-10-09 16:17
RefMan Expressions Comparisons has a subsection headed
"Comparison of objects of the same type depends on the type"
with entries for numbers, bytes, strings, tuples, lists, mappings, and
most_other (compared by id).  Sets (and dict views) are missing.  While
sets are similar to dicts, they are different because they also have
order comparisons.

A problem in defining comparisons for sets is that the usual definitions
depend on equality comparisons of the members involved being, as usual,
reflexive, symmetric, and transitive.  But float('nan') is irreflexive.
 For integral value i, int(i), float(i) or Fraction(i), and Decimal(i)
are (currently) jointly intransitive.  See
http://bugs.python.org/issue4087 
Even without these issues, users are free to write __eq__ methods
however they want.

So I suggest something like the following:
"If equality among the set members involved is reflexive, symmetric, and
transitive as defined in mathematics, set comparisons have the usual
definitions in terms of set inclusion.  Otherwise, they are undefined."

If dict equality had been defined in terms of equality of the set of
(key,value) pairs, it would have the same problem.  The algorithmic
definition in terms of ordered lists works fine, however.

I also suggest a warning be added at the top of the set section in the
Lib. Ref.  Something like:
"The usual definitions of set operations, given below, depend on
equality testing between the members involved being reflexive,
symmetric, and transitive.  If this is not true, results are undefined."
msg74589 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-10-09 17:18
I don't think this is necessary.  The ordering operators for sets are
already documented to mean subset/superset comparisons.  Will look at it
a bit more and possibly add a parenthetical note reminding people that
superset/superset are not total orderings.
msg74622 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-10-10 08:56
[Raymond]
> I don't think this is necessary.

I disagree.  I think some sort of warning is necessary;  it doesn't need
to be particularly prominent, but it should be there.

Almost *all* expectations are broken for sets in the absence of
transitivity of equality for the set elements.  Consider the following
(Python 2.6) snippet involving a set s:

>>> s.remove(17)
>>> 17 in s
True

An element is removed from a set s, and yet it's still present after the
removal!  Doesn't this deserve an explanation somewhere?

In case you haven't guessed, here's what s is:

>>> s
set([Fraction(17, 1), Decimal('17')])

Regardless of whether one wants to call this a bug or not, I think it's
sufficiently unintuitive and surprising that it should be documented.

Terry's suggestion and wordings for the reference and library warnings
look good to me.
msg74636 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-10-10 17:23
I'll update the note in the Ref Manual section on comparisons.

The docs on sets, dicts, and other containers need to remain clean.  I
object to littering the docs with these kind of "omg danger" messages. 
For beginners using the docs to learn what a container does, these kind
of notes are hard to understand, raise unnecessary worries about the
robustness of the language, and don't provide actionable information. 
For advanced users like yourselves, it doesn't help either.  All it does
is provide some minimal satisfaction that your favorite annoying oddity
is in the docs.

Also, the set/dict docs are the wrong place to discuss the issue.  They
happen to be the arbitrary tool you chose to demonstrate an issue that
properly relates to comparisons in Python.  Since the very beginning, it
has been possible to create comparisons that violate our mental
invariants for containers.  You can feed a partial or random ordering to
sort.  Mertz has an article on oddities arising from identity versus
equality.  A rich comparison can return a vector but get collapsed into
a boolean by the == operator.  You can get hash functions that don't
correspond to equality, etc.  There are tricks with NaN being identical
to itself but not equal to itself.  If these things get noted at all, it
should be in the docs for numbers and nans or comparisons.  It makes no
sense to try to add a comment to every possible place that has an
equality invariant that can be fooled.  For example, it is not hard to
produce a list example where len(s)==1 and s[0] != x and x in s, just
set x=float('nan') and s=[x].  But, of course, we're not going to
clutter the list docs with this nonsense.
msg74639 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-10-10 18:36
Para 1: Thank you.

Pars 2: I understand and accept your concern.

Para 3. You are right odd comparisons are the root of several problems.
Following you suggestion, let's at least add one blanket,
cover-our-asses warning at the bottom of the comparison section.  Example:

"Warning: if comparisons among container members violate the usual
rules, container operations may give unexpected or anomalous results."

Possibly add "We will not try to document the various possibilities."

I think something at the bottom of lib ref/numbers, possibly pointing to
Nan != Nan (which some users will not know about), would also be a good
idea.


----
Just to finish Mark's example
>>> s = {fractions.Fraction(17,1), decimal.Decimal(17)}
>>> s-{17}
set()

Subtracting a set with one member removes two members and gives a
different answer than just removing that member.  But this is more an
issue for 4087
msg75930 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-11-16 06:57
Attaching a proposed doc fix-up for this little can of worms (and for
issue 4087).
msg75989 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-11-17 22:55
r67249
History
Date User Action Args
2008-11-17 22:55:51rhettingersetstatus: open -> closed
resolution: fixed
messages: + msg75989
2008-11-16 06:57:55rhettingersetfiles: + expr.diff
keywords: + patch
messages: + msg75930
2008-10-10 18:36:25terry.reedysetmessages: + msg74639
2008-10-10 17:23:31rhettingersetmessages: + msg74636
2008-10-10 08:56:15mark.dickinsonsetnosy: + mark.dickinson
messages: + msg74622
2008-10-09 17:18:40rhettingersetpriority: low
assignee: georg.brandl -> rhettinger
messages: + msg74589
nosy: + rhettinger
2008-10-09 16:17:26terry.reedycreate