Issue 4090: Documenting set comparisons and operations

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Unsupported provider

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/48340

classification

Title:	Documenting set comparisons and operations
Type:		Stage:
Components:	Documentation	Versions:	Python 3.0, Python 2.7, Python 2.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	georg.brandl, mark.dickinson, rhettinger, terry.reedy
Priority:	low	Keywords:	patch

Created on 2008-10-09 16:17 by terry.reedy, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
expr.diff	rhettinger, 2008-11-16 06:57	Doc patch for the reference manual

Messages (7)
msg74585 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2008-10-09 16:17
RefMan Expressions Comparisons has a subsection headed "Comparison of objects of the same type depends on the type" with entries for numbers, bytes, strings, tuples, lists, mappings, and most_other (compared by id). Sets (and dict views) are missing. While sets are similar to dicts, they are different because they also have order comparisons. A problem in defining comparisons for sets is that the usual definitions depend on equality comparisons of the members involved being, as usual, reflexive, symmetric, and transitive. But float('nan') is irreflexive. For integral value i, int(i), float(i) or Fraction(i), and Decimal(i) are (currently) jointly intransitive. See http://bugs.python.org/issue4087 Even without these issues, users are free to write __eq__ methods however they want. So I suggest something like the following: "If equality among the set members involved is reflexive, symmetric, and transitive as defined in mathematics, set comparisons have the usual definitions in terms of set inclusion. Otherwise, they are undefined." If dict equality had been defined in terms of equality of the set of (key,value) pairs, it would have the same problem. The algorithmic definition in terms of ordered lists works fine, however. I also suggest a warning be added at the top of the set section in the Lib. Ref. Something like: "The usual definitions of set operations, given below, depend on equality testing between the members involved being reflexive, symmetric, and transitive. If this is not true, results are undefined."
msg74589 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-10-09 17:18
I don't think this is necessary. The ordering operators for sets are already documented to mean subset/superset comparisons. Will look at it a bit more and possibly add a parenthetical note reminding people that superset/superset are not total orderings.
msg74622 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2008-10-10 08:56
[Raymond] > I don't think this is necessary. I disagree. I think some sort of warning is necessary; it doesn't need to be particularly prominent, but it should be there. Almost all expectations are broken for sets in the absence of transitivity of equality for the set elements. Consider the following (Python 2.6) snippet involving a set s: >>> s.remove(17) >>> 17 in s True An element is removed from a set s, and yet it's still present after the removal! Doesn't this deserve an explanation somewhere? In case you haven't guessed, here's what s is: >>> s set([Fraction(17, 1), Decimal('17')]) Regardless of whether one wants to call this a bug or not, I think it's sufficiently unintuitive and surprising that it should be documented. Terry's suggestion and wordings for the reference and library warnings look good to me.
msg74636 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-10-10 17:23
I'll update the note in the Ref Manual section on comparisons. The docs on sets, dicts, and other containers need to remain clean. I object to littering the docs with these kind of "omg danger" messages. For beginners using the docs to learn what a container does, these kind of notes are hard to understand, raise unnecessary worries about the robustness of the language, and don't provide actionable information. For advanced users like yourselves, it doesn't help either. All it does is provide some minimal satisfaction that your favorite annoying oddity is in the docs. Also, the set/dict docs are the wrong place to discuss the issue. They happen to be the arbitrary tool you chose to demonstrate an issue that properly relates to comparisons in Python. Since the very beginning, it has been possible to create comparisons that violate our mental invariants for containers. You can feed a partial or random ordering to sort. Mertz has an article on oddities arising from identity versus equality. A rich comparison can return a vector but get collapsed into a boolean by the == operator. You can get hash functions that don't correspond to equality, etc. There are tricks with NaN being identical to itself but not equal to itself. If these things get noted at all, it should be in the docs for numbers and nans or comparisons. It makes no sense to try to add a comment to every possible place that has an equality invariant that can be fooled. For example, it is not hard to produce a list example where len(s)==1 and s[0] != x and x in s, just set x=float('nan') and s=[x]. But, of course, we're not going to clutter the list docs with this nonsense.
msg74639 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2008-10-10 18:36
Para 1: Thank you. Pars 2: I understand and accept your concern. Para 3. You are right odd comparisons are the root of several problems. Following you suggestion, let's at least add one blanket, cover-our-asses warning at the bottom of the comparison section. Example: "Warning: if comparisons among container members violate the usual rules, container operations may give unexpected or anomalous results." Possibly add "We will not try to document the various possibilities." I think something at the bottom of lib ref/numbers, possibly pointing to Nan != Nan (which some users will not know about), would also be a good idea. ---- Just to finish Mark's example >>> s = {fractions.Fraction(17,1), decimal.Decimal(17)} >>> s-{17} set() Subtracting a set with one member removes two members and gives a different answer than just removing that member. But this is more an issue for 4087
msg75930 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-11-16 06:57
Attaching a proposed doc fix-up for this little can of worms (and for issue 4087).
msg75989 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2008-11-17 22:55
r67249

History
Date	User	Action	Args
2022-04-11 14:56:40	admin	set	github: 48340
2008-11-17 22:55:51	rhettinger	set	status: open -> closed resolution: fixed messages: + msg75989
2008-11-16 06:57:55	rhettinger	set	files: + expr.diff keywords: + patch messages: + msg75930
2008-10-10 18:36:25	terry.reedy	set	messages: + msg74639
2008-10-10 17:23:31	rhettinger	set	messages: + msg74636
2008-10-10 08:56:15	mark.dickinson	set	nosy: + mark.dickinson messages: + msg74622
2008-10-09 17:18:40	rhettinger	set	priority: low assignee: georg.brandl -> rhettinger messages: + msg74589 nosy: + rhettinger
2008-10-09 16:17:26	terry.reedy	create