Author ncoghlan
Recipients docs@python, ncoghlan
Date 2011-04-28.03:42:22
Message-id <1303962145.0.0.00759154980454.issue11945@psf.upfronthosting.co.za>
In-reply-to
Content
The question of the way Python handles NaN came up again on python-dev recently. The current semantics have been assessed as a reasonable compromise, but a poorly explained and inconsistently implemented one.

Based on a suggestion from Terry Reedy [1] I propose that a new glossary entry be added for "Reflexive Equality":

"Part of the standard mathematical definition of equality is that it is reflexive; that is, ``x is y`` necessarily implies ``x == y``. This is an essential property that is relied upon when designing and implementing container classes such as ``list`` and ``dict``.

However, the IEEE754 committee defined the float Not_a_Number (NaN) values as being unequal to all other floats, including themselves. While this design choice violates the basic mathematical definition of equality, it is still considered desirable to be able to correctly implement IEEE754 floating point semantics, and those of similar types such as ``decimal.Decimal``, directly in Python.

Accordingly, Python makes the following compromise in order to cope with types that use non-reflexive definitions of equality without breaking the invariants of container classes that rely on reflexive definitions of equality:

1. Direct equality comparisons involving ``NaN``, such as ``nan=float('NaN'); nan == nan``, follow the IEEE754 rule and return False (or True in the case of ``!=``). This rule applies to ``float`` and ``decimal.Decimal`` within the builtins and standard library.

2. Indirect comparisons conducted internally by container classes, such as ``x in someset`` or ``seq.count(x)`` or ``somedict[x]``, enforce reflexivity by using the expressions ``x is y or x == y`` and ``x is not y and x != y`` respectively rather than assuming that ``x == y`` and ``x != y`` will always respect the reflexivity requirement. This rule applies to all container types within the builtins and standard library that may contain values of arbitrary types.

Also see [1] for a more comprehensive theoretical discussion of this topic.

[1] http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civilization/"
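
To make the two rules in the proposed entry concrete, here is a minimal sketch using only builtins and ``decimal``. The variable names are illustrative and not part of the proposed glossary text.

```python
from decimal import Decimal

nan = float("nan")

# Rule 1: direct comparisons follow the IEEE754 rule.
direct_eq = nan == nan                          # non-reflexive equality
direct_ne = nan != nan
decimal_eq = Decimal("NaN") == Decimal("NaN")   # quiet NaN, same rule

# Rule 2: builtin containers check identity before equality,
# so the *same* NaN object is always found.
in_list = nan in [nan]
in_set = nan in {nan}
nan_count = [nan, 1.0, nan].count(nan)
lookup = {nan: "found"}[nan]

# A *different* NaN object is neither identical nor equal,
# so it is not found.
other_in_list = float("nan") in [nan]

print(direct_eq, direct_ne, decimal_eq)
print(in_list, in_set, nan_count, lookup, other_in_list)
```

The last check shows the limit of the compromise: reflexivity is enforced per object, not per value.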

Specific container methods that have currently been identified as relying on the reflexivity assumption are:
- __contains__() (for x in c: assert x in c)
- __eq__() (assert [x] == [x])
- __ne__() (assert not [x] != [x])
- index() (for x in c: assert 0 <= c.index(x) < len(c))
- count() (for x in c: assert c.count(x) > 0)
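
The invariants listed above can be collected into a small checker. This is a sketch only; ``check_reflexive_invariants`` is a hypothetical helper name, not an existing function, and it assumes a sequence-like container that provides ``index()`` and ``count()``.

```python
def check_reflexive_invariants(c):
    """Assert the invariants listed above for a sequence-like container c."""
    for x in c:
        assert x in c                      # __contains__()
        assert [x] == [x]                  # __eq__()
        assert not [x] != [x]              # __ne__()
        assert 0 <= c.index(x) < len(c)    # index()
        assert c.count(x) > 0              # count()
    return True


nan = float("nan")

# Builtin sequences check identity first, so the invariants hold
# even though nan != nan.
list_ok = check_reflexive_invariants([1, "a", nan])
tuple_ok = check_reflexive_invariants((nan, 2.5))
```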

collections.Sequence and array.array (with the 'f' or 'd' type indicators) have already been identified as container classes in the standard library that fail to follow the second guideline and hence fail to correctly implement the above invariants in the presence of non-reflexive definitions of equality. They will be fixed as part of implementing this patch. Other container types that fail to correctly enforce reflexivity can be fixed as they are identified.
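
As an illustration of what following the second guideline might look like in a pure-Python container, here is a minimal sketch. ``ReflexiveBag`` and ``naive_contains`` are hypothetical names invented for this example; this is not the actual patch, only the ``x is y or x == y`` pattern applied by hand.

```python
class ReflexiveBag:
    """Hypothetical list-backed container that follows the second guideline."""

    def __init__(self, items=()):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, value):
        # Identity first, then equality.
        return any(v is value or v == value for v in self._items)

    def index(self, value):
        for i, v in enumerate(self._items):
            if v is value or v == value:
                return i
        raise ValueError(f"{value!r} is not in container")

    def count(self, value):
        return sum(1 for v in self._items if v is value or v == value)


def naive_contains(items, value):
    """Equality-only membership test, shown for contrast."""
    return any(v == value for v in items)


nan = float("nan")
bag = ReflexiveBag([1.0, nan, 2.0])

print(nan in bag, bag.index(nan), bag.count(nan))
print(naive_contains([nan], nan))
```

The equality-only version fails to find a NaN that is demonstrably in the container, which is the kind of failure described above.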

[1] http://mail.python.org/pipermail/python-dev/2011-April/110962.html