classification
Title: Adopt and document consistent semantics for handling NaN values in containers
Type: behavior
Stage:
Components: Documentation, Library (Lib)
Versions: Python 3.4, Python 3.3, Python 2.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To: rhettinger
Nosy List: andymaier, belopolsky, daniel.urban, docs@python, mark.dickinson, ncoghlan, rhettinger, terry.reedy, v+python
Priority: normal
Keywords:

Created on 2011-04-28 03:42 by ncoghlan, last changed 2014-07-21 04:38 by rhettinger. This issue is now closed.

Messages (14)
msg134639 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-04-28 03:42
The question of the way Python handles NaN came up again on python-dev recently. The current semantics have been assessed as a reasonable compromise, but a poorly explained and inconsistently implemented one.

Based on a suggestion from Terry Reedy [1] I propose that a new glossary entry be added for "Reflexive Equality":

"Part of the standard mathematical definition of equality is that it is reflexive, that is ``x is y`` necessarily implies that ``x == y``. This is an essential property that is relied upon when designing and implementing container classes such as ``list`` and ``dict``.

However, the IEEE754 committee defined the float Not_a_Number (NaN) values as being unequal to all other floats, including themselves. While this design choice violates the basic mathematical definition of equality, it is still considered desirable to be able to correctly implement IEEE754 floating point semantics, and those of similar types such as ``decimal.Decimal``, directly in Python.

Accordingly, Python makes the following compromise in order to cope with types that use non-reflexive definitions of equality without breaking the invariants of container classes that rely on reflexive definitions of equality:

1. Direct equality comparisons involving ``NaN``, such as ``nan=float('NaN'); nan == nan``, follow the IEEE754 rule and return False (or True in the case of ``!=``). This rule applies to ``float`` and ``decimal.Decimal`` within the builtins and standard library.

2. Indirect comparisons conducted internally by container classes, such as ``x in someset`` or ``seq.count(x)`` or ``somedict[x]``, enforce reflexivity by using the expressions ``x is y or x == y`` and ``x is not y and x != y`` respectively rather than assuming that ``x == y`` and ``x != y`` will always respect the reflexivity requirement. This rule applies to all container types within the builtins and standard library that may contain values of arbitrary types.

Also see [1] for a more comprehensive theoretical discussion of this topic.

[1] http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civilization/"
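
As a minimal sketch of the two rules in the proposed entry (assuming CPython 3 and its built-in list, set and dict; the variable names are illustrative only), direct comparisons follow IEEE754 while container operations check identity first:

```python
from decimal import Decimal

nan = float("nan")
dnan = Decimal("NaN")

# Rule 1: direct comparisons follow IEEE754.
direct_eq = nan == nan            # False
direct_ne = nan != nan            # True
decimal_eq = dnan == dnan         # False

# Rule 2: containers test "x is y or x == y", so the same object is found.
in_list = nan in [nan]            # True
in_set = nan in {nan}             # True
counted = [nan, 1.0].count(nan)   # 1
looked_up = {nan: "value"}[nan]   # "value"
```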

Specific container methods that have currently been identified as relying on the reflexivity assumption are:
- __contains__() (for x in c: assert x in c)
- __eq__() (assert [x] == [x])
- __ne__() (assert not [x] != [x])
- index() (for x in c: assert 0 <= c.index(x) < len(c))
- count() (for x in c: assert c.count(x) > 0)
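
The invariants listed above can be bundled into one check. This is only a sketch: the helper name `holds_reflexive_invariants` is hypothetical, and only `list` and `tuple` are exercised here.

```python
def holds_reflexive_invariants(make_container, x):
    """Return True if a one-element container of x satisfies the invariants."""
    c = make_container([x])
    return all([
        x in c,                                            # __contains__
        make_container([x]) == make_container([x]),        # __eq__
        not (make_container([x]) != make_container([x])),  # __ne__
        0 <= c.index(x) < len(c),                          # index()
        c.count(x) > 0,                                    # count()
    ])


nan = float("nan")
list_ok = holds_reflexive_invariants(list, nan)
tuple_ok = holds_reflexive_invariants(tuple, nan)
```

Both results are expected to be true on CPython, because the built-in sequences compare elements by identity before falling back to ``==``.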

collections.Sequence and array.array (with the 'f' or 'd' type indicators) have already been identified as container classes in the standard library that fail to follow the second guideline and hence fail to correctly implement the above invariants in the presence of non-reflexive definitions of equality. They will be fixed as part of implementing this patch. Other container types that fail to correctly enforce reflexivity can be fixed as they are identified.

[1] http://mail.python.org/pipermail/python-dev/2011-April/110962.html
msg134640 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-04-28 03:50
Actually, based on the NumPy precedent [1], array.array should be fine as is. Since it uses raw C floats and doubles internally, rather than Python objects, there is no clear concept of "object identity" to use to enforce reflexivity.

[1] http://mail.python.org/pipermail/python-dev/2011-April/110987.html
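
A small sketch of why identity cannot help here, assuming CPython's array module, where each item access builds a new float object from the stored C double:

```python
import math
from array import array

nan = float("nan")
a = array("d", [nan])

stored_is_nan = math.isnan(a[0])   # the value round-trips as a NaN
found = nan in a                   # no shared object, so == alone decides
self_unequal = a[0] != a[0]        # each access yields a NaN float
```

Under that assumption, `found` comes out false even though the array was built from `nan`, which is the NumPy-like behaviour described above.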
msg134646 - (view) Author: Glenn Linderman (v+python) Date: 2011-04-28 05:32
Bertrand Meyer's exposition is flowery, and he is a learned man, but the basic argument he makes is:

Reflexivity of equality  is something that we expect for any data type, and it seems hard to justify that a value is not equal to itself. As to assignment, what good can it be if it does not make the target equal to the source value?  

The argument is flawed: now that NaN exists, and is not equal to itself in value, there should be, and need be, no expectation that assignment elsewhere should make the target equal to the source in value.  It can, and in Python, should, make them match in identity (is) but not in value (==, equality).

I laud the idea of adding the definition of reflexive equality to the glossary.  However, I think it is presently a bug that a list containing a NaN value compares equal to itself.  Yes, such a list should have the same identity (is), but should not be equal.
msg134651 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-04-28 06:21
> I think it is presently a bug that a list containing
> a NaN value compares equal to itself. 

Moreover, it also compares equal to another list containing the same NaN:

>>> nan = float('nan')
>>> [nan] is [nan]
False
>>> [nan] == [nan]
True

Here is another case of is implies == optimization breaking NaN property in stdlib:

>>> import ctypes
>>> x = ctypes.c_double(nan)
>>> x == x
True
msg134654 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-04-28 07:01
The status quo works. Proposals to change it on theoretical grounds have a significantly higher bar to meet than proposals to simply document it clearly.
msg134655 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-04-28 07:16
On Thu, Apr 28, 2011 at 3:01 AM, Nick Coghlan <report@bugs.python.org> wrote:
..
> The status quo works.

No it does not. I have yet to see a Python program that uses
non-reflexivity of NaN in a meaningful way.  What I've seen is that
programmers either ignore it and write slightly buggy programs
("slightly" because it is actually hard to produce a NaN in Python
code) or add extra code to filter out NaN values before numbers are
compared.

> Proposals to change it on theoretical grounds have a significantly higher bar to meet than proposals
> to simply document it clearly.

Documenting the status quo is necessary for any proposal to change.
msg134656 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-04-28 07:26
By "works" I merely meant that you can currently achieve both of the following:

1. Write fully conformant implementations of IEEE754 floating point types, including the non-reflexive NaN comparisons (keeping in mind that, as a value-based specification, "same payload" is the closest IEEE754 can get to "same object")

2. Explicitly force reflexivity when you need it, either by filtering out nonconformant values, or by checking identity before checking equality.
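
The two ways of forcing reflexivity in option 2 might look like the following sketch. The helper name `same` and the sample data are hypothetical, chosen only for illustration.

```python
import math


def same(x, y):
    """Identity-before-equality check, as the containers use internally."""
    return x is y or x == y


nan = float("nan")
values = [1.0, nan, 2.5]

# Approach A: filter out the nonconformant values before comparing.
clean = [v for v in values if not math.isnan(v)]

# Approach B: check identity before checking equality.
reflexive = same(nan, nan)             # True: same object
not_shared = same(nan, float("nan"))   # False: distinct NaN objects
```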

The "pure" equality-tests-are-always-reflexive approach advocated by Meyer rules out option 1. Given that one of the use cases for Python is to prototype algorithms that are later translated to C or C++, formally disallowing that use case would be problematic.
msg134659 - (view) Author: Glenn Linderman (v+python) Date: 2011-04-28 07:40
Nick says (and later explains better what he meant): 
The status quo works. Proposals to change it on theoretical grounds have a significantly higher bar to meet than proposals to simply document it clearly.

I say:
What the status quo doesn't provide is containers that "work".  In this case what I mean by "work" is that equality of containers is based on value, and value comparisons, and accept and embrace non-reflexive equality.  It might be possible to implement alternate containers with these characteristics, but that requires significantly more effort than simply filtering values.
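
A rough sketch of such an alternate container, assuming a hypothetical `ValueOnlyList` subclass (not part of the standard library) that compares elements with ``==`` only:

```python
class ValueOnlyList(list):
    """Hypothetical list whose equality ignores object identity."""

    def __eq__(self, other):
        if not isinstance(other, list):
            return NotImplemented
        # Compare strictly by value; no "x is y" shortcut.
        return len(self) == len(other) and all(
            a == b for a, b in zip(self, other)
        )

    def __ne__(self, other):
        # list defines its own __ne__, so it must be overridden as well.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


nan = float("nan")
v = ValueOnlyList([nan])
equal_to_itself = v == v      # False: nan == nan is False
still_member = nan in v       # True: membership is inherited from list
```

The sketch covers only ``==`` and ``!=``; membership tests, index() and count() are inherited unchanged, which illustrates why a complete alternative takes more effort than filtering.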

Nonetheless, I totally agree with msg134654, and agree that properly documenting the present implementation would be a great service to users of the present implementation.
msg134660 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-04-28 07:55
On Thu, Apr 28, 2011 at 3:26 AM, Nick Coghlan <report@bugs.python.org> wrote:
..
> 1. Write fully conformant implementations of IEEE754 floating point types, including the non-reflexive NaN comparisons
> (keeping in mind that, as a value-based specification, "same payload" is the closest IEEE754 can get to "same object")
>

If being "fully conformant" with various IEEE standards was a design
goal for Python, we would have leap seconds in the datetime module.
:-)

Python builtin float equality being reflexive does not in any way
inhibit anyone's ability to *write* a fully conforming
implementation.  In fact, if we ever get arithmetic operations
implemented for ctypes types, I would argue that comparison
of c_double values would need to be changed to match C behavior.  (I
am +0 on changing that even without implementing arithmetic.)

I realize, however, that by "status quo" you mean container operations
not calling __eq__ on identical objects.  I agree that this should not
change.  Making float comparison reflexive will actually make this
feature less controversial.
msg134715 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-04-28 19:30
To repeat concisely what I said on pydev list, I think Reference 5.9. Comparisons, which says

"Tuples and lists are compared lexicographically using comparison of corresponding elements. This means that to compare equal, each element must compare equal and the two sequences must be of the same type and have the same length."

needs 'be identical or ' added before 'compare equal and ...'

"Mappings (dictionaries) compare equal if and only if they have the same (key, value) pairs."

may be ok, depending on how one interprets 'same (key, value) pairs'.
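
The gap between the documented wording and the actual behaviour can be seen in a short sketch (assuming CPython 3; the variable names are illustrative):

```python
nan = float("nan")
other_nan = float("nan")   # a second, distinct NaN object

# Elements that are the same object count as equal, although nan != nan.
same_object_tuple = (nan,) == (nan,)             # True
same_object_list = [nan] == [nan]                # True
same_object_dict = {"k": nan} == {"k": nan}      # True

# Distinct NaN objects fall back to ==, which is False.
distinct_tuple = (nan,) == (other_nan,)          # False
distinct_dict = {"k": nan} == {"k": other_nan}   # False
```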

Alexander has opened a separate issue to change behavior in 3.3.
msg134729 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-04-28 23:18
After further discussion on python-dev, it is clear that this identity checking behaviour handles more than just NaNs - it also allows containers to cope more gracefully with objects like NumPy arrays that make use of rich comparisons to return something other than simple True/False values for equality checks.

Also, since I neglected to mention it in the initial post, merely *adding* the glossary entry is just the first step. It then needs to be referenced from the appropriate points in the language and library reference.
msg134730 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-04-28 23:28
Scratch the first half of that last comment - Guido pointed out that false positives rear their ugly head almost immediately if you try to store rich comparison objects in other containers.
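
A sketch of the kind of false positive meant here, using a hypothetical `Elementwise` class as a stand-in for an object whose ``==`` returns something other than a plain bool (this is not NumPy code):

```python
class Elementwise:
    """Hypothetical type whose == returns a list of per-item results."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        if not isinstance(other, Elementwise):
            return NotImplemented
        return [a == b for a, b in zip(self.items, other.items)]


a = Elementwise([1, 2])
b = Elementwise([3, 4])

comparison = a == b        # [False, False]
false_positive = a in [b]  # True: a non-empty list is truthy
```

The identity check lets `a in [a]` succeed, but it does nothing to stop `a in [b]` from succeeding as well.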
msg192850 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-07-11 07:52
I think this should be closed.  AFAICT it is of interest to a very tiny subset of the human species and as near as I can tell that subset doesn't include people in the numeric and statistics community (the ones who actually use NaNs as placeholders for missing values).

So much code (and human reasoning) assumes that identity-implies-equality, that it would be easier to document the exception to expectation than to try to find every place in every module where the assumption is present (implicitly or explicitly).  Instead, it would be better to document that distinct float('NaN') objects are never equal to one another and that identical float('NaN') objects may or may not compare equal in various implementation-dependent circumstances.
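
A sketch of the behaviour such documentation would describe, assuming CPython 3, where each float('nan') call returns a new object:

```python
a = float("nan")
b = float("nan")   # a second, distinct NaN object

distinct_equal = a == b             # False: distinct NaNs are never equal
identical_equal = a == a            # False under direct comparison

# Inside containers, an identical object is treated as equal to itself.
one_object_twice = len({a, a})      # 1
two_objects = len({a, b})           # 2
key_found = a in {a: 1}             # True
other_key_found = b in {a: 1}       # False
```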
msg223559 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-07-21 04:38
Closing for the reasons listed and also because there is another pair of tracker items 22000 and 22001 pursuing related documentation updates.
History
Date                 User            Action  Args
2014-07-21 04:38:01  rhettinger      set     status: open -> closed; resolution: not a bug; messages: + msg223559
2014-07-16 15:00:47  andymaier       set     nosy: + andymaier
2013-07-11 07:52:21  rhettinger      set     messages: + msg192850
2013-06-30 06:00:45  terry.reedy     set     versions: + Python 3.4, - Python 3.2
2011-04-29 17:13:19  daniel.urban    set     nosy: + daniel.urban
2011-04-28 23:28:43  ncoghlan        set     messages: + msg134730
2011-04-28 23:18:09  ncoghlan        set     messages: + msg134729
2011-04-28 19:30:31  terry.reedy     set     nosy: + terry.reedy; messages: + msg134715
2011-04-28 11:08:56  mark.dickinson  set     nosy: + mark.dickinson
2011-04-28 07:55:05  belopolsky      set     messages: + msg134660
2011-04-28 07:40:07  v+python        set     messages: + msg134659
2011-04-28 07:26:49  ncoghlan        set     messages: + msg134656
2011-04-28 07:16:33  belopolsky      set     messages: + msg134655
2011-04-28 07:07:14  rhettinger      set     assignee: docs@python -> rhettinger; nosy: + rhettinger
2011-04-28 07:01:48  ncoghlan        set     messages: + msg134654
2011-04-28 06:21:27  belopolsky      set     nosy: + belopolsky; messages: + msg134651
2011-04-28 05:32:53  v+python        set     nosy: + v+python; messages: + msg134646
2011-04-28 03:50:38  ncoghlan        set     messages: + msg134640
2011-04-28 03:42:23  ncoghlan        create