Issue 11945: Adopt and document consistent semantics for handling NaN values in containers

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/56154

classification

Title:	Adopt and document consistent semantics for handling NaN values in containers
Type:	behavior	Stage:
Components:	Documentation, Library (Lib)	Versions:	Python 3.3, Python 3.4, Python 2.7

process

Status:	closed	Resolution:	not a bug
Dependencies:		Superseder:
Assigned To:	rhettinger	Nosy List:	andymaier, belopolsky, daniel.urban, docs@python, mark.dickinson, ncoghlan, rhettinger, terry.reedy, v+python
Priority:	normal	Keywords:

Created on 2011-04-28 03:42 by ncoghlan, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (14)
msg134639 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2011-04-28 03:42
The question of the way Python handles NaN came up again on python-dev recently. The current semantics have been assessed as a reasonable compromise, but one that is poorly explained and inconsistently implemented.

Based on a suggestion from Terry Reedy [1], I propose that a new glossary entry be added for "Reflexive Equality":

"Part of the standard mathematical definition of equality is that it is reflexive, that is ``x is y`` necessarily implies that ``x == y``. This is an essential property that is relied upon when designing and implementing container classes such as ``list`` and ``dict``.

However, the IEEE754 committee defined the float Not_a_Number (NaN) values as being unequal with all other floats, including themselves. While this design choice violates the basic mathematical definition of equality, it is still considered desirable to be able to correctly implement IEEE754 floating point semantics, and those of similar types such as ``decimal.Decimal``, directly in Python.

Accordingly, Python makes the following compromise in order to cope with types that use non-reflexive definitions of equality without breaking the invariants of container classes that rely on reflexive definitions of equality:

1. Direct equality comparisons involving ``NaN``, such as ``nan=float('NaN'); nan == nan``, follow the IEEE754 rule and return False (or True in the case of ``!=``). This rule applies to ``float`` and ``decimal.Decimal`` within the builtins and standard library.

2. Indirect comparisons conducted internally by container classes, such as ``x in someset`` or ``seq.count(x)`` or ``somedict[x]``, enforce reflexivity by using the expressions ``x is y or x == y`` and ``x is not y and x != y`` respectively rather than assuming that ``x == y`` and ``x != y`` will always respect the reflexivity requirement. This rule applies to all container types within the builtins and standard library that may contain values of arbitrary types.

Also see [1] for a more comprehensive theoretical discussion of this topic.

[1] http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civilization/"

Specific container methods that have currently been identified as relying on the reflexivity assumption are:

- __contains__() (for x in c: assert x in c)
- __eq__() (assert [x] == [x])
- __ne__() (assert not [x] != [x])
- index() (for x in c: assert 0 <= c.index(x) < len(c))
- count() (for x in c: assert c.count(x) > 0)

collections.Sequence and array.array (with the 'f' or 'd' type indicators) have already been identified as container classes in the standard library that fail to follow the second guideline and hence fail to correctly implement the above invariants in the presence of non-reflexive definitions of equality. They will be fixed as part of implementing this patch.

Other container types that fail to correctly enforce reflexivity can be fixed as they are identified.

[1] http://mail.python.org/pipermail/python-dev/2011-April/110962.html
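The two rules in the proposed glossary entry can be observed directly. The following is a minimal sketch, assuming CPython and a single NaN object that is reused throughout; the variable names are illustrative only and do not come from the proposal.

```python
nan = float("nan")  # one NaN object, reused in every check below

# Rule 1: direct comparisons follow the IEEE 754 rule.
direct_eq = nan == nan
direct_ne = nan != nan

# Rule 2: built-in containers check identity before equality,
# so the same NaN object is always "found" in a container.
container = [nan]
in_list = nan in container
list_eq = [nan] == [nan]
count = container.count(nan)
index = container.index(nan)
in_set = nan in {nan}
dict_value = {nan: "found"}[nan]

print(direct_eq, direct_ne)
print(in_list, list_eq, count, index, in_set, dict_value)
```

The container results rely on the object being the same one; a separately created NaN would not be found.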
msg134640 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2011-04-28 03:50
Actually, based on the NumPy precedent [1], array.array should be fine as is. Since it uses raw C floats and doubles internally, rather than Python objects, there is no clear concept of "object identity" to use to enforce reflexivity. [1] http://mail.python.org/pipermail/python-dev/2011-April/110987.html
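The lack of object identity in array.array can be illustrated with a short sketch. It assumes CPython, where reading an element of a 'd' array builds a new float object from the stored C double; the variable names are illustrative.

```python
from array import array

nan = float("nan")
as_list = [nan]               # stores a reference to the NaN object
as_array = array("d", [nan])  # stores only the raw C double

# The list can fall back on identity; the array has no object to compare.
list_contains = nan in as_list
array_contains = nan in as_array
array_count = as_array.count(nan)

# Each element access creates a fresh float, so identity is not preserved.
roundtrip_is_same_object = as_array[0] is nan

print(list_contains, array_contains, array_count, roundtrip_is_same_object)
```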
msg134646 - (view)	Author: Glenn Linderman (v+python) *	Date: 2011-04-28 05:32
Bertrand Meyer's exposition is flowery, and he is a learned man, but the basic argument he makes is:

    Reflexivity of equality is something that we expect for any data type, and it seems hard to justify that a value is not equal to itself. As to assignment, what good can it be if it does not make the target equal to the source value?

The argument is flawed: now that NaN exists, and is not equal to itself in value, there should be, and need be, no expectation that assignment elsewhere should make the target equal to the source in value. It can, and in Python, should, make them match in identity (is) but not in value (==, equality).

I laud the idea of adding the definition of reflexive equality to the glossary. However, I think it is presently a bug that a list containing a NaN value compares equal to itself. Yes, such a list should have the same identity (is), but should not be equal.
msg134651 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2011-04-28 06:21
> I think it is presently a bug that a list containing
> a NaN value compares equal to itself.

Moreover, it also compares equal to another list containing the same NaN:

>>> [nan] is [nan]
False
>>> [nan] == [nan]
True

Here is another case of the "is implies ==" optimization breaking the NaN property in the stdlib:

>>> import ctypes
>>> x = ctypes.c_double(nan)
>>> x == x
True
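One possible reading of the ctypes result, which the message itself does not spell out, is that c_double instances fall back to the default identity-based comparison instead of comparing their stored values. The sketch below is consistent with that reading in CPython; it is an observation, not a statement about ctypes internals.

```python
import ctypes

nan = float("nan")
x = ctypes.c_double(nan)
y = ctypes.c_double(nan)

same_object_eq = x == x        # the comparison reported in the message
distinct_objects_eq = x == y   # two wrappers around the same NaN value
value_eq = x.value == x.value  # comparing the extracted Python floats

# Even ordinary values compare unequal across distinct wrapper objects.
one_eq = ctypes.c_double(1.0) == ctypes.c_double(1.0)

print(same_object_eq, distinct_objects_eq, value_eq, one_eq)
```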
msg134654 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2011-04-28 07:01
The status quo works. Proposals to change it on theoretical grounds have a significantly higher bar to meet than proposals to simply document it clearly.
msg134655 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2011-04-28 07:16
On Thu, Apr 28, 2011 at 3:01 AM, Nick Coghlan <report@bugs.python.org> wrote:
..
> The status quo works.

No it does not. I have yet to see a Python program that uses non-reflexivity of NaN in a meaningful way. What I have seen is programmers either ignoring it and writing slightly buggy programs ("slightly" because it is actually hard to produce a NaN in Python code) or adding extra code to filter out NaN values before numbers are compared.

> Proposals to change it on theoretical grounds have a significantly higher bar to meet than proposals
> to simply document it clearly.

Documenting the status quo is necessary for any proposal to change it.
msg134656 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2011-04-28 07:26
By "works" I merely meant that you can currently achieve both of the following:

1. Write fully conformant implementations of IEEE754 floating point types, including the non-reflexive NaN comparisons (keeping in mind that, as a value-based specification, "same payload" is the closest IEEE754 can get to "same object")

2. Explicitly force reflexivity when you need it, either by filtering out nonconformant values, or by checking identity before checking equality.

The "pure" equality-tests-are-always-reflexive approach advocated by Meyer rules out option 1. Given that one of the use cases for Python is to prototype algorithms that are later translated to C or C++, formally disallowing that use case would be problematic.
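The two ways of forcing reflexivity mentioned in point 2 can be sketched as follows. The helper names reflexive_eq and drop_nans are hypothetical and introduced only for illustration; math.isnan is the standard library check for float NaN values.

```python
import math


def reflexive_eq(x, y):
    """Check identity before equality, so every object equals itself."""
    return x is y or x == y


def drop_nans(values):
    """Filter out float NaN values before any comparisons are made."""
    return [v for v in values if not (isinstance(v, float) and math.isnan(v))]


nan = float("nan")
data = [1.0, nan, 2.0]

plain = nan == nan               # IEEE 754 comparison
forced = reflexive_eq(nan, nan)  # identity check short-circuits
cleaned = drop_nans(data)

print(plain, forced, cleaned)
```

The identity check only helps when the same object is compared with itself; two separately created NaN objects still compare unequal under reflexive_eq.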
msg134659 - (view)	Author: Glenn Linderman (v+python) *	Date: 2011-04-28 07:40
Nick says (and later explains better what he meant):

> The status quo works. Proposals to change it on theoretical grounds have a significantly higher bar to meet than proposals to simply document it clearly.

I say: What the status quo doesn't provide is containers that "work". In this case what I mean by "work" is that equality of containers is based on value and value comparisons, and accepts and embraces non-reflexive equality. It might be possible to implement alternate containers with these characteristics, but that requires significantly more effort than simply filtering values.

Nonetheless, I totally agree with msg134654, and agree that properly documenting the present implementation would be a great service to users of the present implementation.
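As a rough illustration of what such an alternate container might look like, here is a minimal sketch of a list subclass whose comparisons use only ==. The class name StrictEqList is hypothetical, and only __eq__, __ne__ and __contains__ are overridden; index(), count() and the ordering comparisons keep the built-in behaviour, which hints at the extra effort the message mentions.

```python
class StrictEqList(list):
    """Hypothetical list whose equality never falls back on identity."""

    def __eq__(self, other):
        if not isinstance(other, list):
            return NotImplemented
        if len(self) != len(other):
            return False
        # Pure value comparison: no "x is y" shortcut.
        return all(a == b for a, b in zip(self, other))

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __contains__(self, item):
        return any(item == element for element in self)

    __hash__ = None  # lists are unhashable; keep that explicit


nan = float("nan")
strict = StrictEqList([1.0, nan])

equal_to_itself = strict == strict
contains_nan = nan in strict
contains_one = 1.0 in strict
plain_values_equal = StrictEqList([1, 2]) == [1, 2]

print(equal_to_itself, contains_nan, contains_one, plain_values_equal)
```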
msg134660 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2011-04-28 07:55
On Thu, Apr 28, 2011 at 3:26 AM, Nick Coghlan <report@bugs.python.org> wrote:
..
> 1. Write fully conformant implementations of IEEE754 floating point types, including the non-reflexive NaN comparisons
> (keeping in mind that, as a value-based specification, "same payload" is the closest IEEE754 can get to "same object")
>

If being "fully conformant" with various IEEE standards was a design goal for Python, we would have leap seconds in the datetime module. :-)

Python builtin float equality being reflexive does not in any way inhibit anyone's ability to write a fully conforming implementation. In fact, if we ever get arithmetic operations implemented for ctypes types, I would argue that comparison of c_double values would need to be changed to match C behavior. (I am +0 on changing that even without implementing arithmetic.)

I realize, however, that by "status quo" you mean container operations not calling __eq__ on identical objects. I agree that this should not change. Making float comparison reflexive will actually make this feature less controversial.
msg134715 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2011-04-28 19:30
To repeat concisely what I said on the python-dev list, I think Reference 5.9, Comparisons, which says

"Tuples and lists are compared lexicographically using comparison of corresponding elements. This means that to compare equal, each element must compare equal and the two sequences must be of the same type and have the same length."

needs 'be identical or ' added before 'compare equal and ...'.

"Mappings (dictionaries) compare equal if and only if they have the same (key, value) pairs." may be OK, depending on how one interprets 'same (key, value) pairs'.

Alexander has opened a separate issue to change behavior in 3.3.
msg134729 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2011-04-28 23:18
After further discussion on python-dev, it is clear that this identity checking behaviour handles more than just NaNs - it also allows containers to cope more gracefully with objects like NumPy arrays that make use of rich comparisons to return something other than simple True/False values for equality checks. Also, since I neglected to mention it in the initial post, merely adding the glossary entry is just the first step. It then needs to be referenced from the appropriate points in the language and library reference.
msg134730 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2011-04-28 23:28
Scratch the first half of that last comment - Guido pointed out that false positives rear their ugly head almost immediately if you try to store rich comparison objects in other containers.
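The kind of false positive described here can be reproduced without NumPy. The sketch below uses a hypothetical Elementwise class, invented for illustration, whose == returns a list of per-element results instead of a single bool. Container membership tests reduce that result to its truth value, and a non-empty list is always true.

```python
class Elementwise:
    """Hypothetical stand-in for a type with elementwise rich comparison."""

    def __init__(self, *values):
        self.values = values

    def __eq__(self, other):
        if not isinstance(other, Elementwise):
            return NotImplemented
        # Returns one result per element rather than a single True/False.
        return [a == b for a, b in zip(self.values, other.values)]

    __hash__ = None


a = Elementwise(1, 2)
b = Elementwise(3, 4)

elementwise_result = a == b  # no element matches
false_positive = b in [a]    # yet the membership test succeeds

print(elementwise_result, false_positive)
```

The identity check does not cause the false positive; the truth value of the non-boolean comparison result does, so identity checking alone does not make such objects safe to store in ordinary containers.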
msg192850 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2013-07-11 07:52
I think this should be closed. AFAICT it is of interest to a very tiny subset of the human species and as near as I can tell that subset doesn't include people in the numeric and statistics community (the ones who actually use NaNs as placeholders for missing values).

So much code (and human reasoning) assumes that identity-implies-equality, that it would be easier to document the exception to expectation than to try to find every place in every module where the assumption is present (implicitly or explicitly).

Instead, it would be better to document that distinct float('NaN') objects are never equal to one another and that identical float('NaN') objects may or may not compare equal in various implementation dependent circumstances.
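The distinction between distinct and identical NaN objects can be shown with a short sketch. It assumes CPython, where each float("nan") call returns a new object; as the message notes, the results for identical objects are implementation dependent.

```python
a = float("nan")
b = float("nan")  # a second, distinct NaN object

# Distinct NaN objects: never equal, directly or inside containers.
distinct_direct = a == b
distinct_in_list = [a] == [b]
distinct_membership = b in [a]

# Identical NaN object: unequal directly, equal through a container.
identical_direct = a == a
identical_in_list = [a] == [a]

print(distinct_direct, distinct_in_list, distinct_membership)
print(identical_direct, identical_in_list)
```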
msg223559 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2014-07-21 04:38
Closing for the reasons listed and also because there is another pair of tracker items 22000 and 22001 pursuing related documentation updates.

History
Date	User	Action	Args
2022-04-11 14:57:16	admin	set	github: 56154
2014-07-21 04:38:01	rhettinger	set	status: open -> closed resolution: not a bug messages: + msg223559
2014-07-16 15:00:47	andymaier	set	nosy: + andymaier
2013-07-11 07:52:21	rhettinger	set	messages: + msg192850
2013-06-30 06:00:45	terry.reedy	set	versions: + Python 3.4, - Python 3.2
2011-04-29 17:13:19	daniel.urban	set	nosy: + daniel.urban
2011-04-28 23:28:43	ncoghlan	set	messages: + msg134730
2011-04-28 23:18:09	ncoghlan	set	messages: + msg134729
2011-04-28 19:30:31	terry.reedy	set	nosy: + terry.reedy messages: + msg134715
2011-04-28 11:08:56	mark.dickinson	set	nosy: + mark.dickinson
2011-04-28 07:55:05	belopolsky	set	messages: + msg134660
2011-04-28 07:40:07	v+python	set	messages: + msg134659
2011-04-28 07:26:49	ncoghlan	set	messages: + msg134656
2011-04-28 07:16:33	belopolsky	set	messages: + msg134655
2011-04-28 07:07:14	rhettinger	set	assignee: docs@python -> rhettinger nosy: + rhettinger
2011-04-28 07:01:48	ncoghlan	set	messages: + msg134654
2011-04-28 06:21:27	belopolsky	set	nosy: + belopolsky messages: + msg134651
2011-04-28 05:32:53	v+python	set	nosy: + v+python messages: + msg134646
2011-04-28 03:50:38	ncoghlan	set	messages: + msg134640
2011-04-28 03:42:23	ncoghlan	create