classification
Title: Make float('nan') unorderable
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: alex, belopolsky, daniel.urban, mark.dickinson, rhettinger
Priority: normal Keywords: patch

Created on 2011-04-28 19:08 by belopolsky, last changed 2014-03-17 11:06 by schlamar. This issue is now closed.

Files
File name Uploaded Description Edit
unorderable-nans.diff belopolsky, 2011-04-28 23:44 review
Messages (24)
msg134713 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-04-28 19:08
Rationale:

"""
IEEE 754 assigns values to all relational expressions involving NaN.
In the syntax of C, the predicate x != y is True but all others, x <
y , x <= y , x == y , x >= y and x > y, are False whenever x or y or
both are NaN, and then all but x != y and x == y are INVALID
operations too and must so signal.
"""
-- Lecture Notes on the Status of IEEE Standard 754 for Binary
Floating-Point Arithmetic by Prof. W. Kahan
http://www.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps

The problem with faithfully implementing IEEE 754 in Python is that
exceptions in the IEEE standard don't have the same meaning as in Python.
 IEEE 754 requires that a value is computed even when the operation
signals an exception.  The program can then decide whether to
terminate computation or propagate the value. In Python, we have to
choose between raising an exception and returning the value.  We
cannot have both.  It appears that in most cases the IEEE 754 "INVALID"
exception is treated as a terminating exception by Python and
operations that signal INVALID in IEEE 754 raise an exception in
Python.  Therefore making <, >, etc. raise on NaN while keeping the
status quo for != and == would bring Python floats closer to
compliance with IEEE 754.

See http://mail.python.org/pipermail/python-ideas/2011-April/010057.html for discussion.
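
For reference, the status quo described in the rationale can be checked directly. This is only a sketch of how current CPython floats behave with a quiet NaN; it is not part of the patch.

```python
nan = float('nan')

# Every ordering comparison involving NaN quietly returns False ...
ordered = [nan < 1.0, nan <= 1.0, nan > 1.0, nan >= 1.0, nan < nan, nan >= nan]

# ... and so does ==, while != returns True, even against itself.
equal = (nan == nan)
not_equal = (nan != nan)

print(ordered, equal, not_equal)
```

Under the proposal, the ordering comparisons in the first list would raise, while `==` and `!=` would keep their current results.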

An instructive part of the patch is

--- a/Lib/test/test_math.py
+++ b/Lib/test/test_math.py
@@ -174,10 +174,22 @@
                    flags
                   )
 
+def is_negative_zero(x):
+    return x == 0 and math.copysign(1, x) < 0
+
+def almost_equal(value, expected):
+    if math.isfinite(expected) and math.isfinite(value):
+        return abs(value-expected) <= eps
+    if math.isnan(expected):
+        return math.isnan(value)
+    if is_negative_zero(expected):
+        return is_negative_zero(value)
+    return value == expected
+
 class MathTests(unittest.TestCase):
 
     def ftest(self, name, value, expected):
-        if abs(value-expected) > eps:
+        if not almost_equal(value, expected):


Although it may look like the proposed change makes it harder to compare floats for approximate equality, the change actually helped to highlight a programming mistake: the old ftest() accepts 0.0 where -0.0 is expected.

This is a typical situation when someone attempts to write clever code relying on unusual properties of NaNs. In most cases clever code does not account for all possibilities, and it is always hard to reason about such code.
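
A minimal sketch of why the old ftest() condition is fragile. The function name `old_check_fails` is hypothetical, and the value of `eps` is an assumption standing in for the tolerance defined in test_math.

```python
eps = 1e-05  # assumed tolerance, standing in for test_math's module-level eps

def old_check_fails(value, expected):
    """Mirror of the old ftest() condition: True means a failure is reported."""
    return abs(value - expected) > eps

nan = float('nan')

# abs(nan - 1.0) is nan, and nan > eps is quietly False,
# so a NaN result is never reported as a failure.
nan_slips_through = not old_check_fails(nan, 1.0)

# 0.0 - (-0.0) is exactly 0.0, so the sign of a zero is ignored as well.
zero_sign_ignored = not old_check_fails(0.0, -0.0)

print(nan_slips_through, zero_sign_ignored)
```

If `nan > eps` raised instead of returning False, the first case would surface as an error rather than a silent pass.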
msg134732 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-04-28 23:44
Actually, my first attempt to fix the test was faulty.  The correct logic seems to be

+def is_negative_zero(x):
+    return x == 0 and math.copysign(1, x) < 0
+
+def almost_equal(value, expected):
+    if math.isfinite(expected) and math.isfinite(value):
+        if is_negative_zero(expected):
+            return is_negative_zero(value)
+        if is_negative_zero(value):
+            return is_negative_zero(expected)
+        return abs(value-expected) <= eps
+    if math.isnan(expected):
+        return math.isnan(value)
+    return value == expected
+
 class MathTests(unittest.TestCase):
+    
+    def test_xxx(self):
+        self.assertTrue(is_negative_zero(-0.0))
+        self.assertFalse(almost_equal(0.0, -0.0))
 
     def ftest(self, name, value, expected):
-        if abs(value-expected) > eps:
+        if not almost_equal(value, expected):

Now, the attached patch has two failures:

AssertionError: fmod(-10,1) returned -0.0, expected 0

and 

AssertionError: sqrt0002:sqrt(-0.0) returned -0.0, expected 0.0

The first seems to be a typo in the test, but I would not expect sqrt(-0.0) to return -0.0.  Does anyone know what the relevant standard says?
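
For anyone who wants to try the corrected logic outside the test suite, here is a standalone transcription of the helpers from the diff above. The value of `eps` is an assumption (the diff relies on a tolerance defined elsewhere in test_math), and the comments are added for illustration.

```python
import math

eps = 1e-05  # assumed tolerance; the diff uses an eps defined elsewhere

def is_negative_zero(x):
    return x == 0 and math.copysign(1, x) < 0

def almost_equal(value, expected):
    if math.isfinite(expected) and math.isfinite(value):
        # A negative zero only matches another negative zero.
        if is_negative_zero(expected):
            return is_negative_zero(value)
        if is_negative_zero(value):
            return is_negative_zero(expected)
        return abs(value - expected) <= eps
    if math.isnan(expected):
        return math.isnan(value)
    # At least one operand is infinite, or value is NaN while expected is not.
    return value == expected

# The two reported failures, reproduced with the helper:
print(almost_equal(math.fmod(-10, 1), 0))      # fmod returns -0.0 here
print(almost_equal(math.sqrt(-0.0), 0.0))      # sqrt returns -0.0 here
```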
msg134734 - (view) Author: Alex Gaynor (alex) * (Python committer) Date: 2011-04-29 01:19
The C standard (and/or the POSIX one, I forget) says sqrt(-0.0) returns -0.0.
msg134841 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-04-30 07:21
> sqrt(-0.0) to return -0.0.  Does anyone know what the relevant standard says?

sqrt(-0.0) should indeed be -0.0, according to IEEE 754.
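
A quick way to confirm the sign of a zero result, since `-0.0 == 0.0` is True and equality alone cannot tell the two apart. The helper name `sign_of` is hypothetical.

```python
import math

def sign_of(x):
    """Return -1.0 or 1.0 according to the sign bit; works for zeros too."""
    return math.copysign(1.0, x)

root = math.sqrt(-0.0)
rem = math.fmod(-10, 1)

# Both compare equal to zero, yet both carry a negative sign bit.
print(root == 0, sign_of(root))
print(rem == 0, sign_of(rem))
```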
msg134842 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-04-30 07:23
Alexander,

There are lots of almost-equality tests in the test-suite already, between test_math, test_float, test_cmath and test_complex.  Do you need to implement another one here, or can you reuse one of the existing ones?
msg135040 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-05-03 14:52
> There are lots of almost-equality tests in the test-suite already,
> between test_math, test_float, test_cmath and test_complex.
>  Do you need to implement another one here, or can you reuse one
> of the existing ones?

I can probably use acc_check() instead of abs(value-expected) <= eps, but I am not sure that will be an improvement.  Most of the new logic deals with NaNs and negative zero and the almost-equality tests that I've seen don't implement these cases correctly for my use.
msg135045 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-05-03 16:05
I was thinking of something like the rAssertAlmostEqual method in test_cmath.
msg135059 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-05-03 18:58
Alexander, I urge you to take a good deal of care with this tracker item and not make any changes lightly.  Take a look at how other languages have dealt with the issue.

Also, consider that "unorderable" may not be the right answer at all.  The most common use of NaNs is as a placeholder for missing data.  Perhaps putting them at the end of a sort is the right thing to do (cf. what databases do with NULL values).
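
A sketch of what "NaNs at the end" could look like today without any change to float comparisons, using a sort key. The name `nan_last_key` is hypothetical and this is just one possible policy.

```python
import math

def nan_last_key(x):
    # Non-NaN values sort first, by value. All NaNs share one key, so no NaN
    # is ever compared, and the stable sort keeps them in input order at the end.
    return (1, 0.0) if math.isnan(x) else (0, x)

nan = float('nan')
data = [nan, 0.0, 1.0, -1.0, nan, 0.5]
result = sorted(data, key=nan_last_key)
print(result)
```

Because the key never contains a NaN, the result does not depend on where the NaNs sat in the input.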

The other major use for NaNs is a way to let an invalid intermediate result flow through the remainder of a calculation (much as @NA does in MS Excel).  The spirit of that use case would suggest that raising an exception during a sort is the wrong thing to do.

Another consideration is that it would be unusual (and likely unexpected) to have a type be orderable or not depending on a particular value.  Users ask themselves whether floats are orderable, not whether some values of floats are orderable.

I strongly oppose this patch in its current form and think it is likely to break existing code that expects NaNs to be quiet.
msg135060 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-05-03 19:00
Also, if you're going to make a change, please consult the scipy/numpy community.  They are the most knowledgeable on the subject and the most affected by any change.

Given that they have not made any feature requests or bug reports about the current behavior, there is an indication that change isn't necessary or desirable.
msg135132 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-05-04 14:44
On Tue, May 3, 2011 at 12:05 PM, Mark Dickinson <report@bugs.python.org> wrote:
..
> I was thinking of something like the rAssertAlmostEqual method in test_cmath.

This one is good.  I wonder if it would be appropriate to move
rAssertAlmostEqual() up to unittest.case possibly replacing
assertAlmostEqual()? If replacing assertAlmostEqual() is not an
option, I would call it assertFloatAlmostEqual().
msg135983 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-05-14 18:50
It seems we're getting a bit off-topic for the issue title;  the discussion about cleaning up test_math (which I agree would be a good thing to do) should probably go into another issue.

On the issue itself, I'm -1 on making comparisons with float('nan') raise: I don't see that there's a real problem here that needs solving.  

Note that the current behaviour does *not* violate IEEE 754, since there's nothing anywhere in IEEE 754 that says that Python's < operation should raise for comparisons involving NaNs:  all that's said is that a conforming language should provide a number of comparison operations (without specifying how those operations should be spelt in the language in question), including both a < operation that's quiet (returning a false value for comparison with NaNs) and a < operation that signals on comparison with NaN.  There's nothing to indicate definitively which of these two operations '<' should bind to in a language.

It *is* true that C chooses to bind '<' to the signalling version, but that doesn't automatically mean that we should do the same in Python.
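
To make the distinction concrete, here is a sketch of the two flavours of "less than" as plain Python functions. The function names follow the IEEE 754-2008 operation names compareQuietLess and compareSignalingLess, but the functions themselves are hypothetical, and mapping the INVALID signal to ValueError is only one possible choice.

```python
import math

def compare_quiet_less(x, y):
    """Quiet form: what float '<' already does; a NaN operand yields False."""
    return x < y

def compare_signaling_less(x, y):
    """Signaling form: treat an unordered comparison as an error."""
    if math.isnan(x) or math.isnan(y):
        raise ValueError("comparison involving NaN")
    return x < y

nan = float('nan')
print(compare_quiet_less(nan, 1.0))
try:
    compare_signaling_less(nan, 1.0)
except ValueError as exc:
    print("signaled:", exc)
```

The question in this issue is which of these two functions the `<` operator should be bound to for floats.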
msg135985 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-05-14 19:00
> Therefore making <, >, etc. raise on NaN while keeping the
> status quo for != and == would bring Python floats closer to
> compliance with IEEE 754.

Not so.  Either way, Python would be providing exactly 10 of the 22 required IEEE 754 comparison operations (see sections 5.6.1 and 5.11 of IEEE 754-2008 for details).  If we wanted to move closer to compliance with IEEE 754, we should be providing all 22 comparisons.
msg135986 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-05-14 19:08
> On the issue itself, I'm -1 on making comparisons 
> with float('nan') raise: I don't see that there's
> a real problem here that needs solving.
>
> Note that the current behaviour does *not* violate IEEE 754, ...

I agree with Mark.  Am closing this feature request which is both ill-conceived and likely to cause more harm than good (possibly breaking code that currently does not fail).

> the discussion about cleaning up test_math 
> (which I agree would be a good thing to do) 
> should probably go into another issue.

I agree.
msg136111 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-05-16 17:26
On Sat, May 14, 2011 at 2:50 PM, Mark Dickinson <report@bugs.python.org> wrote:
..
> On the issue itself, I'm -1 on making comparisons with float('nan') raise: I don't see that there's a real problem here that needs solving.
>

I probably should have changed the title of this issue after making an
alternative proposal to make INVALID operations produce a warning:

http://mail.python.org/pipermail/python-ideas/2011-April/010101.html

For the case of nan ordering, this idea seemed to receive support on
the mailing list:

http://mail.python.org/pipermail/python-ideas/2011-April/010102.html
http://mail.python.org/pipermail/python-ideas/2011-April/010103.html
http://mail.python.org/pipermail/python-ideas/2011-April/010104.html

> Note that the current behaviour does *not* violate IEEE 754, since there's nothing anywhere
> in IEEE 754 that says that Python's < operation should raise for comparisons involving NaNs:
>  all that's said is that a conforming language should provide a number of comparison operations
> (without specifying how those operations should be spelt in the language in question), including
> both a < operation that's quiet (returning a false value for comparison with NaNs) and a <
> operation that signals on comparison with NaN.  There's nothing to indicate definitively which of
>  these two operations '<' should bind to in a language.
>

Yes, IEEE 754 provides little guidance to language designers, but why
would anyone want to treat ordering of floats differently from
ordering of decimals?

Traceback (most recent call last):
  ..
decimal.InvalidOperation: comparison involving NaN
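
A sketch that reproduces the decimal behaviour referred to above, for a quiet NaN. The InvalidOperation trap is set explicitly inside a local context so the snippet does not depend on ambient context settings; the trap is also enabled in the default context.

```python
import decimal
from decimal import Decimal

with decimal.localcontext() as ctx:
    # Enable the trap explicitly so the signal becomes a Python exception.
    ctx.traps[decimal.InvalidOperation] = True

    d_nan = Decimal('NaN')

    # == and != stay quiet for a quiet NaN.
    equality = (d_nan == Decimal(1))
    inequality = (d_nan != Decimal(1))

    # Ordering comparisons signal InvalidOperation.
    try:
        d_nan < Decimal(1)
        ordering_raised = False
    except decimal.InvalidOperation:
        ordering_raised = True

print(equality, inequality, ordering_raised)
```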

> It *is* true that C chooses to bind '<' to the signalling version, but that doesn't automatically mean that we should do the same in Python.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue11949>
> _______________________________________
>
msg136115 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-05-16 18:20
On Sat, May 14, 2011 at 3:08 PM, Raymond Hettinger
<report@bugs.python.org> wrote:
..
>> Note that the current behaviour does *not* violate IEEE 754, ...
>
> I agree with Mark.

Do we really need a popular vote to determine what a published
standard does or does not require?

Section 7.2 of IEEE Std 754-2008 states:

"""
7.2 Invalid operation

The invalid operation exception is signaled if and only if there is no
usefully definable result. In these cases
the operands are invalid for the operation to be performed.
…

For operations producing no result in floating-point format, the
operations that signal the invalid operation exception are:
...
j)	comparison by way of unordered-signaling predicates listed in Table
5.2, when the operands are unordered
"""

Python comparison operators.

We can argue, of course, about the proper mapping of the IEEE 754 'INVALID'
exception to the available Python constructs.  Arguably, a compliant
language can ignore INVALID exceptions, issue a warning while
returning a result, or raise an exception and produce no result.  In a
post on Python ideas Mark argued that the ideal disposition of INVALID
is a ValueError:

"""
IMO, the ideal (ignoring backwards compatibility) would be to have
OverflowError / ZeroDivisionError / ValueError produced wherever
IEEE754 says that overflow / divide-by-zero / invalid-operation should
be signaled.
"""
http://mail.python.org/pipermail/python-ideas/2011-April/010106.html

If IEEE 754 compliance is a stated goal in Python design, it would
make very little sense to treat some cases of INVALID differently from
others.  If, however, IEEE 754 compliance is not a goal, we should
consider what is the most useful behavior.   On the mailing list, I
posted a challenge - review your code that will work differently if
nan ordering was disallowed and report whether that code does the
right thing for all kinds of float (including nan, inf and signed 0).
So far, I have not seen any responses to this.

My own experiment with the Python library itself has revealed a bug
in the test suite.   This matches my prior experience: naive numeric
code usually produces nonsense results when nans are compared and
careful numeric code makes an effort to avoid comparing nans.

>  Am closing this feature request which is both ill-conceived and likely to cause more harm than good (possibly breaking code that currently does not fail).
>

My primary goal in posting this patch was to support the discussion on
python-ideas.  The patch was not intended to be applied as is.  At the
minimum, I would need to make nan < nan issue a deprecation warning
before turning it into an error.  If this is not an appropriate use of
the tracker - please propose an alternative.  Posting a patch on the
mailing list or outside of python.org seems to be a worse alternative.

>> the discussion about cleaning up test_math
>> (which I agree would be a good thing to do)
>> should probably go into another issue.
>
> I agree.

Why?  The issue in test_math is small enough that it can be fixed
without any discussion on the tracker.  If someone would want to
improve unittest based on this experience, this can indeed be handled
in a separate issue.  As long as the changes are limited to Lib/test,
I don't see what a separate issue will bring other than extra work.
msg136117 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-05-16 18:28
A tracker bug has mangled the following paragraph following the IEEE 754 standard quote in my previous post:

"""
Table 5.2 referenced above lists 10 operations, four of which (>, <,
>=, and <=) are given spellings that are identical to the spellings of
Python comparison operators.
"""
msg136466 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-05-21 19:18
> Table 5.2 referenced above lists 10 operations, four of which (>, <,
> >=, and <=) are given spellings that are identical to the spellings of
> Python comparison operators.

Yep, those are included amongst the "various ad-hoc and traditional names and symbols".  So what?  It's still the case that IEEE 754 gives no requirement (or even recommendation) for how either of 'compareQuietLess' or 'compareSignalingLess' should be spelt in any particular language.

IOW, it's fine to argue that *you* personally would like Python's '<' to be bound to IEEE 754's 'compareSignalingLess' instead of the current effective binding to 'compareQuietLess', but it would be a bit disingenuous to claim that IEEE 754 recommends or requires that.  It doesn't.
msg136469 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-05-21 19:36
On Sat, May 21, 2011 at 3:18 PM, Mark Dickinson <report@bugs.python.org> wrote:
>
> Mark Dickinson <dickinsm@gmail.com> added the comment:
>
>> Table 5.2 referenced above lists 10 operations, four of which (>, <,
>> >=, and <=) are given spellings that are identical to the spellings of
>> Python comparison operators.
>
> Yep, those are included amongst the "various ad-hoc and traditional names and symbols".  So what?
>  It's still the case that IEEE 754 gives no requirement (or even recommendation) for how either of
> 'compareQuietLess' or 'compareSignalingLess' should be spelt in any particular language.

IEEE 754 is not a standard that is directly applicable to the design
of programming languages.  For example, it is completely silent on the
issue of which operations should be implemented as infix operators and
which as functions.  Still, to the extent it is appropriate for IEEE
754 to say so, I think it says that '<' is 'compareSignalingLess'.

IEEE 754 can only be a guide for language design and not a
specification.  However, the decimal module, which was explicitly
designed for IEEE 754 compliance, makes order comparison operators
signaling.  What is the reason to make them quiet for floats other
than backward compatibility?  Note that backward compatibility is
likely not to be an issue if we make nan comparisons generate a
warning (possibly even off by default) rather than error.
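
A rough sketch of the warning alternative, written as a helper function rather than as a change to float itself. The name `warning_less` and the choice of RuntimeWarning are assumptions made for illustration; the proposal does not specify a warning category.

```python
import math
import warnings

def warning_less(x, y):
    """Hypothetical helper: behave like float '<' but warn on a NaN operand."""
    if math.isnan(x) or math.isnan(y):
        warnings.warn("ordering comparison involving NaN",
                      RuntimeWarning, stacklevel=2)
    return x < y

# Record warnings so the effect can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = warning_less(float('nan'), 1.0)

print(result, [str(w.message) for w in caught])
```

The comparison still returns its usual value, so existing code keeps running, and a filter such as `warnings.simplefilter("ignore", RuntimeWarning)` would silence the warning.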
msg136471 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-05-21 19:48
> What is the reason to make them quiet for floats other
> than backward compatibility?

For me, none.  I'll happily agree that, all other things being equal, it's more natural (and more consistent with other languages) to have < correspond to the signaling operation, and in a new language that's probably what I'd go for.  But as a *change* to existing behaviour in a language that's been widely adopted for numerical work, the risk of breakage seems to me to outweigh any benefits.
msg136472 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-05-21 19:50
On the idea of a warning, I don't really see the point;  I find it hard to imagine it's really going to catch many real errors.
msg136474 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-05-21 19:58
On Sat, May 21, 2011 at 3:50 PM, Mark Dickinson <report@bugs.python.org> wrote:
..
> On the idea of a warning, I don't really see the point;  I find it hard to imagine it's really going to catch many real errors.

My experience is different.  In my work, NaNs often creep into
calculations that are not designed to deal with them. (More often from
data files than from invalid operations.)  Sorting a large list with a
handful of NaNs often leads to rather mysterious errors if not to
silently wrong results.  I believe there was even an issue on the
tracker about this particular case.
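
Until something changes in the language, one defensive option is to check for NaNs before sorting. The helper below is a hypothetical sketch, not an existing API.

```python
import math

def strict_sorted(values):
    """Hypothetical helper: refuse to sort when any float NaN is present."""
    values = list(values)
    for i, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            raise ValueError("NaN at index {}; sort order is not well defined".format(i))
    return sorted(values)

print(strict_sorted([0, 1.5, -1]))
try:
    strict_sorted([0, float('nan'), 1, -1])
except ValueError as exc:
    print("rejected:", exc)
```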
msg136475 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2011-05-21 20:02
Hmm, okay.  Call me +0 on the warning.
msg182314 - (view) Author: Marc Schlaich (schlamar) * Date: 2013-02-18 11:28
I'm +1 for a warning. The current behavior is really unpredictable:

In [6]: sorted([nan, 0, 1, -1])
Out[6]: [nan, -1, 0, 1]

In [7]: sorted([0, 1, -1, nan])
Out[7]: [-1, 0, 1, nan]

In [8]: sorted([0, nan, 1, -1])
Out[8]: [0, nan, -1, 1]
msg182362 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2013-02-19 05:54
-1 for a warning.  A user should really have *no* expectations about a NaN's sort order.  For the most part, Python does not get into the warnings business for every possible weird thing you could tell it to do (especially something as harmless as this).

Also, warnings are a bit of a PITA to shut off.  For something like NaN ordering, a warning is likely to inflict more harm on the users than the NaN ordering issue itself.
History
Date User Action Args
2014-03-17 11:06:35 schlamar set nosy: - schlamar
2013-02-19 05:54:37 rhettinger set messages: + msg182362
2013-02-18 11:28:46 schlamar set nosy: + schlamar; messages: + msg182314
2011-05-22 02:30:48 rhettinger set assignee: rhettinger
2011-05-21 20:02:07 mark.dickinson set messages: + msg136475
2011-05-21 19:58:02 belopolsky set messages: + msg136474
2011-05-21 19:50:47 mark.dickinson set messages: + msg136472
2011-05-21 19:48:17 mark.dickinson set messages: + msg136471
2011-05-21 19:36:07 belopolsky set messages: + msg136469
2011-05-21 19:18:07 mark.dickinson set messages: + msg136466
2011-05-16 18:28:57 belopolsky set messages: + msg136117
2011-05-16 18:20:49 belopolsky set messages: + msg136115
2011-05-16 17:26:38 belopolsky set messages: + msg136111
2011-05-14 19:08:20 rhettinger set status: open -> closed; resolution: rejected; messages: + msg135986
2011-05-14 19:00:18 mark.dickinson set messages: + msg135985
2011-05-14 18:50:32 mark.dickinson set messages: + msg135983
2011-05-04 14:44:27 belopolsky set messages: + msg135132
2011-05-03 19:00:03 rhettinger set messages: + msg135060
2011-05-03 18:58:03 rhettinger set nosy: + rhettinger; messages: + msg135059
2011-05-03 16:05:38 mark.dickinson set messages: + msg135045
2011-05-03 14:52:16 belopolsky set messages: + msg135040
2011-04-30 07:23:42 mark.dickinson set messages: + msg134842
2011-04-30 07:21:09 mark.dickinson set messages: + msg134841
2011-04-30 07:15:02 mark.dickinson set nosy: + mark.dickinson
2011-04-29 17:13:38 daniel.urban set nosy: + daniel.urban
2011-04-29 01:19:00 alex set nosy: + alex; messages: + msg134734
2011-04-29 00:11:51 belopolsky set files: - unorderable-nans.diff
2011-04-28 23:44:12 belopolsky set files: + unorderable-nans.diff; messages: + msg134732
2011-04-28 19:15:22 belopolsky set files: - unorderable-nans.diff
2011-04-28 19:14:58 belopolsky set files: + unorderable-nans.diff
2011-04-28 19:08:57 belopolsky create