
Author mikecurtis
Recipients barry, christian.heimes, gvanrossum, mark.dickinson, mikecurtis, rhettinger
Date 2008-11-11.15:59:26
Message-id <1226419169.26.0.883630840929.issue4296@psf.upfronthosting.co.za>
Content
All,

Thank you for your rigorous analysis of this bug.  To answer the
question of the impact of this bug: the real issue that caused problems
for our application was Python deciding to silently cast NaN values to
0L, as discussed here:

http://mail.python.org/pipermail/python-dev/2008-January/075865.html

This would cause us to erroneously recognize 0s in our dataset when our
input was invalid, which caused various issues.  Per that thread, it
sounds like there is no intention to fix this for versions prior to 3.0,
so I decided to detect NaN values early on with the following:


def IsNan(x):
  # NaN is the only float value that compares unequal to itself.
  return (x is x) and (x != x)


This is not the most rigorous check, but since our inputs are expected
to be restricted to N-dimensional lists of numeric and/or string values,
this was sufficient for our purposes.
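
To illustrate why the self-inequality test is "not the most rigorous": it trusts whatever each object's own comparison method says. The sketch below assumes a made-up class, AlwaysUnequal, that is not part of our application; the name is_nan is likewise only illustrative.

```python
def is_nan(x):
    # Same idea as IsNan above: NaN is unequal to itself.
    return x != x


class AlwaysUnequal(object):
    """Hypothetical class whose instances never compare equal, even to themselves."""

    def __eq__(self, other):
        return False

    def __ne__(self, other):
        return True


nan_result = is_nan(float("nan"))          # a real NaN is detected
number_result = is_nan(1.5)                # ordinary numbers are not flagged
string_result = is_nan("abc")              # strings are not flagged
false_positive = is_nan(AlwaysUnequal())   # a non-NaN object is flagged anyway
```

For inputs restricted to numbers and strings, the false positive cannot occur, which is why the simple test was enough here.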

However, I wanted to be clear as to what would happen if this were
handed a vector or matrix containing a NaN, so I did a quick check,
which led me to this bug.  My workaround is to manually avoid the
optimization, with the following code:


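def IsNan(x):
  # Recurse into containers element by element, so the result never
  # depends on how the container itself compares its items.
  if isinstance(x, (list, tuple, set)):
    return any(IsNan(i) for i in x)
  # NaN is the only float value that compares unequal to itself.
  return (x is x) and (x != x)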


This isn't particularly pretty, but since our inputs are relatively
constrained, and since this isn't performance-critical code, it suffices
for our purposes.  For anyone working with large datasets, this would be
suboptimal.  (As an aside, if someone has a better solution for a
general-case NaN-checker, which I'm sure someone does, feel free to let
me know what it is).
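
One possible answer to the aside, offered as a sketch rather than a definitive solution: math.isnan (available from Python 2.6) inspects the float's value directly instead of relying on comparison. The name contains_nan and the choice to treat every non-float value as "not NaN" are assumptions made for this example.

```python
import math


def contains_nan(x):
    """Return True if x is a float NaN or a (nested) container holding one."""
    if isinstance(x, (list, tuple, set, frozenset)):
        return any(contains_nan(item) for item in x)
    # Only floats are considered here; strings, ints, etc. are never NaN.
    return isinstance(x, float) and math.isnan(x)


nan = float("nan")
nested_hit = contains_nan([1, "a", [2.0, (nan,)]])
nested_miss = contains_nan([1, "a", [2.0, (3.5,)]])
```

This still visits every element, so it is no faster on large datasets; it only removes the dependence on comparison semantics.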

Additionally, while I believe that it is most correct to say that a list
containing NaN is not equal to itself, I would hesitate to claim that it
is even what most applications would desire.  I could easily imagine
individuals who would only wish for the list to be considered NaN-like
if all of its values are NaN.  Of course, that wouldn't be solved by any
changes that might be made here.  Once one gets into that level of
detail, I think the programmer needs to implement the check manually to
guarantee any particular expected outcome.
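
The two policies mentioned above can be written out explicitly for flat lists. This is only a sketch: the function names are made up, and treating an empty list as not NaN-like is an assumption.

```python
import math


def is_nan(x):
    # Only floats are considered; other types are never NaN.
    return isinstance(x, float) and math.isnan(x)


def any_nan(values):
    """NaN-like if at least one element is NaN."""
    return any(is_nan(v) for v in values)


def all_nan(values):
    """NaN-like only if every element is NaN (empty input counts as False)."""
    values = list(values)
    return bool(values) and all(is_nan(v) for v in values)


nan = float("nan")
mixed = [1.0, nan, 3.0]
only_nans = [nan, float("nan")]
```

Here any_nan(mixed) is true while all_nan(mixed) is false, and both are true for only_nans.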

Returning to the matter at hand: while I cringe to know that there is
this inconsistency in the language, as a realist I completely agree that
it would be unreasonable to remove the optimization to preserve this
very odd corner case.  For this reason, the minimal solution I proposed
here is simply that this oddity be documented better.
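
For anyone reading along, the inconsistency can be reproduced in a few lines. This is a sketch of the behaviour as observed on CPython, where container comparison checks object identity before equality; the variable names are only illustrative.

```python
nan = float("nan")

# A NaN never compares equal to itself.
scalar_equal = (nan == nan)

# Two lists holding the *same* NaN object compare equal, because the
# identity shortcut skips the element-wise == test.
same_object_equal = ([nan] == [nan])

# Two lists holding *different* NaN objects fall through to ==, which fails.
different_objects_equal = ([float("nan")] == [float("nan")])

# Membership tests use the same shortcut.
membership = (nan in [nan])
```

With these definitions, scalar_equal and different_objects_equal are false, while same_object_equal and membership are true.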

Thanks again for your thoughts.