Message 75745 - Python tracker

Author	mikecurtis
Recipients	barry, christian.heimes, gvanrossum, mark.dickinson, mikecurtis, rhettinger
Date	2008-11-11.15:59:26
Message-id	<1226419169.26.0.883630840929.issue4296@psf.upfronthosting.co.za>
In-reply-to

Content

All,

Thank you for your rigorous analysis of this bug.  As for its impact:
the real issue that caused problems for our application was Python
silently casting NaN values to 0L, as discussed here:

http://mail.python.org/pipermail/python-dev/2008-January/075865.html

This would cause us to erroneously recognize 0s in our dataset when our
input was invalid, which caused various issues.  Per that thread, it
sounds like there is no intention to fix this for versions prior to 3.0,
so I decided to detect NaN values early on with the following:


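def IsNan(x):
    # NaN is the only value that compares unequal to itself. (x is x) is
    # always True, so the effective test is x != x.
    return (x is x) and (x != x)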


This is not the most rigorous check, but since our inputs are expected
to be restricted to N-dimensional lists of numeric and/or string values,
this was sufficient for our purposes.
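Here is a short illustration of what the self-inequality test does on scalars versus containers. It was added here and is not the author's original check. It uses Python 3 syntax, and the printed values reflect CPython, whose built-in lists and `in` operator treat identical elements as equal without calling `==`:

```python
nan = float("nan")
same = [nan]
other = [nan]  # a different list object holding the same NaN object

print(nan != nan)     # True: NaN compares unequal to itself
print(same != same)   # False: identical elements skip the equality test
print(same == other)  # True, for the same reason
print(nan in same)    # True: membership also checks identity first

# Two distinct NaN objects do go through ==, so these lists differ.
print([float("nan")] == [float("nan")])  # False
```

This is why a check of the form `x != x` cannot see a NaN inside a list. The elements have to be inspected one by one.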

However, I wanted to be sure what would happen if IsNan were handed a
vector or matrix containing a NaN, so I ran a quick check, which led me
to this bug.  My workaround is to avoid the optimization (container
comparison treating identical elements as equal) by checking container
elements manually, with the following code:


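def IsNan(x):
    # Recurse into containers explicitly, so that the identity shortcut
    # in list/tuple/set comparison cannot hide a NaN element.
    if isinstance(x, (list, tuple, set)):
        return any(IsNan(i) for i in x)
    # Scalars: NaN is the only value that compares unequal to itself.
    return (x is x) and (x != x)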


This isn't particularly pretty, but since our inputs are relatively
constrained and this isn't performance-critical code, it suffices for
our purposes.  For anyone working with large datasets, this would be
suboptimal.  (As an aside, if anyone has a better general-purpose NaN
checker, and I'm sure someone does, please let me know.)
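One possible answer to the aside above is sketched below. It targets present-day Python 3 rather than the 2.x versions discussed in this thread, and the helper name `contains_nan` is hypothetical. Where the type is known, it uses the standard `math.isnan`, `cmath.isnan`, and `decimal.Decimal.is_nan`. It treats strings and bytes as scalars and recurses into other iterables explicitly, so the identity shortcut in container comparison never comes into play. For anything else, it falls back to the self-inequality test:

```python
import cmath
import math
from collections.abc import Iterable, Mapping
from decimal import Decimal


def contains_nan(value):
    """Return True if value is NaN or a (possibly nested) container holding one.

    Hypothetical helper, assuming inputs are numbers, strings, and nested
    containers of those.
    """
    if isinstance(value, float):
        return math.isnan(value)
    if isinstance(value, complex):
        # NaN if either the real or the imaginary part is NaN.
        return cmath.isnan(value)
    if isinstance(value, Decimal):
        # Covers both quiet and signaling Decimal NaNs.
        return value.is_nan()
    if isinstance(value, (str, bytes)):
        # Strings are iterable (each character is itself a string), so
        # treat them as scalars to avoid endless recursion.
        return False
    if isinstance(value, Mapping):
        # For mappings, this sketch inspects the values only.
        return any(contains_nan(v) for v in value.values())
    if isinstance(value, Iterable):
        # Inspect each element ourselves instead of relying on container
        # comparison, which skips the equality test for identical elements.
        return any(contains_nan(item) for item in value)
    # Anything else: NaN is the only value that is unequal to itself.
    return value != value
```

Iterating consumes one-shot iterables such as generators, and very deep nesting could hit the recursion limit. For large numeric arrays, NumPy's `numpy.isnan` is the usual vectorized alternative.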

Additionally, while I believe it is most correct to say that a list
containing NaN is not equal to itself, I would hesitate to claim that
this is what most applications want.  I could easily imagine users who
would want a list to be considered NaN-like only if all of its values
are NaN.  Of course, no change made here would address that.  Once one
gets into that level of detail, I think the programmer needs to
implement the check manually to guarantee any particular outcome.
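The two policies mentioned above differ only in whether `any` or `all` combines the per-element results. Below is a minimal sketch for a flat sequence. It uses the hypothetical helper name `nan_like` and assumes the elements are plain numbers:

```python
import math


def nan_like(values, policy=any):
    """Return True if a flat sequence counts as NaN-like under `policy`.

    policy=any: NaN-like if at least one element is a float NaN.
    policy=all: NaN-like only if every element is a float NaN.
    """
    return policy(isinstance(v, float) and math.isnan(v) for v in values)


nan = float("nan")
print(nan_like([1.0, nan]))              # True
print(nan_like([1.0, nan], policy=all))  # False
print(nan_like([nan, nan], policy=all))  # True
```

With `policy=all`, an empty sequence counts as NaN-like because `all([])` is `True`. An application that cares would need to special-case it, which is exactly the kind of detail best left to the programmer.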

Returning to the matter at hand: while I cringe at this inconsistency
in the language, as a realist I completely agree that it would be
unreasonable to remove the optimization just to accommodate this very
odd corner case.  For this reason, I propose, as a minimal solution,
that this oddity simply be documented better.

Thanks again for your thoughts.

History
Date	User	Action	Args
2008-11-11 15:59:29	mikecurtis	set	recipients: + mikecurtis, gvanrossum, barry, rhettinger, mark.dickinson, christian.heimes
2008-11-11 15:59:29	mikecurtis	set	messageid: <1226419169.26.0.883630840929.issue4296@psf.upfronthosting.co.za>
2008-11-11 15:59:28	mikecurtis	link	issue4296 messages
2008-11-11 15:59:26	mikecurtis	create