Message 395773 - Python tracker


Author	rhettinger
Recipients	congma, mark.dickinson, miss-islington, realead, rhettinger, serhiy.storchaka, tim.peters
Date	2021-06-14.04:40:25
Message-id	<1623645626.06.0.202953890235.issue43475@roundup.psfhosted.org>
In-reply-to

Content

> If one wants to have all NaNs in one equivalency class
> (e.g. if used as a key-value for example in pandas) it
> is almost impossible to do so in a consistent way 
> without taking a performance hit.

ISTM the performance of the equivalence-class case is far less important than the case we were trying to solve.  Given a choice, we should prefer helping normal unadorned instances rather than giving preference to a subclass that redefines the usual behaviors.

In CPython, it is a fact of life that overriding builtin behaviors with pure Python code always incurs a performance hit.  Also, in your example, the subclass isn't technically correct because it relies on non-guaranteed implementation details.  It likely isn't even the fastest approach.

The only guaranteed behaviors are that math.isnan(x) reliably detects a NaN and that x != x when x is a NaN.  Those are the only assured tools in the uphill battle to fight the weird intrinsic nature of NaNs.
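
As a minimal sketch of those two guaranteed checks (the names nan_a, nan_b, is_nan and checks are made up for illustration and do not come from the issue):

```python
import math

# Two separately created NaN objects (illustrative names).
nan_a = float("nan")
nan_b = float("nan")


def is_nan(x):
    """Hypothetical helper: the self-inequality test, which for floats
    agrees with math.isnan."""
    return x != x


checks = {
    "isnan": math.isnan(nan_a),          # math.isnan detects a NaN
    "self_unequal": nan_a != nan_a,      # a NaN is unequal to itself
    "two_nans_unequal": nan_a != nan_b,  # and unequal to any other NaN
    "ordinary_float": is_nan(1.5),       # ordinary floats equal themselves
}
```

Neither check depends on how a NaN hashes or on object identity, which is why they are safe to build on.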

So one possible solution is to replace all the NaNs with a canonical placeholder value that doesn't have undesired properties:

    from math import isnan
    {None if isnan(x) else x for x in arr}

That relies on guaranteed behaviors and is reasonably fast.  IMO that beats trying to reprogram float('NaN') to behave the opposite of how it was designed.
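
For illustration only, here is that placeholder idea applied to a small made-up list of floats; arr, unique and counts are assumed names, and using the placeholder as a dict key is an extension of the set comprehension above rather than something proposed in the issue:

```python
from math import isnan

# Sample data containing two distinct NaN objects (made up for this sketch).
arr = [1.0, float("nan"), 2.5, float("nan"), 1.0]

# Every NaN collapses to the single placeholder None.
unique = {None if isnan(x) else x for x in arr}

# The same substitution works for dict keys, e.g. when counting values.
counts = {}
for x in arr:
    key = None if isnan(x) else x
    counts[key] = counts.get(key, 0) + 1
```

Here unique has three members (1.0, 2.5 and None), and counts records two NaNs under the None key.  This assumes None is not already a meaningful value in the data; otherwise a dedicated sentinel object would be the safer placeholder.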

History
Date	User	Action	Args
2021-06-14 04:40:26	rhettinger	set	recipients: + rhettinger, tim.peters, mark.dickinson, serhiy.storchaka, miss-islington, realead, congma
2021-06-14 04:40:26	rhettinger	set	messageid: <1623645626.06.0.202953890235.issue43475@roundup.psfhosted.org>
2021-06-14 04:40:26	rhettinger	link	issue43475 messages
2021-06-14 04:40:25	rhettinger	create