classification
Title: collections Counter handles nan strangely
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Adam.Davison, mark.dickinson, rhettinger, terry.reedy
Priority: normal Keywords:

Created on 2013-10-04 10:07 by Adam.Davison, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (5)
msg198938 - Author: Adam Davison (Adam.Davison) Date: 2013-10-04 10:07
If you pass an array containing nan to collections.Counter, rather than counting the number of nans it puts a separate "nan: 1" entry into the dictionary for each one. I appreciate that using this on an array of floats is a bit of an unusual case, but I don't think this is the expected behaviour based on the documentation.

To reproduce, try e.g.:
import collections
a = [1, 1, 1, 2, 'nan', 'nan', 'nan']
collections.Counter(map(float, a))

Based on the documentation I expected to see:
{1.0: 3, 2.0: 1, nan: 3}

But it actually returns:
{1.0: 3, 2.0: 1, nan: 1, nan: 1, nan: 1}

Presumably this relates to the fact that nan != nan. I'm not 100% sure if this is a bug or maybe just something that should be mentioned in the documentation... Certainly it's not what I wanted it to do :)

Thanks,

Adam
msg198939 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-10-04 10:51
> Presumably this relates to the fact that nan != nan.

Yep.  What you're seeing is pretty much expected behaviour, and it matches how NaNs behave with respect to containment in other Python contexts:

>>> x = float('nan')
>>> y = float('nan')
>>> s = {x}
>>> x in s
True
>>> y in s
False

There's a much-discussed compromise between object model sanity and respect for IEEE 754 here.  You can find the discussions on the mailing lists, but the summary is that this isn't going to change in a hurry.
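A minimal sketch of the containment rule described above, assuming CPython's usual behaviour of checking object identity before equality when looking up a key. The variable names are illustrative only:

```python
import collections

x = float("nan")
y = float("nan")  # a second, distinct NaN object

# NaN never compares equal, not even to itself.
self_equal = (x == x)

# Containment tests identity before equality, so the same object is found...
same_object_found = x in [x]
# ...while a different NaN object is not.
other_object_found = y in [x]

# Counter follows the same rule: x is counted twice, y gets its own key.
counts = collections.Counter([x, x, y])
```

Here `counts` ends up with two NaN keys: one for `x` with a count of 2, and one for `y` with a count of 1.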

One way you can work around this is to make sure you only have a single NaN object (possibly referenced multiple times) in your list.  Then you get the behaviour that you're looking for:

>>> nan = float('nan')
>>> a = [1, 1, 2, nan, nan, nan]
>>> collections.Counter(a)
Counter({nan: 3, 1: 2, 2: 1})
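When the NaNs are produced elsewhere, the same workaround can be applied after the fact by mapping every NaN onto one shared object before counting. This is only a sketch: `SHARED_NAN` and `count_with_shared_nan` are hypothetical names, not part of `collections`, and the check only recognises `float` instances (including subclasses).

```python
import collections
import math

# Hypothetical module-level NaN that every NaN value is replaced with.
SHARED_NAN = float("nan")


def count_with_shared_nan(values):
    """Count values, treating all float NaNs as the single SHARED_NAN object."""
    def canonical(value):
        if isinstance(value, float) and math.isnan(value):
            return SHARED_NAN
        return value

    return collections.Counter(canonical(v) for v in values)


data = [float(s) for s in ["1", "1", "1", "2", "nan", "nan", "nan"]]
counts = count_with_shared_nan(data)
```

Because every NaN key is now the same object, `counts[SHARED_NAN]` finds it by identity.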

By the way, when you say 'array of floats', do you mean a NumPy ndarray, a standard library array.array object, or a plain Python list?  The example you show is a list containing a mixture of ints and strings.

I suggest closing this as 'wont fix'.  Raymond?
msg198940 - Author: Adam Davison (Adam.Davison) Date: 2013-10-04 10:58
Thanks for the quick response. I'm really using a pandas Series, which is effectively a numpy array behind the scenes as far as I understand; the example I pasted was just to illustrate the behaviour. So the nans are being produced elsewhere, and I don't really have control over that step.

It seems like perhaps collections.Counter should handle nans as a special case. But I can appreciate the counter-arguments too.

Thanks,

Adam
msg198947 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-10-04 13:19
> perhaps collections.Counter should handle nans as a special case

I don't think that would be a good idea:  I'd rather that collections.Counter didn't special case NaNs in any way, but instead treated NaNs following the same (admittedly somewhat awkward) rules that all the other Python collections do---namely, for NaNs, containment effectively works by object identity.

> I'm really using a pandas Series

Okay, that makes sense.  It's a bit unfortunate that NumPy creates a new NaN object every time you read a NaN value out of an array, so that you get e.g.,

>>> import numpy as np
>>> my_list = [1.2, 2.3, np.nan, np.nan]
>>> my_list[2] is my_list[3]
True
>>> my_array = np.array(my_list)
>>> my_array[2] is my_array[3]
False

Or even:

>>> my_array[2] is my_array[2]
False

I guess you're stuck with using Pandas functionality like `dropna` and `isnull` to deal with missing and non-missing values separately.
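For illustration, the idea of handling missing and non-missing values separately can be sketched in plain Python, without pandas. The helper name `count_with_missing` is hypothetical, and the NaN check assumes the values are Python floats or float subclasses:

```python
import collections
import math


def count_with_missing(values):
    """Return (Counter of non-NaN values, number of NaN values)."""
    counts = collections.Counter()
    missing = 0
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            missing += 1          # tally NaNs on the side
        else:
            counts[value] += 1    # count everything else as usual
    return counts, missing


data = [float(s) for s in ["1", "1", "1", "2", "nan", "nan", "nan"]]
counts, missing = count_with_missing(data)
```

This keeps NaN out of the Counter keys entirely, so the identity rule never comes into play.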
msg198980 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2013-10-05 03:09
This is the sort of issue that makes me think that there should be a single nan object (perhaps an instance of a Nan(float) subclass with a special __eq__ method ;-). Pending that, I agree with closing as "won't fix".
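As a rough illustration of the subclass idea, here is one way such a type could look. It is purely hypothetical: nothing like this exists in the standard library, and the choices below (all instances equal to each other, one shared hash) are assumptions made for the sketch.

```python
import collections


class Nan(float):
    """Hypothetical NaN type whose instances all compare equal to each other."""

    def __new__(cls):
        return super().__new__(cls, "nan")

    def __eq__(self, other):
        return isinstance(other, Nan)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Defining __eq__ removes the inherited hash, so supply one that is
        # identical for every instance.
        return hash(Nan)


counts = collections.Counter([Nan(), Nan(), Nan(), 1.0])
```

With this type the three distinct `Nan()` objects collapse into one key. Ordinary `float('nan')` values are unaffected, which is part of why this stays a thought experiment.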
History
Date                 User            Action  Args
2022-04-11 14:57:51  admin           set     github: 63360
2013-10-05 04:26:23  rhettinger      set     status: open -> closed; resolution: wont fix
2013-10-05 03:09:13  terry.reedy     set     nosy: + terry.reedy; messages: + msg198980
2013-10-04 13:19:41  mark.dickinson  set     messages: + msg198947
2013-10-04 10:58:12  Adam.Davison    set     messages: + msg198940
2013-10-04 10:51:01  mark.dickinson  set     nosy: + rhettinger, mark.dickinson; messages: + msg198939
2013-10-04 10:07:22  Adam.Davison    create