classification
Title: json fails to serialise numpy.int64
Type: behavior Stage:
Components: Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Amit Feller, BTaskaya, Eli_B, njs, pitrou, r.david.murray, serhiy.storchaka, thomas-arildsen, vlbrown
Priority: normal Keywords:

Created on 2015-05-28 08:32 by thomas-arildsen, last changed 2019-10-22 17:46 by serhiy.storchaka.

Files
File name Uploaded Description
debug_json.py thomas-arildsen, 2015-05-28 08:32 Minimal example to demonstrate the problem
Messages (16)
msg244288 - (view) Author: Thomas Arildsen (thomas-arildsen) Date: 2015-05-28 08:32
When I run the attached example in Python 2.7.9, it succeeds. In Python 3.4, it fails as shown below. I use json 2.0.9 and numpy 1.9.2 with both versions of Python. Python and all packages provided by Anaconda 2.2.0.
The error seems to be caused by the serialised object containing a numpy.int64 type. It might fail with other 64-bit numpy types as well (untested).

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/home/tha/tmp/debug_json/debug_json.py in <module>()
      4 test = {'value': np.int64(1)}
      5 
----> 6 obj=json.dumps(test)

/home/tha/.conda/envs/python3/lib/python3.4/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    228         cls is None and indent is None and separators is None and
    229         default is None and not sort_keys and not kw):
--> 230         return _default_encoder.encode(obj)
    231     if cls is None:
    232         cls = JSONEncoder

/home/tha/.conda/envs/python3/lib/python3.4/json/encoder.py in encode(self, o)
    190         # exceptions aren't as detailed.  The list call should be roughly
    191         # equivalent to the PySequence_Fast that ''.join() would do.
--> 192         chunks = self.iterencode(o, _one_shot=True)
    193         if not isinstance(chunks, (list, tuple)):
    194             chunks = list(chunks)

/home/tha/.conda/envs/python3/lib/python3.4/json/encoder.py in iterencode(self, o, _one_shot)
    248                 self.key_separator, self.item_separator, self.sort_keys,
    249                 self.skipkeys, _one_shot)
--> 250         return _iterencode(o, 0)
    251 
    252 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/home/tha/.conda/envs/python3/lib/python3.4/json/encoder.py in default(self, o)
    171 
    172         """
--> 173         raise TypeError(repr(o) + " is not JSON serializable")
    174 
    175     def encode(self, o):

TypeError: 1 is not JSON serializable
msg244321 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-28 16:54
All python3 ints are what used to be long ints in python2, so the code that recognized short ints no longer exists.  Do the numpy types implement __index__?  It looks like json doesn't check for __index__, and I wonder if it should.
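To make the question concrete, here is a minimal sketch that uses only the standard library. `FixedInt` is a hypothetical stand-in for a type such as numpy.int64 on Python 3: it implements `__index__` but does not inherit from `int`. It is an illustration of the behaviour being discussed, not numpy's actual implementation.

```python
import json
import operator


class FixedInt:
    """Hypothetical stand-in: integer-like, but not a subclass of int."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


value = FixedInt(1)

# operator.index() honours __index__ and returns a real int.
as_int = operator.index(value)

# json only recognises int (and its subclasses), so the object is rejected.
try:
    json.dumps({'value': value})
    rejected = False
except TypeError:
    rejected = True

print(as_int, rejected)
```

If json consulted `__index__` before giving up, objects like this could be written as plain JSON integers.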
msg244352 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-28 23:10
> It looks like json doesn't check for __index__, and I wonder if it should.

I don't know. Simply, under 2.7, int64 inherits from int:

>>> np.int64.__mro__
(<type 'numpy.int64'>, <type 'numpy.signedinteger'>, <type 'numpy.integer'>, <type 'numpy.number'>, <type 'numpy.generic'>, <type 'int'>, <type 'object'>)

while it doesn't under 3.x:

>>> np.int64.__mro__ 
(<class 'numpy.int64'>, <class 'numpy.signedinteger'>, <class 'numpy.integer'>, <class 'numpy.number'>, <class 'numpy.generic'>, <class 'object'>)
msg244355 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-29 01:13
Ah, so this is a numpy bug?
msg244359 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-29 05:46
Yes, it looks like a bug (or rather a missing feature) in numpy, but numpy has no chance of fixing it without help from Python. The json module is not flexible enough.

For now, this issue can only be worked around on the user side, with a special default handler.

>>> import numpy, json
>>> def default(o):
...     if isinstance(o, numpy.integer): return int(o)
...     raise TypeError
... 
>>> json.dumps({'value': numpy.int64(42)}, default=default)
'{"value": 42}'
msg244363 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-29 09:40
I wouldn't call it a bug in Numpy (a quirk perhaps?). Numpy ints are fixed-width ints, so some of them can inherit from Python int in 2.x, but not in 3.x.
But not all of them do, since the bitwidth can be different:

>>> issubclass(np.int64, int)
True
>>> issubclass(np.int32, int)
False
>>> issubclass(np.int16, int)
False
msg244370 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-05-29 11:59
So in python2, some were json serializable and some weren't?  Yes, I'd call that a quirk :)

So back to the question of whether it makes sense for json to look for __index__ to decide if something can be serialized as an int.  If not, I don't think there is anything we can do.
msg244371 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-05-29 12:01
I don't know about __index__, but there's the ages-old discussion of allowing some kind of __json__ hook on types. Of course, none of those solutions would allow round-tripping.
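For illustration only, a `__json__` hook can be approximated today through the `default` parameter. In the sketch below, `__json__`, `json_hook_default` and `Celsius` are all hypothetical names; the json module does not look for any such method itself.

```python
import json


def json_hook_default(o):
    """Call a hypothetical __json__ method if the object's type defines one."""
    hook = getattr(type(o), '__json__', None)
    if hook is None:
        raise TypeError(
            f'Object of type {type(o).__name__} is not JSON serializable')
    return hook(o)


class Celsius:
    """Toy type used only for this illustration."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __json__(self):
        return self.degrees


encoded = json.dumps({'temp': Celsius(21)}, default=json_hook_default)
decoded = json.loads(encoded)

# The value comes back as a plain int, not a Celsius: no round-tripping.
print(encoded, type(decoded['temp']).__name__)
```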
msg254734 - (view) Author: Eli_B (Eli_B) * Date: 2015-11-16 14:29
On 64-bit Windows, my 64-bit Python 2.7.9 and my 32-bit 2.7.10 Python both reproduce the failure with a similar traceback.
msg257451 - (view) Author: Thomas Arildsen (thomas-arildsen) Date: 2016-01-04 10:14
Is there any possibility that json could implement special handling of NumPy types? This "lack of a feature" seems to have propagated back into Python 2.7 now in some recent update...
msg257455 - (view) Author: Nathaniel Smith (njs) * (Python committer) Date: 2016-01-04 11:20
Nothing's changed in python 2.7. Basically: (a) no numpy ints have ever serialized in py3. (b) in py2, either np.int32 *xor* np.int64 will serialize correctly, and which one it is depends on sizeof(long) in the C compiler used to build Python. (This follows from the fact that in py2, the Python 'int' type is always the same size as C 'long'.)

So the end result is: on OS X and Linux, 32-bit Pythons can JSON-serialize np.int32 objects, and 64-bit Pythons can JSON-serialize np.int64 objects, because 64-bit OS X and Linux are LP64. On Windows, both 32- and 64-bit Pythons can JSON-serialize np.int32 objects, and can't serialize np.int64 objects, because 64-bit Windows is LLP64.
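A quick way to see which case applies to a given interpreter is to compare the width of C `long` with the pointer width. This is a small sketch using `ctypes` and `struct` from the standard library; it reports the platform's sizes and makes no claim about numpy itself.

```python
import ctypes
import struct

# Width of C long for the compiler that built this Python.
long_bits = ctypes.sizeof(ctypes.c_long) * 8

# Width of a pointer, i.e. whether this is a 32- or 64-bit build.
pointer_bits = struct.calcsize('P') * 8

print(f'C long: {long_bits} bits, pointer: {pointer_bits} bits')
```

On a 64-bit build, a 64-bit `long` corresponds to the OS X and Linux case described above, and a 32-bit `long` to the Windows case.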
msg257459 - (view) Author: Thomas Arildsen (thomas-arildsen) Date: 2016-01-04 11:44
Thanks for the clarification.
msg350567 - (view) Author: Vicki Brown (vlbrown) Date: 2019-08-26 20:49
This is still broken. With pandas being popular, it's more likely someone might hit it. Can we fix this?

At the very least, the error message needs to be made much more specific.

I have created a dictionary containing pandas stats.
```
def summary_stats(s):
    """ 
    Calculate summary statistics for a series or list, s 
    returns a dictionary
    """
    
    stats = {
      'count': 0,
      'max': 0,
      'min': 0,
      'mean': 0,
      'median': 0,
      'mode': 0,
      'std': 0,
      'z': (0,0)
    }
    
    stats['count'] = s.count()
    stats['max'] = s.max()
    stats['min'] = s.min()
    stats['mean'] = round(s.mean(),3)
    stats['median'] = s.median()
    stats['mode'] = s.mode()[0]
    stats['std'] = round(s.std(),3)

    std3 = 3* stats['std']
    low_z = round(stats['mean'] - (std3),3)
    high_z = round(stats['mean'] + (std3),3)
    stats['z'] = (low_z, high_z)
        
    return stats
```

Apparently, pandas (sometimes) returns numpy ints and numpy floats. 

Here's a piece of the dictionary:
```
 {'count': 597,
   'max': 0.95,
   'min': 0.01,
   'mean': 0.585,
   'median': 0.58,
   'mode': 0.59,
   'std': 0.122,
   'z': (0.219, 0.951)}
```

It looks fine, but when I try to dump the dict to json
```
with open('Data/station_stats.json', 'w') as fp:
    json.dump(station_stats, fp)
```

I get this error
```
TypeError: Object of type int64 is not JSON serializable
```

**Much searching** led me to discover that I apparently have numpy ints, which I have since confirmed.

```
for key, value in station_stats['657']['Fluorescence'].items():
    print(key, value, type(value))

count 3183 <class 'numpy.int64'>
max 2.8 <class 'float'>
min 0.02 <class 'float'>
mean 0.323 <class 'float'>
median 0.28 <class 'float'>
mode 0.24 <class 'numpy.float64'>
std 0.194 <class 'float'>
z (-0.259, 0.905) <class 'tuple'>
```
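Since the error message names only the type and not where the value sits, a small helper can locate offending values in a nested structure. This is a sketch under stated assumptions: `find_unserializable` is a hypothetical helper, and `FakeInt64` is a stand-in class so that the example runs without numpy.

```python
import json


def find_unserializable(obj, path='$'):
    """Yield (path, type name) for every leaf value json cannot encode."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_unserializable(value, f'{path}[{key!r}]')
    elif isinstance(obj, (list, tuple)):
        for i, value in enumerate(obj):
            yield from find_unserializable(value, f'{path}[{i}]')
    else:
        try:
            json.dumps(obj)
        except TypeError:
            yield path, type(obj).__name__


class FakeInt64:
    """Stand-in for numpy.int64 so the sketch runs without numpy."""


stats = {'count': FakeInt64(), 'max': 2.8, 'z': (-0.259, 0.905)}
problems = list(find_unserializable(stats))
print(problems)
```

Run against a dictionary like the one printed by the loop above, this would point at the `count` entry instead of leaving it to be found by searching.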

#### Problem description

pandas statistics sometimes produce numpy numerics.

numpy ints are not supported by json.dump

#### Expected Output

I expect ints, floats, strings, ... to be JSON serializable.


<details>

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.3.final.0
python-bits      : 64
OS               : Darwin
OS-release       : 15.6.0
machine          : x86_64
processor        : i386
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.0
numpy            : 1.16.4
pytz             : 2019.1
dateutil         : 2.8.0
pip              : 19.1.1
setuptools       : 41.0.1
Cython           : 0.29.12
pytest           : 5.0.1
hypothesis       : None
sphinx           : 2.1.2
blosc            : None
feather          : None
xlsxwriter       : 1.1.8
lxml.etree       : 4.3.4
html5lib         : 1.0.1
pymysql          : 0.9.3
psycopg2         : None
jinja2           : 2.10.1
IPython          : 7.7.0
pandas_datareader: None
bs4              : 4.7.1
bottleneck       : 1.2.1
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.3.4
matplotlib       : 3.1.0
numexpr          : 2.6.9
odfpy            : None
openpyxl         : 2.6.2
pandas_gbq       : None
pyarrow          : None
pytables         : None
s3fs             : None
scipy            : 1.3.0
sqlalchemy       : 1.3.5
tables           : 3.5.2
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.3.0
xlsxwriter       : 1.1.8
</details>
msg350581 - (view) Author: Vicki Brown (vlbrown) Date: 2019-08-26 22:51
Note also that pandas DataFrame.to_json() method has no issue with int64. Perhaps you could borrow their code.
msg355133 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2019-10-22 15:07
What is the next step for this 4-year-old issue? I think I can prepare a patch for using __index__ (as suggested by @r.david.murray).
msg355143 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-10-22 17:46
We could use __index__ for serializing numpy.int64. But what to do with numpy.float32 and numpy.float128? It is a part of a much larger problem (which includes other numbers, collections, encoded strings, named tuples and data classes, etc). I am working on it, but there is a lot of work.
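One reason the float case is less clear-cut than the integer one can be shown without numpy. The sketch below rounds 0.1 to IEEE 754 single precision with `struct` and widens it back to a Python float, which is assumed here to be comparable to what converting a 32-bit float scalar with `float()` would give.

```python
import json
import struct

# Round 0.1 to single precision, then widen it back to a Python float
# (a C double). The widened value is no longer equal to 0.1.
single = struct.unpack('f', struct.pack('f', 0.1))[0]

encoded = json.dumps({'exact': 0.1, 'via_float32': single})
print(encoded)
```

The serialised value carries the extra digits of the widened double, so a naive `float()` conversion changes what the user sees. Types wider than a double would lose precision in the other direction.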
History
Date User Action Args
2020-01-08 12:31:36 xtreak link issue39258 superseder
2019-10-22 17:46:42 serhiy.storchaka set messages: + msg355143
2019-10-22 15:07:47 BTaskaya set nosy: + BTaskaya; messages: + msg355133
2019-08-26 22:51:01 vlbrown set messages: + msg350581
2019-08-26 20:49:43 vlbrown set versions: + Python 3.7; nosy: + vlbrown; messages: + msg350567; type: enhancement -> behavior
2016-01-04 11:44:21 thomas-arildsen set messages: + msg257459; versions: - Python 3.6
2016-01-04 11:20:23 njs set messages: + msg257455
2016-01-04 10:14:57 thomas-arildsen set messages: + msg257451
2015-11-16 14:29:45 Eli_B set messages: + msg254734
2015-11-16 12:26:26 Eli_B set nosy: + Eli_B
2015-11-16 12:02:31 Amit Feller set nosy: + Amit Feller
2015-05-29 12:01:56 pitrou set messages: + msg244371
2015-05-29 11:59:03 r.david.murray set messages: + msg244370
2015-05-29 09:41:43 pitrou set nosy: + njs
2015-05-29 09:40:57 pitrou set messages: + msg244363
2015-05-29 05:46:35 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg244359
2015-05-29 01:13:57 r.david.murray set messages: + msg244355
2015-05-28 23:10:57 pitrou set versions: + Python 3.6, - Python 3.4; nosy: + pitrou; messages: + msg244352; type: crash -> enhancement
2015-05-28 16:54:40 r.david.murray set nosy: + r.david.murray; messages: + msg244321
2015-05-28 08:32:31 thomas-arildsen create