msg244288 - (view) |
Author: Thomas Arildsen (thomas-arildsen) |
Date: 2015-05-28 08:32 |
When I run the attached example in Python 2.7.9, it succeeds. In Python 3.4, it fails as shown below. I use json 2.0.9 and numpy 1.9.2 with both versions of Python. Python and all packages are provided by Anaconda 2.2.0.
The error seems to be caused by the serialised object containing a numpy.int64 type. It might fail with other 64-bit numpy types as well (untested).
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/tha/tmp/debug_json/debug_json.py in <module>()
4 test = {'value': np.int64(1)}
5
----> 6 obj=json.dumps(test)
/home/tha/.conda/envs/python3/lib/python3.4/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
228 cls is None and indent is None and separators is None and
229 default is None and not sort_keys and not kw):
--> 230 return _default_encoder.encode(obj)
231 if cls is None:
232 cls = JSONEncoder
/home/tha/.conda/envs/python3/lib/python3.4/json/encoder.py in encode(self, o)
190 # exceptions aren't as detailed. The list call should be roughly
191 # equivalent to the PySequence_Fast that ''.join() would do.
--> 192 chunks = self.iterencode(o, _one_shot=True)
193 if not isinstance(chunks, (list, tuple)):
194 chunks = list(chunks)
/home/tha/.conda/envs/python3/lib/python3.4/json/encoder.py in iterencode(self, o, _one_shot)
248 self.key_separator, self.item_separator, self.sort_keys,
249 self.skipkeys, _one_shot)
--> 250 return _iterencode(o, 0)
251
252 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
/home/tha/.conda/envs/python3/lib/python3.4/json/encoder.py in default(self, o)
171
172 """
--> 173 raise TypeError(repr(o) + " is not JSON serializable")
174
175 def encode(self, o):
TypeError: 1 is not JSON serializable
|
msg244321 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2015-05-28 16:54 |
All python3 ints are what used to be long ints in python2, so the code that recognized short ints no longer exists. Do the numpy types implement __index__? It looks like json doesn't check for __index__, and I wonder if it should.
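To make the `__index__` question concrete, here is a minimal sketch using only the standard library. `FixedInt` is a hypothetical stand-in for a fixed-width integer type that does not inherit from `int` (it is not a NumPy class), and `index_default` is an illustrative helper, not part of the json module. The sketch shows that json ignores `__index__` by default, and what an `__index__`-based fallback might look like from the user side.

```python
import json
import operator


class FixedInt:
    """Hypothetical stand-in for a fixed-width integer type.

    It does not inherit from int, but it supports __index__.
    """

    def __init__(self, value):
        self._value = int(value)

    def __index__(self):
        return self._value


def index_default(o):
    """Fallback for json.dumps: accept any object that supports __index__."""
    try:
        return operator.index(o)
    except TypeError:
        raise TypeError(
            "Object of type %s is not JSON serializable" % type(o).__name__
        ) from None


payload = {"value": FixedInt(1)}

# Without a fallback, json does not consult __index__ and raises TypeError.
try:
    json.dumps(payload)
    plain_result = "serialized"
except TypeError:
    plain_result = "TypeError"

# With the fallback, the object is written as a plain JSON integer.
with_index = json.dumps(payload, default=index_default)
print(plain_result, with_index)
```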
|
msg244352 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2015-05-28 23:10 |
> It looks like json doesn't check for __index__, and I wonder if it should.
I don't know. Simply, under 2.7, int64 inherits from int:
>>> np.int64.__mro__
(<type 'numpy.int64'>, <type 'numpy.signedinteger'>, <type 'numpy.integer'>, <type 'numpy.number'>, <type 'numpy.generic'>, <type 'int'>, <type 'object'>)
while it doesn't under 3.x:
>>> np.int64.__mro__
(<class 'numpy.int64'>, <class 'numpy.signedinteger'>, <class 'numpy.integer'>, <class 'numpy.number'>, <class 'numpy.generic'>, <class 'object'>)
|
msg244355 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2015-05-29 01:13 |
Ah, so this is a numpy bug?
|
msg244359 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2015-05-29 05:46 |
Yes, it looks like a bug (or rather a missing feature) in numpy, but numpy has no chance of fixing it without help from Python. The json module is not flexible enough.
For now this issue can only be worked around on the user side, with a special default handler.
>>> import numpy, json
>>> def default(o):
...     if isinstance(o, numpy.integer): return int(o)
...     raise TypeError
...
>>> json.dumps({'value': numpy.int64(42)}, default=default)
'{"value": 42}'
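The same workaround can be packaged as a reusable encoder class. The sketch below is one possible variant, not an official recipe: `NumberEncoder` and `Int64Like` are hypothetical names, and `Int64Like` is a stand-in registered with `numbers.Integral` so that the example runs without NumPy. It assumes the types of interest are registered with the `numbers` abstract base classes, which should be checked for the NumPy version in use.

```python
import json
import numbers
from fractions import Fraction


class NumberEncoder(json.JSONEncoder):
    """Encode objects registered as numbers.Integral or numbers.Real."""

    def default(self, o):
        if isinstance(o, numbers.Integral):
            return int(o)
        if isinstance(o, numbers.Real):
            return float(o)  # may lose precision for wide types
        # Let the base class raise the usual TypeError.
        return super().default(o)


class Int64Like:
    """Hypothetical stand-in for an integer type that is not an int subclass."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value


# Virtual registration: isinstance(x, numbers.Integral) becomes True.
numbers.Integral.register(Int64Like)

encoded = json.dumps({"n": Int64Like(42), "x": Fraction(1, 2)}, cls=NumberEncoder)
print(encoded)
```

Passing `cls=NumberEncoder` has the same effect as passing a `default` function, but the class is easier to share across call sites.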
|
msg244363 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2015-05-29 09:40 |
I wouldn't call it a bug in Numpy (a quirk perhaps?). Numpy ints are fixed-width ints, so some of them can inherit from Python int in 2.x, but not in 3.x.
But not all of them do, since the bitwidth can be different:
>>> issubclass(np.int64, int)
True
>>> issubclass(np.int32, int)
False
>>> issubclass(np.int16, int)
False
|
msg244370 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2015-05-29 11:59 |
So in python2, some were json serializable and some weren't? Yes, I'd call that a quirk :)
So back to the question of whether it makes sense for json to look for __index__ to decide if something can be serialized as an int. If not, I don't think there is anything we can do.
|
msg244371 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2015-05-29 12:01 |
I don't know about __index__, but there's the ages-old discussion of allowing some kind of __json__ hook on types. Of course, none of those solutions would allow round-tripping.
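As a rough illustration of the hook idea, the sketch below emulates a `__json__` protocol with a `default` function. The `__json__` method name, the `Celsius` class, and `json_hook_default` are all hypothetical; the json module does not look for such a method. The last lines show the round-tripping limitation: decoding returns a plain float, not the original type.

```python
import json


class Celsius:
    """Hypothetical type that opts in to serialization via a __json__ method."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __json__(self):
        # Return something json already knows how to encode.
        return self.degrees


def json_hook_default(o):
    """Fallback for json.dumps that calls a __json__ method if the type has one."""
    hook = getattr(type(o), "__json__", None)
    if hook is None:
        raise TypeError(
            "Object of type %s is not JSON serializable" % type(o).__name__
        )
    return hook(o)


encoded = json.dumps({"temp": Celsius(21.5)}, default=json_hook_default)
decoded = json.loads(encoded)

# The type information is gone after decoding: no round trip.
print(encoded, type(decoded["temp"]).__name__)
```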
|
msg254734 - (view) |
Author: Eli_B (Eli_B) * |
Date: 2015-11-16 14:29 |
On 64-bit Windows, my 64-bit Python 2.7.9 and my 32-bit 2.7.10 Python both reproduce the failure with a similar traceback.
|
msg257451 - (view) |
Author: Thomas Arildsen (thomas-arildsen) |
Date: 2016-01-04 10:14 |
Is there any possibility that json could implement special handling of NumPy types? This "lack of a feature" seems to have propagated back into Python 2.7 now in some recent update...
|
msg257455 - (view) |
Author: Nathaniel Smith (njs) * |
Date: 2016-01-04 11:20 |
Nothing's changed in python 2.7. Basically: (a) no numpy ints have ever serialized in py3. (b) in py2, either np.int32 *xor* np.int64 will serialize correctly, and which one it is depends on sizeof(long) in the C compiler used to build Python. (This follows from the fact that in py2, the Python 'int' type is always the same size as C 'long'.)
So the end result is: on OS X and Linux, 32-bit Pythons can JSON-serialize np.int32 objects, and 64-bit Pythons can JSON-serialize np.int64 objects, because 64-bit OS X and Linux are LP64 (C 'long' is 64 bits). On Windows, both 32- and 64-bit Pythons can JSON-serialize np.int32 objects, and can't serialize np.int64 objects, because 64-bit Windows is LLP64 (C 'long' stays 32 bits).
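To check which case applies to a given interpreter, the native sizes of C `long` and of a pointer can be read with the standard `struct` module. This is only a sketch: the variable names are illustrative, and the mapping to data-model names covers just the common cases mentioned in this thread.

```python
import struct

# Native sizes, in bits, as seen by the C compiler that built this Python.
long_bits = struct.calcsize("l") * 8      # C long
pointer_bits = struct.calcsize("P") * 8   # void *

if pointer_bits == 64 and long_bits == 64:
    data_model = "LP64"    # typical for 64-bit Linux and OS X
elif pointer_bits == 64 and long_bits == 32:
    data_model = "LLP64"   # typical for 64-bit Windows
elif pointer_bits == 32 and long_bits == 32:
    data_model = "ILP32"   # typical for 32-bit builds
else:
    data_model = "other"

print("long: %d bits, pointer: %d bits, data model: %s"
      % (long_bits, pointer_bits, data_model))
```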
|
msg257459 - (view) |
Author: Thomas Arildsen (thomas-arildsen) |
Date: 2016-01-04 11:44 |
Thanks for the clarification.
|
msg350567 - (view) |
Author: Vicki Brown (vlbrown) |
Date: 2019-08-26 20:49 |
This is still broken. With pandas being popular, it's more likely someone might hit it. Can we fix this?
At the very least, the error message needs to be made much more specific.
I have created a dictionary containing pandas stats.
```
def summary_stats(s):
    """
    Calculate summary statistics for a series or list, s
    returns a dictionary
    """
    stats = {
        'count': 0,
        'max': 0,
        'min': 0,
        'mean': 0,
        'median': 0,
        'mode': 0,
        'std': 0,
        'z': (0, 0)
    }
    stats['count'] = s.count()
    stats['max'] = s.max()
    stats['min'] = s.min()
    stats['mean'] = round(s.mean(), 3)
    stats['median'] = s.median()
    stats['mode'] = s.mode()[0]
    stats['std'] = round(s.std(), 3)
    std3 = 3 * stats['std']
    low_z = round(stats['mean'] - std3, 3)
    high_z = round(stats['mean'] + std3, 3)
    stats['z'] = (low_z, high_z)
    return stats
```
Apparently, pandas (sometimes) returns numpy ints and numpy floats.
Here's a piece of the dictionary:
```
{'count': 597,
'max': 0.95,
'min': 0.01,
'mean': 0.585,
'median': 0.58,
'mode': 0.59,
'std': 0.122,
'z': (0.219, 0.951)}
```
It looks fine, but when I try to dump the dict to json
```
with open('Data/station_stats.json', 'w') as fp:
    json.dump(station_stats, fp)
```
I get this error
```
TypeError: Object of type int64 is not JSON serializable
```
**Much searching** led me to discover that I apparently have numpy ints which I have confirmed.
```
for key, value in station_stats['657']['Fluorescence'].items():
    print(key, value, type(value))
count 3183 <class 'numpy.int64'>
max 2.8 <class 'float'>
min 0.02 <class 'float'>
mean 0.323 <class 'float'>
median 0.28 <class 'float'>
mode 0.24 <class 'numpy.float64'>
std 0.194 <class 'float'>
z (-0.259, 0.905) <class 'tuple'>
```
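Given output like the above, one user-side option is to convert such values to built-in Python types before dumping. The sketch below relies on the `.item()` method that NumPy scalars provide. `to_builtin` is a hypothetical helper, and `FakeScalar` is a stand-in used so the example runs without NumPy. Dictionary keys are left untouched, and an object whose `.item()` fails (for example a multi-element array) would still raise.

```python
import json


def to_builtin(value):
    """Recursively replace objects exposing .item() with built-in Python values."""
    if isinstance(value, dict):
        return {key: to_builtin(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_builtin(item) for item in value]
    item = getattr(value, "item", None)
    if callable(item):
        return item()  # NumPy scalars return an int, float, bool, ...
    return value


class FakeScalar:
    """Hypothetical stand-in mimicking the .item() method of a NumPy scalar."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


stats = {"count": FakeScalar(3183), "max": 2.8, "z": (FakeScalar(-0.259), 0.905)}
encoded = json.dumps(to_builtin(stats))
print(encoded)
```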
#### Problem description
pandas statistics sometimes produce numpy numerics.
numpy ints are not supported by json.dump
#### Expected Output
I expect ints, floats, strings, ... to be JSON serializable.
<details>
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Darwin
OS-release : 15.6.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.0
numpy : 1.16.4
pytz : 2019.1
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : 0.29.12
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.3.4
html5lib : 1.0.1
pymysql : 0.9.3
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.7.0
pandas_datareader: None
bs4 : 4.7.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.3.4
matplotlib : 3.1.0
numexpr : 2.6.9
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.0
sqlalchemy : 1.3.5
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
</details>
|
msg350581 - (view) |
Author: Vicki Brown (vlbrown) |
Date: 2019-08-26 22:51 |
Note also that pandas DataFrame.to_json() method has no issue with int64. Perhaps you could borrow their code.
|
msg355133 - (view) |
Author: Batuhan Taskaya (BTaskaya) * |
Date: 2019-10-22 15:07 |
What is the next step for this 4-year-old issue? I think I can prepare a patch for using __index__ (as suggested by @r.david.murray).
|
msg355143 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2019-10-22 17:46 |
We could use __index__ for serializing numpy.int64. But what to do with numpy.float32 and numpy.float128? It is a part of a much larger problem (which includes other numbers, collections, encoded strings, named tuples and data classes, etc). I am working on it, but there is a lot of work.
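A small standard-library sketch of why `__index__` alone does not settle the question for non-integer numbers. `decimal.Decimal` is used here purely as a stand-in for a value wider than a C double. It is not a NumPy type, and nothing below reflects how json or NumPy actually handle `float128`.

```python
import json
import operator
from decimal import Decimal

# Stand-in for a number carrying more precision than a C double can hold.
wide = Decimal("0.1000000000000000000001")

# An __index__-based check does not apply: the type is not an integer.
try:
    operator.index(wide)
    index_result = "int"
except TypeError:
    index_result = "TypeError"

# Converting through float silently drops the extra digits.
as_float = json.dumps(float(wide))

# Encoding as a string keeps the digits but changes the JSON type.
as_string = json.dumps(str(wide))

print(index_result, as_float, as_string)
```

Each choice (reject, convert with loss, or change the JSON type) has a cost, which is one way to read the remark that integers are only part of a larger problem.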
|
msg413869 - (view) |
Author: Nikolay Markov (mxposed) |
Date: 2022-02-24 00:38 |
Just ran into this. Are there any updates? Is there any task to contribute to regarding this?
|
Date | User | Action | Args
2022-04-11 14:58:17 | admin | set | github: 68501
2022-02-24 00:38:48 | mxposed | set | nosy: + mxposed; messages: + msg413869
2020-01-08 12:31:36 | xtreak | link | issue39258 superseder
2019-10-22 17:46:42 | serhiy.storchaka | set | messages: + msg355143
2019-10-22 15:07:47 | BTaskaya | set | nosy: + BTaskaya; messages: + msg355133
2019-08-26 22:51:01 | vlbrown | set | messages: + msg350581
2019-08-26 20:49:43 | vlbrown | set | versions: + Python 3.7; nosy: + vlbrown; messages: + msg350567; type: enhancement -> behavior
2016-01-04 11:44:21 | thomas-arildsen | set | messages: + msg257459; versions: - Python 3.6
2016-01-04 11:20:23 | njs | set | messages: + msg257455
2016-01-04 10:14:57 | thomas-arildsen | set | messages: + msg257451
2015-11-16 14:29:45 | Eli_B | set | messages: + msg254734
2015-11-16 12:26:26 | Eli_B | set | nosy: + Eli_B
2015-11-16 12:02:31 | Amit Feller | set | nosy: + Amit Feller
2015-05-29 12:01:56 | pitrou | set | messages: + msg244371
2015-05-29 11:59:03 | r.david.murray | set | messages: + msg244370
2015-05-29 09:41:43 | pitrou | set | nosy: + njs
2015-05-29 09:40:57 | pitrou | set | messages: + msg244363
2015-05-29 05:46:35 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg244359
2015-05-29 01:13:57 | r.david.murray | set | messages: + msg244355
2015-05-28 23:10:57 | pitrou | set | versions: + Python 3.6, - Python 3.4; nosy: + pitrou; messages: + msg244352; type: crash -> enhancement
2015-05-28 16:54:40 | r.david.murray | set | nosy: + r.david.murray; messages: + msg244321
2015-05-28 08:32:31 | thomas-arildsen | create |