classification
Title: json.dumps() should encode float number NaN to null
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.9
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: Haoyu SUN, eric.smith, mark.dickinson, rhettinger
Priority: normal
Keywords:

Created on 2020-05-15 13:42 by Haoyu SUN, last changed 2020-05-18 08:08 by mark.dickinson. This issue is now closed.

Messages (15)
msg368942 - Author: Haoyu SUN (Haoyu SUN) Date: 2020-05-15 13:42
Floats in Python have three special values: nan, inf, and -inf, which the json module encodes as "NaN", "Infinity", and "-Infinity". These representations are not valid JSON according to RFC 7159:
https://tools.ietf.org/html/rfc7159.html#page-6

These values are not correctly parsed by most JavaScript JSON decoders.

It would be better to encode "NaN" as "null", a valid JSON keyword that can represent "Not a Number".

Here is an example of how json.dumps() encodes NaN as NaN in JSON:
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> dct = {'a': None, 'b' : float('nan')}
>>> dct
{'a': None, 'b': nan}
>>> import json
>>> json.dumps(dct)
'{"a": null, "b": NaN}'
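The same default applies to all three special values, not only NaN. A minimal check, assuming the standard library json module called with its default arguments (the variable names are only for illustration):

```python
import json

# With the default allow_nan=True, the special floats are written as
# JavaScript-style literals rather than as valid JSON tokens.
special = [float("nan"), float("inf"), float("-inf")]
encoded = json.dumps(special)
print(encoded)  # [NaN, Infinity, -Infinity]
```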
msg368944 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-05-15 13:57
Since this is documented behavior (https://docs.python.org/3.8/library/json.html#infinite-and-nan-number-values), we can't change it by default without breaking code.

What JavaScript JSON encoders and decoders specifically have a problem with this behavior? The documentation says "This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders.", so if there are encoders and decoders that it doesn't work with, that would be good to know.
msg368948 - Author: Haoyu SUN (Haoyu SUN) Date: 2020-05-15 14:28
Thank you for the timely reply, Eric.

How about we add an optional argument to functions like json.dumps(), similar to the "ignore_nan" argument in the simplejson package, which defaults to False? Users could then choose whether NaN is encoded as NaN or as null, while the default behavior stays the same.

In Chromium-based browsers, JSON.parse cannot parse this output. Here is an example:
> JSON.parse('{"a": null, "b": NaN}')
uncaught SyntaxError: Unexpected token N in JSON at position 17
    at JSON.parse (<anonymous>)
    at <anonymous>:1:6
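For comparison, Python's own decoder accepts these constants by default. The sketch below shows one way to make decoding as strict as JSON.parse, using the parse_constant hook of json.loads. The helper names reject_constant and strict_loads are hypothetical and not part of the json module.

```python
import json

def reject_constant(name):
    # json.loads calls this hook with 'NaN', 'Infinity' or '-Infinity'.
    raise ValueError(f"non-standard JSON constant: {name}")

def strict_loads(text):
    # Hypothetical helper: like json.loads, but refuses the constants
    # that the JSON specification does not define.
    return json.loads(text, parse_constant=reject_constant)

# The default decoder accepts the non-standard literal.
lenient = json.loads('{"a": null, "b": NaN}')

# The strict variant rejects it, much like JSON.parse does.
try:
    strict_loads('{"a": null, "b": NaN}')
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```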
msg368950 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-05-15 14:55
I think that's reasonable, although I could see someone objecting ("just use simplejson instead").

I suggest discussing this on the python-ideas mailing list and seeing what people think over there. It might help to create a PR first, if it's not a lot of work.
msg368959 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-05-15 16:54
I don't think "null" in JSON is supposed to represent "Not a Number"; it's closer in meaning to Python's `None`. I definitely wouldn't want to see nans translated to "null" by default.

This also only seems to address a part of the issue: what's the proposed action for "Infinity" and "-Infinity"? We've written internal code to deal with float special values in JSON a few times (usually to work with databases that stick to the strict JSON definition), and that code has to find a way to deal with all three of the special values.
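As an illustration of what such application-level code might look like (this is not the internal code mentioned above), here is a sketch that handles all three special values by replacing them with string tags before encoding. The function name tag_nonfinite and the choice of tags are assumptions made for the example.

```python
import json
import math

def tag_nonfinite(obj):
    """Hypothetical helper: replace non-finite floats with string tags."""
    if isinstance(obj, float) and not math.isfinite(obj):
        if math.isnan(obj):
            return "NaN"
        return "Infinity" if obj > 0 else "-Infinity"
    if isinstance(obj, dict):
        return {key: tag_nonfinite(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [tag_nonfinite(item) for item in obj]
    return obj

data = {"x": float("nan"), "y": [1.0, float("inf"), float("-inf")]}
# allow_nan=False guards against any special value slipping through.
strict = json.dumps(tag_nonfinite(data), allow_nan=False)
print(strict)  # {"x": "NaN", "y": [1.0, "Infinity", "-Infinity"]}
```

One caveat with this convention: a genuine string "NaN" in the data becomes indistinguishable from a tagged float, so the consumer has to know which fields are numeric.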
msg369005 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-16 03:36
[Eric]
> this is documented behavior

[Mark]
> I definitely wouldn't want to see nans translated to 
> "null" by default.

I concur with both of these statements.

I would support adding an option (off by default) to convert NaNs to None.  While NaNs were originally intended to indicate an invalid value, they sometimes get used to denote missing values.  In those situations, it would be reasonable to convert NaN to null.
msg369006 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-16 03:43
One other issue just came to mind.  While we could convert NaN to null during encoding, there isn't a reasonable way to reverse the process (a null could either be a NaN or a legitimate None).  That would limit the utility of a new optional conversion.
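The loss of information can be seen in a small sketch. The mapping function nan_to_none is hypothetical and only stands in for the proposed option, which does not exist in the json module.

```python
import json
import math

def nan_to_none(value):
    # Hypothetical mapping for the proposed option: NaN becomes None.
    if isinstance(value, float) and math.isnan(value):
        return None
    return value

missing = {"b": None}
invalid = {"b": float("nan")}

encoded_missing = json.dumps({k: nan_to_none(v) for k, v in missing.items()})
encoded_invalid = json.dumps({k: nan_to_none(v) for k, v in invalid.items()})

# Both inputs collapse to the same text, so decoding cannot tell them apart.
print(encoded_missing == encoded_invalid)  # True
print(json.loads(encoded_invalid))         # {'b': None}
```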
msg369043 - Author: Haoyu SUN (Haoyu SUN) Date: 2020-05-16 13:13
About using null in JSON to represent the NaN value of a float, I prefer this logic: float is a numeric type that expects a number as its value, so "Not a Number" on a numeric type is equivalent to None (¬Number ∩ NumericValues = Empty). If we need to catch an error in a calculation or in input data, we can use the allow_nan option. Database connectors such as SQLAlchemy translate an empty field to float('nan') for a float field, so we can probably take this as a convention. I have no idea yet how to represent infinity.

Once encoded, there is no way to know whether a null originates from NaN or None without additional fields.

A direct conversion from Python data types to JSON may lose some information because of JSON's limited data types. When converting a BMP image to GIF, we have to eliminate some colors to fit the smaller palette, and we do not expect to restore the full information of the BMP image from its GIF counterpart.

I suggest we give the JSON module at least an option to generate standard-compliant JSON regardless of potential loss of information, instead of leaving each application to write its own subclass of JSONEncoder just for this corner case.
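For context, such a subclass might look like the sketch below. The class name NullNonFiniteEncoder and its helper are hypothetical, and mapping the infinities to null as well is an assumption, since no representation for infinity was settled on here. Because floats are encoded natively and never reach JSONEncoder.default(), the sketch rewrites the data inside iterencode(), relying on the `_one_shot` parameter in CPython's signature for that method.

```python
import json
import math

def _null_nonfinite(obj):
    # Recursively replace NaN and infinities with None (encoded as null).
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {key: _null_nonfinite(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_null_nonfinite(item) for item in obj]
    return obj

class NullNonFiniteEncoder(json.JSONEncoder):
    """Hypothetical encoder: writes NaN, inf and -inf as null."""

    def iterencode(self, o, _one_shot=False):
        # Floats never reach default(), so the data is rewritten up front.
        return super().iterencode(_null_nonfinite(o), _one_shot)

text = json.dumps(
    {"a": None, "b": float("nan"), "c": [float("inf")]},
    cls=NullNonFiniteEncoder,
    allow_nan=False,  # would raise if a special value slipped through
)
print(text)  # {"a": null, "b": null, "c": [null]}
```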
msg369044 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-05-16 13:20
I don't think we want to generate output no matter what. Should datetime instances become null instead of raising an exception?

Are there types other than float where some values are json serializable and others aren't?
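To illustrate the contrast being drawn: an unsupported type such as datetime is rejected with a TypeError regardless of its value, whereas for float the outcome depends on the value. A minimal sketch (the dictionary key and variable names are arbitrary):

```python
import json
from datetime import datetime

# A datetime is never serializable by the default encoder.
try:
    json.dumps({"when": datetime(2020, 5, 16)})
    datetime_error = None
except TypeError as exc:
    datetime_error = exc

# A finite float is always serializable.
finite_text = json.dumps({"value": 1.5})

print(type(datetime_error).__name__)  # TypeError
print(finite_text)                    # {"value": 1.5}
```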
msg369172 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-18 03:58
We could add an option to cause NaNs to raise an error, but I don't think it would get used.

Otherwise, it's likely best to leave the module as-is.
msg369188 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-05-18 06:48
> We could add an option to cause NaNs to raise an error, but I don't think it would get used.

If that option were extended to also cause infinities to raise an error, then I'd use it. We have code that's producing JSON without knowing in advance exactly who the JSON consumer will be, and in particular whether the consumer will be strict in what it accepts or not. In that situation, it's preferable for us to discover that we're producing invalid JSON early (e.g., when running our own unit tests) rather than much later, when it turns out that the customer is using the "wrong" relational database.
msg369189 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-05-18 06:52
... but I'm an idiot, since that option is already there (allow_nan=False), and I've just checked that we are in fact using it.
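A short sketch of that existing option: with allow_nan=False, json.dumps raises ValueError for NaN and for both infinities. The wrapper name is_strict_json_safe is hypothetical; allow_nan itself is the real parameter.

```python
import json

def is_strict_json_safe(obj):
    # Hypothetical helper name; allow_nan is the real json.dumps parameter.
    try:
        json.dumps(obj, allow_nan=False)
    except ValueError:
        return False
    return True

print(is_strict_json_safe({"b": 1.5}))            # True
print(is_strict_json_safe({"b": float("nan")}))   # False
print(is_strict_json_safe({"b": float("inf")}))   # False
```

A check like this fits naturally in unit tests, which is where the early-detection use case described above would catch invalid output.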
msg369190 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-18 06:55
I missed that as well ;-)

Shall we close this now?
msg369191 - Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-05-18 07:01
I think it should be closed.
msg369194 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-05-18 08:08
Agreed; closing.
History
Date User Action Args
2020-05-18 08:08:12 mark.dickinson set status: open -> closed; resolution: rejected; messages: + msg369194; stage: resolved
2020-05-18 07:01:26 eric.smith set messages: + msg369191
2020-05-18 06:55:38 rhettinger set messages: + msg369190
2020-05-18 06:52:37 mark.dickinson set messages: + msg369189
2020-05-18 06:48:02 mark.dickinson set messages: + msg369188
2020-05-18 03:58:37 rhettinger set messages: + msg369172
2020-05-16 13:20:06 eric.smith set messages: + msg369044
2020-05-16 13:13:35 Haoyu SUN set messages: + msg369043
2020-05-16 03:43:49 rhettinger set messages: + msg369006
2020-05-16 03:36:21 rhettinger set nosy: + rhettinger; messages: + msg369005
2020-05-15 16:54:35 mark.dickinson set nosy: + mark.dickinson; messages: + msg368959
2020-05-15 14:55:04 eric.smith set type: behavior -> enhancement; messages: + msg368950; versions: + Python 3.9, - Python 3.6
2020-05-15 14:28:44 Haoyu SUN set messages: + msg368948
2020-05-15 13:57:20 eric.smith set nosy: + eric.smith; messages: + msg368944
2020-05-15 13:42:04 Haoyu SUN create