
classification
Title: json.dumps() should encode float number NaN to null
Type: enhancement
Stage: resolved
Components: Library (Lib)
Versions: Python 3.9

process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: Haoyu SUN, alucab, arjanstaring, eric.smith, mark.dickinson, rhettinger
Priority: normal
Keywords:

Created on 2020-05-15 13:42 by Haoyu SUN, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (21)
msg368942 - (view) Author: Haoyu SUN (Haoyu SUN) Date: 2020-05-15 13:42
Floats in Python can take 3 special values: nan, inf and -inf, which the json module encodes as "NaN", "Infinity" and "-Infinity". These representations are not compatible with the JSON specification, RFC 7159:
https://tools.ietf.org/html/rfc7159.html#page-6

These values are not correctly parsed by most JavaScript JSON parsers.

It would be better to encode NaN as "null", which is a valid JSON keyword that can represent "Not a Number".

Here is an example of how json.dumps() encodes NaN as NaN in JSON:
Python 3.6.9 (default, Apr 18 2020, 01:56:04)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> dct = {'a': None, 'b' : float('nan')}
>>> dct
{'a': None, 'b': nan}
>>> import json
>>> json.dumps(dct)
'{"a": null, "b": NaN}'
msg368944 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-05-15 13:57
Since this is documented behavior (https://docs.python.org/3.8/library/json.html#infinite-and-nan-number-values), we can't change it by default without breaking code.

What JavaScript JSON encoders and decoders specifically have a problem with this behavior? The documentation says "This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders.", so if there are encoders and decoders that it doesn't work with, that would be good to know.
msg368948 - (view) Author: Haoyu SUN (Haoyu SUN) Date: 2020-05-15 14:28
Thank you for the timely reply, Eric.

How about we add an optional argument to functions like json.dumps() (similar to the "ignore_nan" argument of the simplejson package, which defaults to False), so that users can choose whether NaN is encoded as NaN or as null, while the default behavior stays the same?

In Chromium-based browsers, the function JSON.parse cannot parse this output. Here is an example:
> JSON.parse('{"a": null, "b": NaN}')
uncaught SyntaxError: Unexpected token N in JSON at position 17
    at JSON.parse (<anonymous>)
    at <anonymous>:1:6
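
For comparison, here is a minimal sketch of the same input on the Python side. By default json.loads() accepts the NaN token; the documented parse_constant hook can be used to reject it, which roughly mirrors what JSON.parse does above. The helper name reject_constant is hypothetical and only for illustration.

```python
import json


def reject_constant(name):
    # json.loads() calls this hook with 'NaN', 'Infinity' or '-Infinity'.
    # (Hypothetical helper, not part of the json module.)
    raise ValueError(f"non-standard JSON constant: {name}")


text = '{"a": null, "b": NaN}'

# Default behaviour: the NaN token is accepted and becomes float('nan').
lenient = json.loads(text)
print(lenient["b"])

# Strict behaviour: the hook turns the non-standard token into an error.
try:
    json.loads(text, parse_constant=reject_constant)
except ValueError as exc:
    print("rejected:", exc)
```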
msg368950 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-05-15 14:55
I think that's reasonable, although I could see someone objecting ("just use simplejson instead").

I suggest discussing this on the python-ideas mailing list and seeing what people think over there. It might help to create a PR first, if it's not a lot of work.
msg368959 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-05-15 16:54
I don't think "null" in JSON is supposed to represent "Not a Number"; it's closer in meaning to Python's `None`. I definitely wouldn't want to see nans translated to "null" by default.

This also only seems to address a part of the issue: what's the proposed action for "Infinity" and "-Infinity"? We've written internal code to deal with float special values in JSON a few times (usually to work with databases that stick to the strict JSON definition), and that code has to find a way to deal with all three of the special values.
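
As an illustration only (this is not the internal code mentioned above), one way to handle all three special values is to pre-process the data before encoding. The function name replace_nonfinite is an assumption made for this sketch, and it maps nan, inf and -inf to None, which is just one possible policy.

```python
import json
import math


def replace_nonfinite(obj):
    """Return a copy of obj with nan, inf and -inf replaced by None.

    Hypothetical helper for illustration; dict keys are left untouched.
    """
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {key: replace_nonfinite(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [replace_nonfinite(item) for item in obj]
    return obj


data = {"a": None, "b": float("nan"), "c": [1.5, float("inf"), float("-inf")]}

# allow_nan=False guards against any special value that slipped through.
strict = json.dumps(replace_nonfinite(data), allow_nan=False)
print(strict)  # {"a": null, "b": null, "c": [1.5, null, null]}
```

Pre-processing is used here because JSONEncoder.default() is only called for objects the encoder cannot serialize, and floats are handled before that hook is reached.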
msg369005 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-16 03:36
[Eric]
> this is documented behavior

[Mark]
> I definitely wouldn't want to see nans translated to 
> "null" by default.

I concur with both of these statements.

I would support adding an option (off by default) to convert NaNs to None.  While NaNs were originally intended to indicate an invalid value, they sometimes get used to denote missing values.  In those situations, it would be reasonable to convert NaN to null.
msg369006 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-16 03:43
One other issue just came to mind.  While we could convert NaN to null during encoding, there isn't a reasonable way to reverse the process (a null could either be a NaN or a legitimate None).  That would limit the utility of a new optional conversion.
msg369043 - (view) Author: Haoyu SUN (Haoyu SUN) Date: 2020-05-16 13:13
About using null in JSON to represent the NaN value of a float, I prefer this logic: float is a numeric type that expects a number as its value, so "Not a Number" on a numeric type is equivalent to None (¬Number ∩ NumericValues = Empty). If we need to capture an error in a calculation or in input data, we can use the allow_nan option to catch it. Database connectors such as SQLAlchemy translate an empty field to float('nan') for a float field. We can probably safely take that as a convention. I have no idea yet how to represent infinity.

Once encoded, there is no way to know whether a null originated from NaN or from None without additional fields.

The direct conversion from Python data types to JSON may lose part of the information because of JSON's limited data types. When converting a BMP image to GIF, we have to eliminate some colors to fit the small palette, and we do not expect to restore the full information of the BMP image from its GIF counterpart.

I suggest we give the json module at least an option to generate standard-compliant JSON regardless of potential loss of information, instead of leaving each application to write its own subclass of JSONEncoder just for this corner case.
msg369044 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-05-16 13:20
I don't think we want to generate output no matter what. Should datetime instances become null instead of raising an exception?

Are there types other than float where some values are json serializable and others aren't?
msg369172 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-18 03:58
We could add an option to cause NaNs to raise an error, but I don't think it would get used.

Otherwise, it's likely best to leave the module as-is.
msg369188 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-05-18 06:48
> We could add an option to cause NaNs to raise an error, but I don't think it would get used.

If that option were extended to also cause infinities to raise an error, then I'd use it. We have code that's producing JSON without knowing in advance exactly who the JSON consumer will be, and in particular whether the consumer will be strict in what it accepts or not. In that situation, it's preferable for us to discover that we're producing invalid JSON early (e.g., when running our own unit tests) rather than much later, when it turns out that the customer is using the "wrong" relational database.
msg369189 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-05-18 06:52
... but I'm an idiot, since that option is already there (allow_nan=False), and I've just checked that we are in fact using it.
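
A short sketch of what that existing flag does: with allow_nan=False, json.dumps() raises ValueError for nan, inf and -inf instead of emitting the non-standard tokens. The exact error text varies between Python versions, so it is printed here rather than assumed.

```python
import json

rejected = []
for value in (float("nan"), float("inf"), float("-inf")):
    try:
        json.dumps({"x": value}, allow_nan=False)
    except ValueError as exc:
        # All three special values are refused in strict mode.
        rejected.append((value, str(exc)))

for value, message in rejected:
    print(f"{value!r} rejected: {message}")

# Ordinary finite floats are unaffected by the flag.
print(json.dumps({"x": 1.5}, allow_nan=False))
```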
msg369190 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-05-18 06:55
I missed that as well ;-)

Shall we close this now?
msg369191 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-05-18 07:01
I think it should be closed.
msg369194 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-05-18 08:08
Agreed; closing.
msg381481 - (view) Author: Arjan Staring (arjanstaring) Date: 2020-11-20 14:11
Please re-evaluate; the current behaviour is incompatible with the JSON specification, in favour of giving the user/application/consumer of the resulting JSON information about the conversion process. Given what is stated in the documentation, I do agree with the default behaviour, but I don't agree with only supporting "most JavaScript based encoders and decoders" and not supporting the JSON specification. I would opt to support "most encoders and decoders" plus the JSON specification. Furthermore, allow_nan doesn't allow anything; it forces, as no alternative is provided. Setting it to false does not make it disallow NaN, but makes encoding fail altogether, forcing the user back to the default behaviour. I would suggest that when allow_nan is set to false, the output be made compliant with JSON by using null instead (as per the specification). This way we support most JavaScript based encoders and decoders, but can also produce JSON compliant output.
msg381491 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-11-20 18:06
@Arjan Staring: could you point to which part of the JSON specification you're looking at?

At https://tools.ietf.org/html/rfc7159, the only reference to NaNs that I see is:

> Numeric values that cannot be represented in the grammar below (such
> as Infinity and NaN) are not permitted.

At https://www.json.org/json-en.html, there's no mention of IEEE 754 special values.

I'm not seeing anything anywhere to suggest that the JSON specification says NaNs should be translated to nulls.
msg383986 - (view) Author: Luca Barba (alucab) Date: 2020-12-29 12:16
I agree with arjanstaring

This implementation is not standard compliant and breaks interoperability with every ECMA compliant Javascript deserializer.

Technically it is awful, of course, but interoperability and standardization come before technical cleanliness, IMHO.

Regarding standardization:

If you consider https://tools.ietf.org/html/rfc7159

there is no way to represent the literal "nan" with the grammar supplied in section 6; hence the Infinity and NaN values are forbidden, and so is "nan".

Regarding interoperability:

If you consider http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf

It is clearly stated in section 24.5.2 Note 4 that JSON.stringify produces null for Infinity and NaN

"Finite numbers are stringified as if by calling ToString(number). NaN and Infinity regardless of sign are represented as the String null"

It is clearly stated in section 24.5.1 that JSON.parse uses eval-like parsing as a reference for decoding. nan is not an allowed keyword at all. For interoperability NaN could be used, but it is outside the JSON standard.

So what happens is that this will break all the ECMA compliant parsers (aka browsers) in the world, which is what is happening to my project, by the way.

The pandas serialization method (to_json) already handles this issue, but I really think the standard library should too.
msg384055 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-12-30 11:20
@Luca: you might want to open a new feature request issue; it's not clear to me what exact behaviour change you're proposing for Python.

What was rejected in this issue was the proposal to *automatically* convert NaNs and infinities to nulls by default, but that still leaves open the possibility of adding an option to do such conversion, provided that a sufficiently strong case can be made for adding such an option, and that we can figure out what we want the behaviour to be (should _all_ things that JSON doesn't know how to encode be converted to null, or just infinities and nans?).

If you want standards compliance, then that's already there: you can use the existing flag allow_nan=False when generating JSON. I agree that it would have been better if that were the default, but changing it now is probably a no-go - it would break too much existing code.

I'm still confused by Arjan Staring's comments: they seem to be saying that the JSON specification states that a NaN should be converted to the string "null", but there's nothing in RFC 7159 to support that - as you point out, it explicitly says that NaNs and infinities are disallowed.
msg384059 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-12-30 12:05
For the record, some helpful resources:

ECMA-404 (the ECMA standardization of JSON): http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf

RFC 8259 (current RFC for JSON): https://tools.ietf.org/html/rfc7159. (I mistakenly referred to RFC 7159 in a previous comment, but that's obsoleted by RFC 8259; however, none of the language around infinities and nans has changed, and none of the current errata to RFC 8259 have any impact on infinity or nan encoding.)

This Stack Overflow question and its answers contain some interesting discussion and links: https://stackoverflow.com/questions/1423081/json-left-out-infinity-and-nan-json-status-in-ecmascript 

Essentially, there's no good answer here: standard JSON simply can't encode infinities and NaNs. Absent a fix for the standard itself, both Python and ECMAScript end up papering over that fact. Unfortunately from an interoperability point of view, they do so in different ways - Python effectively extends the JSON spec in such a way that it produces invalid JSON by default; ECMAScript converts all of Infinity, -Infinity, NaN and null to the exact same JSON string, producing valid JSON but losing the ability to restore the original values from their JSON representations.
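
The trade-off described above can be sketched from Python. The first half shows the default round trip through the non-standard tokens. The second half parses a hard-coded string standing in for what JSON.stringify is described as producing for [Infinity, -Infinity, NaN, null]; it is written out by hand here, not generated by a JavaScript engine.

```python
import json
import math

original = [float("inf"), float("-inf"), float("nan"), None]

# Python's default: invalid JSON, but the values survive a round trip.
python_text = json.dumps(original)
print(python_text)  # [Infinity, -Infinity, NaN, null]
restored = json.loads(python_text)
print(restored[0] == math.inf, restored[1] == -math.inf,
      math.isnan(restored[2]), restored[3] is None)

# ECMAScript-style output (hand-written stand-in): valid JSON, but all
# four inputs collapse to the same value and cannot be told apart.
ecma_text = "[null,null,null,null]"
collapsed = json.loads(ecma_text)
print(collapsed)
```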

FWIW, Python's solution to this problem is (whether by accident or design I'm not sure) forward-looking in the sense that it's compatible with JSON 5: https://spec.json5.org
msg384060 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2020-12-30 12:06
> RFC 8259 (current RFC for JSON): https://tools.ietf.org/html/rfc7159

Argh; copy-and-paste fail. That link should have been https://tools.ietf.org/html/rfc8259, of course.
History
Date | User | Action | Args
2022-04-11 14:59:31 | admin | set | github: 84813
2020-12-30 12:06:37 | mark.dickinson | set | messages: + msg384060
2020-12-30 12:05:30 | mark.dickinson | set | messages: + msg384059
2020-12-30 11:20:51 | mark.dickinson | set | messages: + msg384055
2020-12-29 12:16:12 | alucab | set | nosy: + alucab; messages: + msg383986
2020-11-20 18:06:35 | mark.dickinson | set | messages: + msg381491
2020-11-20 14:11:11 | arjanstaring | set | nosy: + arjanstaring; messages: + msg381481
2020-05-18 08:08:12 | mark.dickinson | set | status: open -> closed; resolution: rejected; messages: + msg369194; stage: resolved
2020-05-18 07:01:26 | eric.smith | set | messages: + msg369191
2020-05-18 06:55:38 | rhettinger | set | messages: + msg369190
2020-05-18 06:52:37 | mark.dickinson | set | messages: + msg369189
2020-05-18 06:48:02 | mark.dickinson | set | messages: + msg369188
2020-05-18 03:58:37 | rhettinger | set | messages: + msg369172
2020-05-16 13:20:06 | eric.smith | set | messages: + msg369044
2020-05-16 13:13:35 | Haoyu SUN | set | messages: + msg369043
2020-05-16 03:43:49 | rhettinger | set | messages: + msg369006
2020-05-16 03:36:21 | rhettinger | set | nosy: + rhettinger; messages: + msg369005
2020-05-15 16:54:35 | mark.dickinson | set | nosy: + mark.dickinson; messages: + msg368959
2020-05-15 14:55:04 | eric.smith | set | type: behavior -> enhancement; messages: + msg368950; versions: + Python 3.9, - Python 3.6
2020-05-15 14:28:44 | Haoyu SUN | set | messages: + msg368948
2020-05-15 13:57:20 | eric.smith | set | nosy: + eric.smith; messages: + msg368944
2020-05-15 13:42:04 | Haoyu SUN | create |