classification
Title: json module won't parse a float that starts with a decimal point
Type: behavior Stage: needs patch
Components: Documentation, Tests Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: pending Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: cvrebert, docs@python, ezio.melotti, holdenweb, jcea, mark.dickinson, nedbat, picomancer, pitrou, serhiy.storchaka, terry.reedy, tim.peters, vajrasky
Priority: normal Keywords: easy, patch

Created on 2013-12-03 08:21 by picomancer, last changed 2017-03-07 15:51 by serhiy.storchaka.

Files
File name  Uploaded  Description
fix_doc_parse_non_valid_json_float.patch  vajrasky, 2013-12-16 08:42
parse_non_valid_json_float_with_unit_test.patch  vajrasky, 2013-12-16 10:44  Doc fix and unit test
parse_non_valid_json_float_with_unit_test_v2.patch  vajrasky, 2014-01-20 04:21  Doc fix and unit test, updated after Ezio's review.
Messages (17)
msg205080 - Author: (picomancer) Date: 2013-12-03 08:21
Try the following in your favorite Python version:

    import json
    json.loads(".5")

On my Python (2.7.4 and 3.3.1 on Ubuntu Saucy Salamander), I get an exception, even though .5 is a valid Python float literal (x = .5 works fine).
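A minimal reproduction that puts the two behaviours side by side. It is a sketch that catches ValueError, because the json module's decoding error is ValueError (or, in Python 3.5 and later, json.JSONDecodeError, a subclass of it):

```python
import json

# Python's float() accepts a leading decimal point.
print(float(".5"))  # 0.5

# The json module rejects the same text while scanning it.
try:
    json.loads(".5")
except ValueError as exc:
    print("json.loads('.5') failed:", exc)
```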

With respect to the parsing of floats by the json module, the docs state:

    By default, this is equivalent to ``float(num_str)``.

This statement does not match the behavior I have observed in every version of Python I have tried, and it is still present in the bleeding-edge docs (as of this writing) at http://hg.python.org/cpython/file/9283a9c5d0ce/Doc/library/json.rst

I think it's clear that the following changes should definitely be implemented:

(1) The docs and behavior should match
(2) Whatever the desired behavior is, there should be a unit test specifically for this corner case
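One possible shape for the unit test requested in (2), offered as a sketch: the class and method names are hypothetical and are not taken from the patches attached to this issue. It pins down rejection (option 1b below), which is the direction the later comments and patches take:

```python
import json
import unittest


class TestLeadingDecimalPoint(unittest.TestCase):
    """Hypothetical regression test: JSON numbers may not start with '.'."""

    def test_leading_decimal_point_rejected(self):
        # Top-level, negative, and nested forms should all fail to decode.
        for text in (".5", "-.5", "[.5]", '{"x": .5}'):
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    json.loads(text)

    def test_leading_zero_accepted(self):
        # The spec-compliant spelling still decodes to a float.
        self.assertEqual(json.loads("0.5"), 0.5)
        self.assertEqual(json.loads("-0.5"), -0.5)
```

Saved in a test module, it can be run with `python -m unittest`.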

Of course, to implement (1), there are two routes:

(1a) Leading decimal floats should be accepted by the json module; the behavior should be changed to match the docs. This is supported by Postel's Law ("be liberal in what [your program] accept[s]"; see http://en.wikipedia.org/wiki/Postel%27s_law) and by the slightly relaxed attitude toward standards compliance detailed in the json module documentation.

(1b) Leading decimal floats should be rejected by the json module; the docs should be changed to match the behavior.  This fits with a strict standards compliance worldview.

I think (1a) is better. In my particular use case, I was manually writing a JSON file with several numerical parameters. The traceback given by json.load(open("whatever.json", "r")) is uninformative and merely says "No JSON object could be decoded"; finding the token the parser considered malformed was fairly easy since there were only six or seven keys. It could have been much worse if I had been making manual changes to a larger JSON file.
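In later Python 3 releases (3.5 and newer, so not the 2.7.4/3.3.1 versions above), the exception is json.JSONDecodeError, which records where decoding failed. A sketch of using it to locate the bad token in a hand-written file; the sample document is made up for illustration:

```python
import json

# Hypothetical hand-edited parameter file with one invalid number.
doc = '{\n  "alpha": 0.25,\n  "beta": .5\n}'

try:
    json.loads(doc)
except json.JSONDecodeError as exc:
    # lineno and colno are 1-based; pos is a 0-based index into exc.doc.
    print("line %d, column %d: %s" % (exc.lineno, exc.colno, exc.msg))
    print("offending text starts with %r" % exc.doc[exc.pos:exc.pos + 5])
```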
msg205083 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-12-03 08:34
I think it would be better to adhere to the JSON spec, which doesn't allow numbers to start with a decimal point:
http://json.org/

If we go this way, the documentation should at least be fixed; and, as you say, we could also add a unit test for it.
msg205091 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-03 09:18
Agree with Antoine.
msg205114 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-12-03 12:32
In context, the doc is correct:

"""
parse_float, if specified, will be called with the string of every JSON float to be decoded. By default, this is equivalent to float(num_str).
"""

IIUC, parse_float only comes into play once the JSON source has already been tokenized, and the tokenization stage has already rejected things like '.5' by that point.  (The point of parse_float is that you can choose to turn numeric strings into decimal.Decimal instances instead of floats if you so wish.)

I agree it could use clarification.
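A sketch of Mark's point, using parse_float the way he describes (turning numeric strings into decimal.Decimal). The recording hook is a hypothetical helper added only to show which strings actually reach parse_float:

```python
import json
from decimal import Decimal

seen = []  # records every string handed to the parse_float hook


def recording_parse_float(num_str):
    seen.append(num_str)
    return Decimal(num_str)


# A valid JSON float reaches the hook as its exact source text.
value = json.loads("0.5", parse_float=recording_parse_float)
print(repr(value), seen)  # Decimal('0.5') ['0.5']

# '.5' is rejected while scanning, so the hook is never called for it.
try:
    json.loads(".5", parse_float=recording_parse_float)
except ValueError as exc:
    print("rejected before parse_float:", exc)
print(seen)  # still ['0.5']
```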
msg205119 - Author: Ned Batchelder (nedbat) * (Python triager) Date: 2013-12-03 14:41
There are other forms of numbers allowed by Python that are not allowed by JSON: "001.1"

Oddly, with all of the strictness in JSON, the exponent-marker "e" can be upper- or lower-case:  1e1 and 1E1 are both valid JSON.
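A quick way to check these examples against both parsers. accepted_by_json and accepted_by_float are hypothetical helpers, and the inputs mix Ned's cases with two more Python-only spellings ('1.' and '+1'):

```python
import json


def accepted_by_json(text):
    """True if json.loads() accepts text as a complete JSON document."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


def accepted_by_float(text):
    """True if Python's float() constructor accepts text."""
    try:
        float(text)
    except ValueError:
        return False
    return True


for text in ["001.1", ".5", "1e1", "1E1", "1.", "+1"]:
    print("%-6s json: %-5s float(): %s"
          % (text, accepted_by_json(text), accepted_by_float(text)))
```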
msg205242 - Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2013-12-04 20:12
> Oddly, with all of the strictness in JSON, the exponent-marker "e"
> can be upper- or lower-case

I'd guess that the aim is that common floating-point output formats from a variety of languages are valid JSON.  That would also explain why both '+' and '-' are allowed on the exponent, but only '-' on the significand, and why leading zeros are permitted on the exponent but not the significand.
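These asymmetries can be checked against the number grammar in RFC 4627. The regular expression below is our own transcription of that grammar, not code from the json module; [0-9] is used instead of \d because \d also matches non-ASCII digits in Python 3 str patterns:

```python
import re

# number = [ minus ] int [ frac ] [ exp ]   (RFC 4627, section 2.4)
JSON_NUMBER = re.compile(
    r"-?"                    # optional minus only; no leading '+'
    r"(?:0|[1-9][0-9]*)"     # int: a single 0, or no leading zeros
    r"(?:\.[0-9]+)?"         # frac: '.' must be followed by digits
    r"(?:[eE][-+]?[0-9]+)?"  # exp: either case, either sign, zeros allowed
)


def is_json_number(text):
    return JSON_NUMBER.fullmatch(text) is not None


for text in ["0.5", ".5", "-1", "+1", "01", "1e+05", "1E-7", "1."]:
    print("%-6s %s" % (text, is_json_number(text)))
```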
msg205246 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2013-12-04 21:00
We should adhere to the json spec, but there's no harm (and some real good!) in the docs pointing out notable cases where json and Python syntax differ.
msg205247 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-12-04 21:09
There are too many cases where json and Python syntax differ: trailing commas ("[1, 2,]"), non-string keys ("{1: 2}"), tuples ("(1, 2)"), leading zeros ("0001"), hexadecimal integers ("0xaf"), escapes of astral characters ('"\U0001d504"'), single quotes ("'spam'"), octal escape codes ("\015"), etc., etc.
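To make this list concrete, the snippet below feeds each example to json.loads(); all of them are rejected. Raw strings are used so that the backslash escapes reach the JSON parser as text rather than being interpreted by Python first:

```python
import json

samples = [
    "[1, 2,]",        # trailing comma
    "{1: 2}",         # non-string key
    "(1, 2)",         # tuple syntax
    "0001",           # leading zeros
    "0xaf",           # hexadecimal integer
    r'"\U0001d504"',  # \U escape (JSON needs the surrogate pair \ud835\udd04)
    "'spam'",         # single quotes
    r'"\015"',        # octal escape
]

rejected = []
for text in samples:
    try:
        json.loads(text)
    except ValueError:
        rejected.append(text)

print(len(rejected), "of", len(samples), "rejected")
```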
msg206280 - Author: Vajrasky Kok (vajrasky) * Date: 2013-12-16 08:42
How about this doc fix?
msg206292 - Author: Vajrasky Kok (vajrasky) * Date: 2013-12-16 10:44
Okay, I added a unit test for this edge case.
msg208519 - Author: Vajrasky Kok (vajrasky) * Date: 2014-01-20 04:21
Attached the patch to address Ezio Melotti's concern. Thanks for the review!
msg210807 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2014-02-10 08:10
I missed this comment from Serhiy:

> There are too many cases where json and Python syntax differ.

How many differences are there between the two?
I think we might add notes to the table at http://docs.python.org/3/library/json.html#encoders-and-decoders (either something similar to the notes in the table at http://docs.python.org/3/library/stdtypes.html#mutable-sequence-types, or just a couple of lines in a third column). If there are too many differences and we follow the specs, we can just add a note saying that decoding is done according to the specs, link to them, and mention a few examples that are valid in Python but not in JSON (like Serhiy did in his message).
msg210812 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-10 09:05
My point is that Python and JSON are two different languages with different syntaxes. JSON is not Python and Python is not JSON. We can't enumerate all the differences short of citing the Python [1] and JSON [2] specifications.

For the json module's own deviations from the JSON specification, see [3].

[1] http://docs.python.org/3/reference/index.html
[2] http://tools.ietf.org/html/rfc4627.html
[3] http://docs.python.org/3/library/json.html#standard-compliance
msg210817 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-02-10 10:20
To me, the JSONDecoder doc at the top of this section
http://docs.python.org/3/library/json.html#encoders-and-decoders
is unclear about the decoding process, about when the hook functions are applied, and about the signatures of all 5 hook functions. The timing and signature issues are related. My questions and suggestions:

- 'Simple JSON decoder.'
+ 'A simple JSON decoder that splits JSON strings into JSON substrings that represent programming objects and then converts the substrings into Python objects.'

I took the above from Mark's description, using 'substring' instead of 'token'.
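The public JSONDecoder.raw_decode() method makes the "recognize a JSON value first" behaviour visible: it decodes one value from the start of a string and reports the index where it stopped. A sketch; the input strings are arbitrary examples:

```python
import json

decoder = json.JSONDecoder()

# raw_decode() returns (value, index just past the decoded value).
print(decoder.raw_decode("0.5 and more"))  # (0.5, 3)
print(decoder.raw_decode("0001"))          # (0, 1): only the first "0" is a JSON number

# Text that does not start with a valid JSON value is rejected outright.
try:
    decoder.raw_decode(".5")
except ValueError as exc:
    print("rejected:", exc)
```

json.loads() additionally requires that nothing but whitespace follows the decoded value, which is why "0001" and "001.1" fail with an "Extra data" error rather than being passed to int() or float().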

- '\nPerforms the following translations in decoding by default:'
+ 'The default translations from JSON object strings to Python objects are as follows:'

Should the table have multiple notes? The intended use case for JSON is program-to-program communication, where both programs 'understand' the legal JSON syntax. Is handcrafting JSON strings without knowing the syntax also an intended use case? I think not. Teaching the syntax is out of scope for the docs. However, the table could be followed by a generic note that JSON syntax and Python syntax for objects are different.

+ "The legal JSON syntax for various classes is different from Python syntax for the same classes. The transformations above are only applied to legal JSON strings. For instance, both float('.5') and float(0.5) are legal Python code, but a JSON encoder will only produce '0.5', so a JSON decoder will reject '.5' as an error and not pass it on to float() or its parse_float substitute."

This suggestion is intended to replace class by class notes or the proposed addition to the parse_float entry.

I think the description of the parse hooks would be clearer if the input signature were given immediately with the name.

object_hook: I do not understand enough to suggest anything. Is the input a json string representing an object or a Python dict?

object_pairs_hook: Ditto. What is its input?
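The inputs can be checked empirically: object_hook is called with the Python dict built for each decoded JSON object, and object_pairs_hook with a list of (key, value) tuples in document order, innermost objects first. The recording hooks below are hypothetical helpers:

```python
import json

dict_inputs = []  # what object_hook receives
pair_inputs = []  # what object_pairs_hook receives


def recording_object_hook(obj):
    dict_inputs.append(obj)
    return obj  # the return value replaces the dict in the result


def recording_pairs_hook(pairs):
    pair_inputs.append(pairs)
    return dict(pairs)


text = '{"b": 1, "a": {"c": 2}}'
json.loads(text, object_hook=recording_object_hook)
json.loads(text, object_pairs_hook=recording_pairs_hook)

print(dict_inputs)  # [{'c': 2}, {'b': 1, 'a': {'c': 2}}]
print(pair_inputs)  # [[('c', 2)], [('b', 1), ('a', {'c': 2})]]
```

If both hooks are given, object_pairs_hook takes priority.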

parse_int: This should come before parse_float, to match the table.

- '*parse_int*, if specified ...'
+ '*parse_int*(*num_str*), if specified ...'

'num_str' is already used as the parameter name in the ... part.

parse_float(num_str): ditto
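The proposed parse_int(num_str) and parse_float(num_str) signatures can be checked by passing str as both hooks, which simply returns the numeric text each hook was given. Note that 1E1, with an exponent but no fraction, is routed to parse_float:

```python
import json

# str() returns its argument unchanged, exposing the exact num_str each hook sees.
raw = json.loads("[42, -7, 1.50, 1E1]", parse_int=str, parse_float=str)
print(raw)  # ['42', '-7', '1.50', '1E1']
```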

parse_constant(const_str): I do not understand "This can be used to raise an exception if invalid JSON numbers are encountered."

>>> def f(s): raise ValueError('custom')
...
>>> json.loads('.5', parse_constant=f)

gives the same error as without parse_constant. The sentence should be rewritten or removed.
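The json documentation says parse_constant is called with one of '-Infinity', 'Infinity' or 'NaN', so "invalid JSON numbers" in that sentence appears to mean those non-standard constants, not malformed numerals like '.5'. A sketch, with reject_constant as a hypothetical hook:

```python
import json
import math


def reject_constant(name):
    # Called with 'NaN', 'Infinity' or '-Infinity'.
    raise ValueError("non-standard JSON constant: " + name)


# By default the json module accepts these non-standard constants.
print(math.isnan(json.loads("NaN")))  # True

# With the hook, they can be refused.
try:
    json.loads("[1.0, -Infinity]", parse_constant=reject_constant)
except ValueError as exc:
    print(exc)  # non-standard JSON constant: -Infinity

# '.5' never reaches parse_constant; the scanner rejects it first.
try:
    json.loads(".5", parse_constant=reject_constant)
except ValueError as exc:
    print(exc)
```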
msg210819 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-02-10 10:27
> My point is that Python and JSON are two different languages which have different syntaxes.

That is also the point of the generic note I suggested, which explains the consequence of the difference (rejection by JSON parser before calling Python constructor), with just one example given.
msg210825 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-02-10 11:21
> - 'Simple JSON decoder.'
> + 'A simple JSON decoder that splits JSON strings into JSON substrings
> that represent programming objects and then converts the substrings
> into Python objects.'

Please let's keep the description simple. Everyone is able to understand
what a JSON decoder is, and your suggested change is strangely confusing
("programming objects"?).
msg215087 - Author: Steve Holden (holdenweb) * (Python committer) Date: 2014-03-28 22:47
How about: "A simple JSON decoder that converts between JSON string representations and Python data structures"?
History
Date  User  Action  Args
2017-03-07 15:51:01  serhiy.storchaka  set  status: open -> pending
2014-03-28 22:47:30  holdenweb  set  nosy: + holdenweb
    messages: + msg215087
2014-03-24 20:57:21  cvrebert  set  nosy: + cvrebert
2014-02-10 11:21:52  pitrou  set  messages: + msg210825
2014-02-10 10:27:39  terry.reedy  set  messages: + msg210819
2014-02-10 10:20:13  terry.reedy  set  messages: + msg210817
2014-02-10 09:05:41  serhiy.storchaka  set  messages: + msg210812
2014-02-10 08:10:04  ezio.melotti  set  keywords: + easy
    nosy: + terry.reedy
    messages: + msg210807
2014-01-20 04:21:39  vajrasky  set  files: + parse_non_valid_json_float_with_unit_test_v2.patch
    messages: + msg208519
2013-12-16 10:44:22  vajrasky  set  files: + parse_non_valid_json_float_with_unit_test.patch
    messages: + msg206292
2013-12-16 08:42:52  vajrasky  set  files: + fix_doc_parse_non_valid_json_float.patch
    nosy: + vajrasky
    messages: + msg206280
    keywords: + patch
2013-12-04 21:09:27  serhiy.storchaka  set  messages: + msg205247
2013-12-04 21:00:35  tim.peters  set  nosy: + tim.peters
    messages: + msg205246
2013-12-04 20:20:56  ezio.melotti  set  nosy: + ezio.melotti
2013-12-04 20:12:02  mark.dickinson  set  messages: + msg205242
2013-12-03 17:37:27  jcea  set  nosy: + jcea
2013-12-03 14:41:19  nedbat  set  nosy: + nedbat
    messages: + msg205119
2013-12-03 12:32:59  mark.dickinson  set  messages: + msg205114
2013-12-03 09:18:23  serhiy.storchaka  set  messages: + msg205091
2013-12-03 08:35:26  mark.dickinson  set  nosy: + mark.dickinson
2013-12-03 08:34:53  pitrou  set  nosy: + serhiy.storchaka
2013-12-03 08:34:41  pitrou  set  assignee: docs@python
    components: + Documentation, Tests, - Library (Lib)
    versions: - Python 3.5
    nosy: + docs@python, pitrou
    messages: + msg205083
    stage: needs patch
2013-12-03 08:21:12  picomancer  create