msg205080 - (view) |
Author: (picomancer) |
Date: 2013-12-03 08:21 |
Try the following in your favorite Python version:
import json
json.loads(".5")
On my Python (2.7.4 and 3.3.1 on Ubuntu Saucy Salamander), I get an exception. However, x = .5 is a valid Python number.
With respect to the parsing of floats by the json module, the docs state:
By default, this is equivalent to ``float(num_str)``.
This statement does not match the behavior I have observed in every version of Python I have tried, and it is still present in the bleeding-edge docs (as of this writing) at http://hg.python.org/cpython/file/9283a9c5d0ce/Doc/library/json.rst
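A minimal sketch of the mismatch, assuming Python 3 (the decoder's error is a ValueError there, as it is in 2.7). The helper name json_accepts is made up for illustration and is not part of the json module.

```python
import json

def json_accepts(text):
    """Hypothetical helper: True if json.loads() parses text, else False."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

# float() accepts a leading decimal point ...
python_value = float(".5")

# ... but the JSON decoder rejects the same string, and accepts "0.5".
leading_point_ok = json_accepts(".5")
leading_zero_ok = json_accepts("0.5")
```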
I think it's clear that the following changes should definitely be implemented:
(1) The docs and behavior should match
(2) Whatever the desired behavior is, there should be a unit test specifically for this corner case
Of course, to implement (1), there are two routes:
(1a) Leading decimal floats should be accepted by the json module; the behavior should be changed to match the docs. This is supported by Postel's Law ("be liberal in what [your program] accept[s]"), see http://en.wikipedia.org/wiki/Postel%27s_law and the slightly relaxed attitude toward standards compliance detailed in the json module documentation.
(1b) Leading decimal floats should be rejected by the json module; the docs should be changed to match the behavior. This fits with a strict standards compliance worldview.
I think (1a) is better. In my particular use case, I was manually writing a json file with several numerical parameters. The backtrace given by json.load(open("whatever.json", "r")) is uninformative and merely says "No JSON object could be decoded"; finding the token the parser considered to be malformed was fairly easy since there were only six or seven keys. It could have been much worse if I had been making manual changes to a larger JSON file.
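For locating the malformed token, a sketch that assumes Python 3.5 or later, where the decoder raises json.JSONDecodeError with position attributes (the versions mentioned above do not have this). The sample document is invented for illustration.

```python
import json

# Hand-written JSON with a leading-decimal float on the second line.
text = '{"a": 1,\n "b": .5}'

try:
    json.loads(text)
except json.JSONDecodeError as exc:  # Python 3.5+ only
    # lineno and colno are 1-based; pos is a 0-based index into the document.
    lineno = exc.lineno
    colno = exc.colno
    offending = text[exc.pos:exc.pos + 2]
    print("line %d, column %d: %r" % (lineno, colno, offending))
```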
|
msg205083 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2013-12-03 08:34 |
I think it would be better to adhere to the JSON spec, which doesn't allow numbers to start with a decimal point:
http://json.org/
If we go this way, the documentation should at least be fixed; and, as you say, we could also add a unit test for it.
|
msg205091 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-12-03 09:18 |
Agree with Antoine.
|
msg205114 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-12-03 12:32 |
In context, the doc is correct:
"""
parse_float, if specified, will be called with the string of every JSON float to be decoded. By default, this is equivalent to float(num_str).
"""
IIUC, parse_float only comes into play once the JSON source has already been tokenized, and the tokenization stage has already rejected things like '.5' by that point. (The point of parse_float is that you can choose to turn numeric strings into decimal.Decimal instances instead of floats if you so wish.)
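A small sketch of that ordering: a parse_float hook that records every string it is handed. The name recording_parse_float is illustrative only.

```python
import json
from decimal import Decimal

seen = []

def recording_parse_float(num_str):
    """Record each string the decoder hands over, then build a Decimal."""
    seen.append(num_str)
    return Decimal(num_str)

# Only the float tokens reach the hook; the integer 2 goes to parse_int.
value = json.loads("[0.5, 1e1, 2]", parse_float=recording_parse_float)

# '.5' is rejected before the hook is ever called.
try:
    json.loads(".5", parse_float=recording_parse_float)
    rejected = False
except ValueError:
    rejected = True
```

After running this, seen holds only the strings taken from the valid document; the rejected '.5' never shows up in it.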
I agree it could use clarification.
|
msg205119 - (view) |
Author: Ned Batchelder (nedbat) * |
Date: 2013-12-03 14:41 |
There are other forms of numbers allowed by Python that are not allowed by JSON: "001.1"
Oddly, with all of the strictness in JSON, the exponent-marker "e" can be upper- or lower-case: 1e1 and 1E1 are both valid JSON.
|
msg205242 - (view) |
Author: Mark Dickinson (mark.dickinson) * |
Date: 2013-12-04 20:12 |
> Oddly, with all of the strictness in JSON, the exponent-marker "e"
> can be upper- or lower-case
I'd guess that the aim is that common floating-point output formats from a variety of languages are valid JSON. That would also explain why both '+' and '-' are allowed on the exponent, but only '-' on the significand, and why leading zeros are permitted on the exponent but not the significand.
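These number rules can be checked against the json module directly. The sketch below uses a made-up helper, json_accepts, and a handful of sample strings chosen for illustration.

```python
import json

def json_accepts(text):
    """Hypothetical helper: True if json.loads() parses text, else False."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True

# Either case of the exponent marker, either sign on the exponent,
# and leading zeros on the exponent are all accepted.
valid = ["1e1", "1E1", "1e+1", "1e-1", "1e01", "-1.5"]

# A '+' sign or leading zeros on the significand, a leading decimal
# point, and a trailing decimal point are all rejected.
invalid = ["+1", "01", "001.1", ".5", "1."]

valid_results = [json_accepts(s) for s in valid]
invalid_results = [json_accepts(s) for s in invalid]
```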
|
msg205246 - (view) |
Author: Tim Peters (tim.peters) * |
Date: 2013-12-04 21:00 |
We should adhere to the json spec, but there's no harm (and some real good!) in the docs pointing out notable cases where json and Python syntax differ.
|
msg205247 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2013-12-04 21:09 |
There are too many cases where json and Python syntax differ. Final comma ("[1, 2,]"), non-string keys ({1: 2}), tuples ("(1, 2)"), leading zeros ("0001"), hexadecimal integers ("0xaf"), escapes of astral characters ('"\U0001d504"'), single quotes ("'spam'"), octal escape codes ("\015"), etc, etc...
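A sketch that checks most of these cases, assuming Python 3 and using ast.literal_eval as a stand-in for "valid Python syntax". The integer "0001" is swapped for the float "001.1" here because Python 3 itself rejects leading zeros on decimal integer literals.

```python
import ast
import json

# Raw strings keep the backslashes, so both parsers see the escape text.
samples = [
    "[1, 2,]",          # final comma
    "{1: 2}",           # non-string key
    "(1, 2)",           # tuple
    "001.1",            # leading zeros (float form, see note above)
    "0xaf",             # hexadecimal integer
    r'"\U0001d504"',    # escape of an astral character
    "'spam'",           # single quotes
    r'"\015"',          # octal escape code
]

def json_accepts(text):
    """Hypothetical helper: True if json.loads() parses text, else False."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True

# Each sample evaluates as a Python literal ...
python_values = [ast.literal_eval(s) for s in samples]
# ... and each is rejected by the JSON decoder.
json_results = [json_accepts(s) for s in samples]
```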
|
msg206280 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-12-16 08:42 |
How about this doc fix?
|
msg206292 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2013-12-16 10:44 |
Okay, I added a unit test for this edge case.
|
msg208519 - (view) |
Author: Vajrasky Kok (vajrasky) * |
Date: 2014-01-20 04:21 |
Attached the patch to address Ezio Melotti's concern. Thanks for the review!
|
msg210807 - (view) |
Author: Ezio Melotti (ezio.melotti) * |
Date: 2014-02-10 08:10 |
I missed this comment from Serhiy:
> There are too many cases where json and Python syntax differ.
How many differences are there between the two?
I think we might add notes to the table at http://docs.python.org/3/library/json.html#encoders-and-decoders (either something similar to the notes in the table at http://docs.python.org/3/library/stdtypes.html#mutable-sequence-types, or just a couple of lines in a third column). If there are too many differences and we follow the specs we can just add a note saying that the decoding is done according to the specs, add a link to them, and mention a few examples that are valid in Python but not in JSON (like Serhiy did in his message).
|
msg210812 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2014-02-10 09:05 |
My point is that Python and JSON are two different languages which have different syntaxes. JSON is not Python and Python is not JSON. We can't enumerate all the differences short of citing the Python [1] and JSON [2] specifications.
For differences from JSON specifications see [3].
[1] http://docs.python.org/3/reference/index.html
[2] http://tools.ietf.org/html/rfc4627.html
[3] http://docs.python.org/3/library/json.html#standard-compliance
|
msg210817 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2014-02-10 10:20 |
To me, the JSONDecoder doc at the top of this section
http://docs.python.org/3/library/json.html#encoders-and-decoders
is unclear about the decoding process, about when the 5 hook functions are applied, and about their signatures. The timing and signature issues are related. My questions and suggestions:
- 'Simple JSON decoder.'
+ 'A simple JSON decoder that splits JSON strings into JSON substrings that represent programming objects and then converts the substrings into Python objects.'
I took the above from Mark's description, using 'substring' instead of 'token'.
- '\nPerforms the following translations in decoding by default:'
+ 'The default translations from JSON object strings to Python objects are as follows:'
Should the table have multiple notes? The intended use case for JSON is program to program communication, where both programs 'understand' the legal JSON syntax. Is handcrafting JSON strings, without knowing the syntax, also an intended use case? I think not. Teaching the syntax is out of scope for the docs. However, the table could be followed by a generic note that JSON syntax and Python syntax for objects are different.
+ "The legal JSON syntax for various classes is different from Python syntax for the same classes. The transformations above are only applied to legal JSON strings. For instance, both float('.5') and float(0.5) are legal Python code, but a JSON encoder will only produce '0.5', so a JSON decoder will reject '.5' as an error and not pass it on to float() or its parse_float substitute."
This suggestion is intended to replace class by class notes or the proposed addition to the parse_float entry.
I think the description of the parse hooks would be clearer if the input signature were given immediately with the name.
object_hook: I do not understand enough to suggest anything. Is the input a json string representing an object or a Python dict?
object_pairs_hook: Ditto. What is its input?
parse_int: This should come before parse_float, to match the table.
- '*parse_int*, if specified ...'
+ '*parse_int*(*num_str*), if specified ...'
'num_str' as the parameter name is used in the ... part.
parse_float(num_str): ditto
parse_constant(const_str): I do not understand "This can be used to raise an exception if invalid JSON numbers are encountered."
>>> def f(s): raise ValueError('custom')
...
>>> json.loads('.5', parse_constant=f)
gives the same error as without parse_constant. The sentence should be rewritten or removed.
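For reference, a sketch that records what each hook receives, assuming the behavior of current CPython 3. The names make_recorder and fail are illustrative and not part of the json module. It suggests that object_hook gets an already-decoded Python dict, object_pairs_hook gets a list of (key, value) tuples, and parse_constant is only consulted for the tokens '-Infinity', 'Infinity' and 'NaN', which is why it never sees '.5'.

```python
import json

calls = []

def make_recorder(name, convert):
    """Build a hook that logs (name, argument) and then applies convert."""
    def hook(arg):
        calls.append((name, arg))
        return convert(arg)
    return hook

result = json.loads(
    '{"n": 1, "x": 0.5, "c": Infinity}',
    object_hook=make_recorder("object_hook", dict),
    parse_int=make_recorder("parse_int", int),
    parse_float=make_recorder("parse_float", float),
    parse_constant=make_recorder("parse_constant", float),
)

# object_pairs_hook receives the pairs in document order, duplicates included.
pairs = json.loads('{"a": 1, "a": 2}', object_pairs_hook=list)

def fail(const_str):
    raise ValueError("custom")

# 'NaN' reaches parse_constant, so the custom error propagates.
try:
    json.loads("NaN", parse_constant=fail)
    nan_message = None
except ValueError as exc:
    nan_message = str(exc)

# '.5' is rejected earlier, so the custom error is never raised.
try:
    json.loads(".5", parse_constant=fail)
    dot_message = None
except ValueError as exc:
    dot_message = str(exc)
```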
|
msg210819 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2014-02-10 10:27 |
> My point is that Python and JSON are two different languages which have different syntaxes.
That is also the point of the generic note I suggested, which explains the consequence of the difference (rejection by JSON parser before calling Python constructor), with just one example given.
|
msg210825 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2014-02-10 11:21 |
> - 'Simple JSON decoder.'
> + 'A simple JSON decoder that splits JSON strings into JSON substrings
> that represent programming objects and then converts the substrings
> into Python objects.'
Please let's keep the description simple. Everyone is able to understand
what a JSON decoder is, and your suggested change is strangely confusing
("programming objects"?).
|
msg215087 - (view) |
Author: Steve Holden (holdenweb) * |
Date: 2014-03-28 22:47 |
How about: "A simple JSON decoder that converts between JSON string representations and Python data structures"?
|
|
Date | User | Action | Args |
2022-04-11 14:57:54 | admin | set | status: pending -> open; github: 64070 |
2017-03-07 15:51:01 | serhiy.storchaka | set | status: open -> pending |
2014-03-28 22:47:30 | holdenweb | set | nosy: + holdenweb; messages: + msg215087 |
2014-03-24 20:57:21 | cvrebert | set | nosy: + cvrebert |
2014-02-10 11:21:52 | pitrou | set | messages: + msg210825 |
2014-02-10 10:27:39 | terry.reedy | set | messages: + msg210819 |
2014-02-10 10:20:13 | terry.reedy | set | messages: + msg210817 |
2014-02-10 09:05:41 | serhiy.storchaka | set | messages: + msg210812 |
2014-02-10 08:10:04 | ezio.melotti | set | keywords: + easy; nosy: + terry.reedy; messages: + msg210807 |
2014-01-20 04:21:39 | vajrasky | set | files: + parse_non_valid_json_float_with_unit_test_v2.patch; messages: + msg208519 |
2013-12-16 10:44:22 | vajrasky | set | files: + parse_non_valid_json_float_with_unit_test.patch; messages: + msg206292 |
2013-12-16 08:42:52 | vajrasky | set | files: + fix_doc_parse_non_valid_json_float.patch; nosy: + vajrasky; messages: + msg206280; keywords: + patch |
2013-12-04 21:09:27 | serhiy.storchaka | set | messages: + msg205247 |
2013-12-04 21:00:35 | tim.peters | set | nosy: + tim.peters; messages: + msg205246 |
2013-12-04 20:20:56 | ezio.melotti | set | nosy: + ezio.melotti |
2013-12-04 20:12:02 | mark.dickinson | set | messages: + msg205242 |
2013-12-03 17:37:27 | jcea | set | nosy: + jcea |
2013-12-03 14:41:19 | nedbat | set | nosy: + nedbat; messages: + msg205119 |
2013-12-03 12:32:59 | mark.dickinson | set | messages: + msg205114 |
2013-12-03 09:18:23 | serhiy.storchaka | set | messages: + msg205091 |
2013-12-03 08:35:26 | mark.dickinson | set | nosy: + mark.dickinson |
2013-12-03 08:34:53 | pitrou | set | nosy: + serhiy.storchaka |
2013-12-03 08:34:41 | pitrou | set | assignee: docs@python; components: + Documentation, Tests, - Library (Lib); versions: - Python 3.5; nosy: + docs@python, pitrou; messages: + msg205083; stage: needs patch |
2013-12-03 08:21:12 | picomancer | create | |