Message 309763 - Python tracker

Author	Adrián Orive
Recipients	Adrián Orive, Levi Cameron, bob.ippolito, ezio.melotti, oberstet, rhettinger, serhiy.storchaka
Date	2018-01-10.13:52:33
Message-id	<1515592354.48.0.467229070634.issue29992@psf.upfronthosting.co.za>
In-reply-to

Content
I found the same problem. My case seems to be less exotic: I am trying to parse some of these strings into decimal.Decimal or datetime.datetime objects. Returning a decimal as a string is becoming quite common in REST APIs to avoid floating-point errors.
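To make the case concrete, here is a minimal sketch (the "price" payloads are made up for illustration, not taken from any real API). parse_float is only called for JSON number tokens, so a decimal sent as a string comes back as a plain str:

```python
import json
from decimal import Decimal

# Hypothetical payloads: the same amount sent as a JSON number and as a JSON string.
as_number = json.loads('{"price": 12.50}', parse_float=Decimal)
as_string = json.loads('{"price": "12.50"}', parse_float=Decimal)

# parse_float handles the number token, but never sees the quoted value.
print(type(as_number["price"]).__name__)
print(type(as_string["price"]).__name__)
```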

This is not simply a "missing parameter" problem:

1) JSONDecoder has 6 parse_XXX attributes (parse_int, parse_float, parse_constant, parse_string, parse_object, parse_array) and only the first 3 are offered as parameters. The last three fall into a different category, as they are not actually parsers but part of the scanner logic. The first 3, however, handle simple JSON types, so why keep only 3 parsers plus the 2 additional object hooks instead of providing a full set of parsers (arrays, strings, keys)?
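The mismatch can be checked by introspection. This is only a sketch that reflects the CPython versions I have looked at; the attribute set is an implementation detail and could change:

```python
import inspect
import json

decoder = json.JSONDecoder()

# Every parse_* attribute set on the instance by __init__.
parse_attrs = sorted(name for name in vars(decoder) if name.startswith("parse_"))

# Names accepted by the constructor.
init_params = set(inspect.signature(json.JSONDecoder.__init__).parameters)

exposed = [name for name in parse_attrs if name in init_params]
hidden = [name for name in parse_attrs if name not in init_params]

print("constructor parameters:", exposed)
print("attributes only:", hidden)
```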

2) JSONDecoder.__init__ calls json.scanner.make_scanner, so subclassing JSONDecoder and modifying some attributes after calling super().__init__ has no effect: the scanner has already captured the old values and needs to be reset.
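A minimal sketch of this behaviour, using parse_float because both scanner implementations honour it. The class names LateDecoder and RebuiltDecoder are made up for the example; scan_once is the attribute that JSONDecoder.__init__ assigns in CPython, which is an implementation detail:

```python
import json
from decimal import Decimal
from json.scanner import make_scanner


class LateDecoder(json.JSONDecoder):
    """Changes parse_float after the scanner has already been built."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.parse_float = Decimal  # too late: the scanner captured float


class RebuiltDecoder(json.JSONDecoder):
    """Same change, but rebuilds the scanner afterwards."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.parse_float = Decimal
        self.scan_once = make_scanner(self)  # reset the scanner


late = LateDecoder().decode("1.5")
rebuilt = RebuiltDecoder().decode("1.5")
print(type(late).__name__, type(rebuilt).__name__)
```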

3) make_scanner is implemented in both C (c_make_scanner) and Python (py_make_scanner); the latter is used as a fallback in case the former cannot be imported. The behaviour of the C and Python versions IS NOT CONSISTENT.
- c_make_scanner IGNORES JSONDecoder's parse_string attribute. This also applies to the parse_array and parse_object attributes.
- py_make_scanner ONLY uses parse_string for JSON object values; keys have json.decoder.scanstring hardcoded.
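The inconsistency can be reproduced with a sketch like the following. shouting_string and ShoutingDecoder are hypothetical names for the example, and the code relies on CPython internals (json.scanner.c_make_scanner, json.scanner.py_make_scanner, json.decoder.scanstring), so treat it as a demonstration rather than supported API usage:

```python
import json
from json import decoder, scanner


def shouting_string(s, end, strict=True):
    """Custom string parser: same contract as scanstring, but uppercases."""
    value, end = decoder.scanstring(s, end, strict)
    return value.upper(), end


class ShoutingDecoder(json.JSONDecoder):
    def __init__(self, make_scanner, **kwargs):
        super().__init__(**kwargs)
        self.parse_string = shouting_string
        # Rebuild the scanner with the chosen implementation.
        self.scan_once = make_scanner(self)


doc = '{"key": "value"}'

# Pure-Python scanner: the value goes through parse_string, the key does not.
py_result = ShoutingDecoder(scanner.py_make_scanner).decode(doc)
print("py:", py_result)

# C scanner (None if the _json extension is unavailable): parse_string is ignored.
c_result = None
if scanner.c_make_scanner is not None:
    c_result = ShoutingDecoder(scanner.c_make_scanner).decode(doc)
    print("c: ", c_result)
```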

4) ONLY make_scanner IS BEING "EXPORTED" (__all__ = ['make_scanner']), so learning that two versions exist requires digging deep into json's code. The same applies to json.decoder's scanstring, JSONObject and JSONArray.
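A quick check of what is and is not exported, as a sketch against the CPython layout (all of these names are implementation details):

```python
from json import decoder, scanner

candidates = [
    (scanner, "c_make_scanner"),
    (scanner, "py_make_scanner"),
    (decoder, "scanstring"),
    (decoder, "JSONObject"),
    (decoder, "JSONArray"),
]

# For each name: does it exist in the module, and is it listed in __all__?
report = [
    (module.__name__, name, hasattr(module, name), name in module.__all__)
    for module, name in candidates
]

print("json.scanner.__all__ =", scanner.__all__)
for module_name, name, exists, exported in report:
    print(f"{module_name}.{name}: exists={exists} exported={exported}")
```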

The second point would be solved by providing all the needed parameters, as you would then not need to modify any attribute after calling JSONDecoder.__init__. This makes more sense than moving the make_scanner call out of the __init__ method, as it is clearly part of the initialization. It has to be noted, however, that moving the make_scanner call from __init__ to the raw_decode method, despite making less sense, would only be a performance degradation for the default JSONDecoder, as the rest are only used once.
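For comparison, the alternative of building the scanner in raw_decode could look roughly like this. LazyScannerDecoder is a hypothetical subclass written for illustration, not the change I am proposing:

```python
import json
from decimal import Decimal
from json.scanner import make_scanner


class LazyScannerDecoder(json.JSONDecoder):
    """Rebuilds the scanner on every raw_decode call (illustration only)."""

    def raw_decode(self, s, idx=0):
        # Pick up whatever the attributes are at decode time.
        self.scan_once = make_scanner(self)
        return super().raw_decode(s, idx)


lazy = LazyScannerDecoder()
lazy.parse_float = Decimal  # set after construction, still honoured
result = lazy.decode("[1.5, 2]")
print(result)
```

The cost is one scanner construction per call, which is why it would mostly hurt a decoder instance that is reused many times.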

The fourth point would be solved if both the first and the third points are solved, as these names (c_make_scanner, py_make_scanner, scanstring, JSONObject and JSONArray) would be implementation details that the user does not need, so not exporting them would be the right choice.

So my proposal focuses on fixing the first and third points, keeping in mind that it needs to be backwards compatible:

The process of decoding a JSON string into a Python object can be conceptually divided into two steps: interpreting the characters, and then transforming the result into the corresponding Python object. The first step is what the scanner does with the character matching, the number regex, scanstring, JSONObject and JSONArray. The second step is what the parse_int, parse_float, parse_constant, object_hook and object_pairs_hook attributes are for. Separating these two steps is important: the first one is an implementation detail, so it can stay hardcoded (keeping the C and Python versions consistent), while the second one is where the user is given some hooks to slightly modify the behaviour.

Adding hooks for arrays, strings and objects' keys will give users every customization tool available. This change, plus refactoring the first step to use names that cannot be confused with these hooks or parsers, will solve all the points described above.
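As a rough idea of what such hooks would allow, here is a sketch that emulates them by post-processing the output of json.loads. apply_hooks, string_hook, key_hook, array_hook and decimal_or_str are all hypothetical names invented for this example; none of them exist in the json module, and this is not the code in the attached files:

```python
import json
from decimal import Decimal, InvalidOperation


def apply_hooks(value, string_hook=None, key_hook=None, array_hook=None):
    """Walk a decoded JSON value and apply the (hypothetical) hooks."""
    if isinstance(value, dict):
        return {
            (key_hook(key) if key_hook else key):
                apply_hooks(item, string_hook, key_hook, array_hook)
            for key, item in value.items()
        }
    if isinstance(value, list):
        items = [apply_hooks(item, string_hook, key_hook, array_hook)
                 for item in value]
        return array_hook(items) if array_hook else items
    if isinstance(value, str):
        return string_hook(value) if string_hook else value
    return value  # numbers, booleans and None pass through unchanged


def decimal_or_str(text):
    """Example string hook: convert to Decimal when the text parses as one."""
    try:
        return Decimal(text)
    except InvalidOperation:
        return text


data = apply_hooks(
    json.loads('{"price": "12.50", "tags": ["a", "b"]}'),
    string_hook=decimal_or_str,
    key_hook=str.upper,
    array_hook=tuple,
)
print(data)
```

Doing this inside the decoder instead would avoid the second pass over the data and keep keys and values under separate control.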

The following files represent an operational version of the json module with these changes applied. encoder.py and tool.py have not been modified.

It has to be taken into account that some C accelerations have been disabled, as the C _json module has not been modified and thus differs from the new version in either operation or method signature. If these changes get the community's approval and are going to be applied to the standard library, then in addition to adapting the C _json module to this new version, lines 123 and 311, marked with '# SWAP:', also need to be modified in order to use the C accelerations.

History
Date	User	Action	Args
2018-01-10 13:52:35	Adrián Orive	set	recipients: + Adrián Orive, rhettinger, bob.ippolito, ezio.melotti, oberstet, serhiy.storchaka, Levi Cameron
2018-01-10 13:52:34	Adrián Orive	set	messageid: <1515592354.48.0.467229070634.issue29992@psf.upfronthosting.co.za>
2018-01-10 13:52:34	Adrián Orive	link	issue29992 messages
2018-01-10 13:52:34	Adrián Orive	create