Title: Add 'array_hook' for json module
Type: enhancement Stage: patch review
Components: Extension Modules, Library (Lib) Versions: Python 3.9
Status: open Resolution:
Dependencies: Superseder:
Assigned To: bob.ippolito Nosy List: bob.ippolito, matomatical
Priority: normal Keywords: patch

Created on 2019-04-27 01:56 by matomatical, last changed 2019-06-26 20:17 by terry.reedy.

Pull Requests
URL Status Linked
PR 12980 open matomatical, 2019-04-27 02:01
Messages (1)
msg340957 - Author: matt farrugia (matomatical) * Date: 2019-04-27 01:56
The json module allows a user to provide an `object_hook` function. If provided, it is called with the dict created by parsing each JSON Object, and its return value is used in place of that dict.

It'd be nice if there were something analogous for JSON Arrays: an `array_hook` function that transforms the list created by parsing each JSON Array.

At the moment, transforming JSON Arrays requires one of the following approaches (as far as I can see):

(1) Providing an `object_hook` function that recursively transforms any lists among the values of an Object/dict, including any nested lists, AND recursively transforming the final result in case the top-level JSON value being parsed is an array (a top-level array is never inside a JSON Object, so it never goes through the `object_hook` transformation).
(2) Transforming the entire result after parsing has finished, by recursively transforming any lists in it, which means recursively traversing both nested lists AND nested dicts.
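For concreteness, here is a minimal sketch of both workarounds using only the current `json` API. It assumes the goal is to turn every JSON Array into a tuple. The helper names (`lists_to_tuples`, `tuple_hook`, `deep_convert` and the two `loads_*` wrappers) are hypothetical and chosen only for illustration.

```python
import json


def lists_to_tuples(value):
    """Convert a list, and any lists nested directly inside it, to tuples."""
    if isinstance(value, list):
        return tuple(lists_to_tuples(item) for item in value)
    return value


def tuple_hook(obj):
    # object_hook receives each dict only after its contents were decoded,
    # so any dicts nested inside these lists have already been through it.
    return {key: lists_to_tuples(val) for key, val in obj.items()}


def loads_via_object_hook(s):
    # Approach (1): a top-level array never reaches object_hook,
    # so the final result needs one more pass.
    return lists_to_tuples(json.loads(s, object_hook=tuple_hook))


def deep_convert(value):
    """Approach (2): walk the fully parsed result, descending into lists and dicts."""
    if isinstance(value, list):
        return tuple(deep_convert(item) for item in value)
    if isinstance(value, dict):
        return {key: deep_convert(val) for key, val in value.items()}
    return value


def loads_via_post_processing(s):
    return deep_convert(json.loads(s))


doc = '[{"a": [1, [2]]}, [3, [4]]]'
print(loads_via_object_hook(doc))      # ({'a': (1, (2,))}, (3, (4,)))
print(loads_via_post_processing(doc))  # ({'a': (1, (2,))}, (3, (4,)))
```

Both produce the same result. Each needs its own recursion, even though the per-list step is just `tuple`.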

Providing an `array_hook` would remove the need for either approach. The decoder would call the hook on every array as it is parsed, so the per-list step at the core of those recursive functions could be passed as the `array_hook` directly, without any recursion.

## An example of usage:

Let's say we want JSON Arrays represented as tuples rather than lists, e.g. so that they are hashable straight out-of-the-(json)-box (provided they contain no Objects, since dicts are unhashable). Without this enhancement, this requires one of the two approaches above. Implementing those recursive functions is not difficult, but it seems inelegant. With the change, `tuple` could be used as the `array_hook` directly:

>>> json.loads('{"foo": [[1, 2], "spam", [], ["eggs"]]}', array_hook=tuple)
{'foo': ((1, 2), 'spam', (), ('eggs',))}
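The following short check makes the hashability motivation concrete. It uses plain current Python with no `array_hook` involved. Lists from the default decoder cannot be hashed, but the tuple-based shape from the example can be used as a dict key. The one-level conversion below is a hypothetical shortcut that happens to suffice for this particular document only.

```python
import json

doc = '{"foo": [[1, 2], "spam", [], ["eggs"]]}'

# The default decoder produces lists, which are unhashable.
default_foo = json.loads(doc)["foo"]
try:
    hash(default_foo)
    default_is_hashable = True
except TypeError:
    default_is_hashable = False

# One level of conversion is enough for this document (not in general).
tuple_foo = tuple(tuple(x) if isinstance(x, list) else x for x in default_foo)
index = {tuple_foo: "seen"}  # a tuple of hashable items works as a dict key

# A tuple holding a dict (a decoded JSON Object) is still unhashable.
try:
    hash(({"a": 1},))
    object_tuple_is_hashable = True
except TypeError:
    object_tuple_is_hashable = False

print(default_is_hashable)       # False
print(tuple_foo)                 # ((1, 2), 'spam', (), ('eggs',))
print(object_tuple_is_hashable)  # False
```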

In my opinion, this is more elegant than converting via an `object_hook` or traversing the whole structure after parsing.

## The patch:

I am submitting a patch that adds an `array_hook` keyword argument to the `json` module's `load` and `loads` functions, and to the `json.decoder` module's `JSONDecoder` class and its `JSONArray` and `JSONObject` parsing functions. I also hooked these together in the `json.scanner` module's `py_make_scanner` function.

It seems that `json.scanner` prefers the C scanner from `Modules/_json.c` (imported as `c_make_scanner`) when it is available. I am not confident enough in my C or Python C API skills to dive into this module and make the analogous changes. I expect they will be straightforward for someone who can read C API code, and analogous to the changes required in `Lib/json/`. I need help to complete this part of the patch.

## Testing:

In the meantime, I added a test to `test_json.test_decode`. It is CURRENTLY FAILING because the implementation is incomplete. I believe this is only due to the missing part of the patch, namely the changes to `Modules/_json.c` identified above.

When I manually set `json.scanner.make_scanner` to `json.scanner.py_make_scanner` and try out the new `array_hook` functionality, it appears to work.
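Below is a sketch, for current Python, of a less invasive way to exercise the pure-Python path without patching the module-level `json.scanner.make_scanner`. It relies on CPython implementation details rather than public API. `JSONDecoder` stores its array parser in a `parse_array` attribute and its scanner in `scan_once`. `py_make_scanner` reads `parse_array` from the decoder when it is created, but the C scanner never consults it, which is why the C changes are needed. `TupleArrayDecoder` and its `array_hook` argument are hypothetical names that only emulate the proposal.

```python
import json
from json import decoder, scanner


class TupleArrayDecoder(json.JSONDecoder):
    """Emulates a hypothetical array_hook using the pure-Python scanner."""

    def __init__(self, *, array_hook=tuple, **kwargs):
        super().__init__(**kwargs)

        def parse_array(s_and_end, scan_once):
            # Reuse the stock (internal) array parser, then transform its list.
            # Nested arrays are parsed first, so the hook runs innermost-first.
            values, end = decoder.JSONArray(s_and_end, scan_once)
            return array_hook(values), end

        self.parse_array = parse_array
        # The C scanner ignores parse_array, so build the pure-Python scanner
        # for this instance only; it reads parse_array at creation time.
        self.scan_once = scanner.py_make_scanner(self)


doc = '{"foo": [[1, 2], "spam", [], ["eggs"]]}'
print(json.loads(doc, cls=TupleArrayDecoder))
# {'foo': ((1, 2), 'spam', (), ('eggs',))}

# Extra keyword arguments to json.loads are forwarded to the decoder class.
print(json.loads('[[1], [2, 3]]', cls=TupleArrayDecoder, array_hook=frozenset))
```

Because the hook is applied per array as it is parsed, a non-recursive callable such as `tuple` or `frozenset` is sufficient. This is the behaviour the patch aims for in both scanners.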

Date User Action Args
2019-06-26 20:17:12 terry.reedy set versions: + Python 3.9, - Python 3.8
2019-04-27 04:54:29 rhettinger set assignee: bob.ippolito; nosy: + bob.ippolito
2019-04-27 02:01:05 matomatical set keywords: + patch; stage: patch review; pull_requests: + pull_request12906
2019-04-27 01:56:58 matomatical create