classification
Title: json needs object_pairs_hook
Type: enhancement Stage:
Components: Library (Lib) Versions: Python 3.1, Python 2.7
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: bob.ippolito Nosy List: aronacher, bob.ippolito, cheeaun, rhettinger
Priority: high Keywords: patch

Created on 2009-02-27 08:37 by rhettinger, last changed 2009-03-29 22:37 by bob.ippolito. This issue is now closed.

Files
File name       Uploaded                      Description
json_hook.diff  rhettinger, 2009-02-27 08:37  proof-of-concept patch: object_pair_hook()
json_hook.diff  rhettinger, 2009-03-18 03:55  pairs hook patch with tests and docs
Messages (15)
msg82825 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-27 08:37
If PEP 372 goes through, Python will soon gain an ordered dict.

The json module's encoder works well with it:

>>> items = [('one', 1), ('two', 2), ('three',3), ('four',4), ('five',5)]
>>> json.dumps(OrderedDict(items))
'{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'

But the decoder doesn't fare so well.  The existing object_hook is handed
a dictionary rather than a list of pairs, so all the ordering information
is lost:

>>> jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'
>>> json.loads(jtext, object_hook=OrderedDict)
OrderedDict({u'four': 4, u'three': 3, u'five': 5, u'two': 2, u'one': 1})

A solution is to provide an alternate hook that receives a sequence of
pairs.  If present, that hook should run instead of object_hook.  A
rough proof-of-concept patch is attached.
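The sketch below shows the proposed behavior. It is written against the `object_pairs_hook` parameter that `json.loads()` accepts in released versions (Python 2.7 and 3.1 onward).

- The hook receives each decoded object's members as a list of `(key, value)` pairs, in document order.
- When both hooks are given, the pairs hook takes priority.

The helper `record_pairs` is hypothetical. It exists only to expose what the hook is passed.

```python
import json
from collections import OrderedDict

jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'

# Hypothetical helper: return exactly what the pairs hook receives.
def record_pairs(pairs):
    return pairs  # a list of (key, value) tuples, in document order

pairs = json.loads(jtext, object_pairs_hook=record_pairs)
print(pairs)

# Passing OrderedDict directly rebuilds each object with its original key order.
od = json.loads(jtext, object_pairs_hook=OrderedDict)
print(list(od))

# If both hooks are supplied, object_pairs_hook runs instead of object_hook.
both = json.loads(jtext,
                  object_hook=lambda d: "object_hook ran",
                  object_pairs_hook=OrderedDict)
print(type(both).__name__)
```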

FWIW, sample ordered dict code is at: 
  http://code.activestate.com/recipes/576669/
msg82860 - Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-02-27 18:48
Why? According to the RFC (RFC 4627; emphasis mine):

   An object is an *unordered* collection of zero or more name/value
   pairs, where a name is a string and a value is a string, number,
   boolean, null, object, or array.
msg82864 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-27 19:59
Same reason as for config files and YAML files.  Sometimes those files
represent human-edited input, and if a machine re-edits, filters, or
copies them, it is nice to keep the original order (even though it may
make no semantic difference to the computer).

For example, JSON-RPC method invocations are done with objects having
three properties (method, params, id).  The machine doesn't care about
the order of the properties, but a human reader prefers the order listed:

  --> {"method": "postMessage", "params": ["Hello all!"], "id": 99}
  <-- {"result": 1, "error": null, "id": 99}
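The sketch below shows the round trip described above. It assumes two things:

- the request is parsed with `object_pairs_hook=OrderedDict`;
- it is re-serialized with the default separators of `json.dumps()`.

Under those assumptions, the properties come back out in the order a human wrote them.

```python
import json
from collections import OrderedDict

request = '{"method": "postMessage", "params": ["Hello all!"], "id": 99}'

# Parse while keeping the member order of each JSON object.
msg = json.loads(request, object_pairs_hook=OrderedDict)

# Re-encoding walks the OrderedDict in insertion order, so the text matches.
echoed = json.dumps(msg)
print(echoed)
print(echoed == request)
```

Since Python 3.7, a plain `dict` also preserves insertion order, so on current interpreters the default decoder already round-trips this message. The pairs hook still matters when a specific mapping type is wanted.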

If you're testing a program that filters JSON data (a typical XML-style
task), it is nice to write out data in the same order it was received
(failing to do that is a common complaint about misdesigned XML filters):

  --> [{"title": "awk", "author": "aho", "isbn": "123456789X"},
       {"title": "taocp", "author": "knuth", "isbn": "987654321X"}]
  <-- [{"title": "awk", "author": "aho"},
       {"title": "taocp", "author": "knuth"}]

Semantically, those entries can be scrambled; however, someone reading
the filtered result desires that the input and output visually
correspond as much as possible.  An object_pairs_hook makes this possible.
msg82865 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-27 20:11
FWIW, here's the intended code for the filter in the last post:

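    import json
    from collections import OrderedDict

    # infile/outfile: open text files; the input is a JSON array of books
    books = json.load(infile, object_pairs_hook=OrderedDict)
    for book in books:
        del book['isbn']
    json.dump(books, outfile)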
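To try the filter without real files, here is a self-contained variant. It assumes the input is a JSON array of book objects, as in the example above. In-memory `io.StringIO` streams stand in for `infile` and `outfile`.

```python
import io
import json
from collections import OrderedDict

# Stand-ins for real files: a JSON array of book objects.
infile = io.StringIO(
    '[{"title": "awk", "author": "aho", "isbn": "123456789X"},'
    ' {"title": "taocp", "author": "knuth", "isbn": "987654321X"}]'
)
outfile = io.StringIO()

books = json.load(infile, object_pairs_hook=OrderedDict)
for book in books:
    del book['isbn']   # drop one field; the remaining keys keep input order
json.dump(books, outfile)

print(outfile.getvalue())
```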
msg82870 - Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-02-27 20:48
Fair enough, but the patch isn't usable because the decoder was rewritten
in a later version of simplejson. There's another issue with a patch to
backport those changes into Python, http://bugs.python.org/issue4136, or
you could just use the simplejson source here: http://code.google.com/p/simplejson/
msg82872 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-02-27 20:57
Thanks.  I'll write up a patch against
http://code.google.com/p/simplejson/ and assign it back to you for review.
msg82885 - Author: Armin Ronacher (aronacher) * (Python committer) Date: 2009-02-27 23:38
Motivation:

Yes, JSON says it's unordered.  However, hashes in Ruby have been ordered
since 1.9, and the equivalent types in JavaScript and PHP have been
ordered since the very beginning.
msg83164 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-03-04 23:39
After enhancing namedtuple and ConfigParser, I found a simpler approach
that doesn't involve extending the API: have json use ordered
dictionaries directly.

With a small tweak to OD's repr, it is fully substitutable for a dict
without changing any client code or doctests (the OD loses its own
order-preserving eval/repr round trip, but that is no worse than what
json already gives now).

See attached patch.
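Here is a quick check of the substitutability claim. It uses the standard library's `collections.OrderedDict`, which arrived later, in Python 2.7 and 3.1, rather than the patch's own class.

- `OrderedDict` is a `dict` subclass, so `isinstance`-based client code accepts it.
- Comparing it against a plain dict ignores order.
- The json encoder writes its keys in insertion order.

```python
import json
from collections import OrderedDict

plain = {"one": 1, "two": 2}
ordered = OrderedDict([("one", 1), ("two", 2)])

# OrderedDict subclasses dict, so isinstance checks in client code still pass.
print(isinstance(ordered, dict))

# Equality against a plain dict ignores order.
print(ordered == plain)

# The encoder treats it like any dict and emits keys in insertion order.
print(json.dumps(ordered))
```

One difference to keep in mind: comparing two `OrderedDict` objects with `==` is order-sensitive, whereas comparisons involving a plain `dict` are not.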
msg83165 - Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-03-04 23:46
Unfortunately this is a patch for the old json lib... the new one has a C 
API and an entirely different method of parsing documents (for performance 
reasons).
msg83166 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-03-05 00:15
When do you expect the new C version to go in?  I'm looking forward to it.
msg83170 - Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-03-05 00:29
Whenever someone applies the patch for http://bugs.python.org/issue4136 -- 
I don't know when that will happen.
msg83733 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-03-18 03:55
Bob, would you please take a look at the attached patch?
msg83819 - Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-03-19 18:56
This patch looks good to me. My only comment is that it mixes tabs and
spaces in the C code, in a file that had no tabs previously.
msg83820 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2009-03-19 19:19
Thanks for looking at this.
Fixed the tab/space issue.
Committed in r70471
msg84441 - Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2009-03-29 22:37
I fixed two problems with this that didn't show up in the test suite:
the feature didn't work in load(), and the pure Python code path was
broken because the Python scanner needed a small change. Unfortunately,
I'm not sure how best to test the pure Python code path with Python's
test suite, but I ran across the problem when backporting to simplejson.

r70702
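The sketch below shows one way both code paths could be exercised. It is not necessarily how the fix in r70702 was tested. It assumes the layout of current CPython's json package, where `json.scanner.py_make_scanner` builds the pure-Python scanner from a decoder instance. Replacing a decoder's `scan_once` attribute relies on an implementation detail.

- The first check goes through `json.load()`.
- The second forces the pure-Python scanner so the hook is covered even when the C accelerator is present.

```python
import io
import json
from json import scanner

text = '{"b": 1, "a": [{"y": 2, "x": 3}, {}]}'

# 1. The file-based entry point must forward object_pairs_hook as well.
via_load = json.load(io.StringIO(text), object_pairs_hook=list)
print(via_load)

# 2. Swap in the pure-Python scanner (a CPython implementation detail)
#    so object_pairs_hook is exercised without the C scanner.
decoder = json.JSONDecoder(object_pairs_hook=list)
decoder.scan_once = scanner.py_make_scanner(decoder)
via_python = decoder.decode(text)
print(via_python)
print(via_python == via_load)
```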
History
Date                 User          Action  Args
2009-03-29 22:37:55  bob.ippolito  set     messages: + msg84441
2009-03-22 05:51:36  cheeaun       set     nosy: + cheeaun
2009-03-19 19:19:28  rhettinger    set     status: open -> closed
                                           resolution: accepted
                                           messages: + msg83820
2009-03-19 18:56:55  bob.ippolito  set     messages: + msg83819
2009-03-18 03:55:07  rhettinger    set     priority: normal -> high
                                           assignee: rhettinger -> bob.ippolito
                                           messages: + msg83733
                                           files: + json_hook.diff
2009-03-18 02:01:21  rhettinger    set     files: - json_ordered.diff
2009-03-05 04:06:04  rhettinger    set     title: json need object_pairs_hook -> json needs object_pairs_hook
2009-03-05 00:29:21  bob.ippolito  set     messages: + msg83170
2009-03-05 00:15:07  rhettinger    set     messages: + msg83166
2009-03-04 23:46:19  bob.ippolito  set     messages: + msg83165
2009-03-04 23:39:56  rhettinger    set     files: + json_ordered.diff
                                           messages: + msg83164
2009-02-27 23:38:55  aronacher     set     nosy: + aronacher
                                           messages: + msg82885
2009-02-27 20:57:26  rhettinger    set     assignee: bob.ippolito -> rhettinger
                                           messages: + msg82872
2009-02-27 20:48:23  bob.ippolito  set     resolution: not a bug -> (no value)
                                           messages: + msg82870
2009-02-27 20:11:16  rhettinger    set     messages: + msg82865
2009-02-27 19:59:12  rhettinger    set     messages: + msg82864
2009-02-27 18:48:08  bob.ippolito  set     resolution: not a bug
                                           messages: + msg82860
2009-02-27 08:37:54  rhettinger    create