Classification
Title: collections.deque should ship with a stdlib json serializer
Type: enhancement
Stage:
Components:
Versions: Python 3.7

Process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: acdha, bob.ippolito, ezio.melotti, gdr@garethrees.org, lisroach, pitrou, r.david.murray, rhettinger, serhiy.storchaka
Priority: normal
Keywords:

Created on 2014-02-25 21:15 by acdha, last changed 2019-03-06 05:21 by lisroach.

Pull Requests
URL Status Linked Edit
PR 830 open lisroach, 2017-03-27 01:43
Messages (14)
msg212215 - (view) Author: Chris Adams (acdha) Date: 2014-02-25 21:15
Currently the stdlib json module requires a custom serializer to avoid throwing a TypeError on collections.deque instances:

Python 3.3.4 (default, Feb 12 2014, 09:35:54) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from collections import deque
>>> import json
>>> d = deque(range(0, 10))
>>> json.dumps(d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/Cellar/python3/3.3.4/Frameworks/Python.framework/Versions/3.3/lib/python3.3/json/__init__.py", line 233, in dumps
    return _default_encoder.encode(obj)
  File "/usr/local/Cellar/python3/3.3.4/Frameworks/Python.framework/Versions/3.3/lib/python3.3/json/encoder.py", line 191, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/local/Cellar/python3/3.3.4/Frameworks/Python.framework/Versions/3.3/lib/python3.3/json/encoder.py", line 249, in iterencode
    return _iterencode(o, 0)
  File "/usr/local/Cellar/python3/3.3.4/Frameworks/Python.framework/Versions/3.3/lib/python3.3/json/encoder.py", line 173, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) is not JSON serializable
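As a workaround today, callers can pass a `default` hook that converts deques to lists before encoding. A minimal sketch (the `encode_deque` name is just illustrative):

```python
import json
from collections import deque

def encode_deque(obj):
    # Fall back to a list for deques; re-raise for anything else,
    # matching the encoder's normal behaviour for unknown types.
    if isinstance(obj, deque):
        return list(obj)
    raise TypeError(repr(obj) + " is not JSON serializable")

d = deque(range(0, 10))
print(json.dumps(d, default=encode_deque))
# → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```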
msg212235 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-02-26 02:22
json is only designed to serialize standard data types out of the box.  Anything else is an extension.  I presume you are asking for this because a deque looks more-or-less like a list.  I'm not sure that's reason enough, but we'll see what others think.
msg212264 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-02-26 15:34
The problem is that it would be deserialized as a list; this breaks the general expectation that serialization formats should round-trip.

(yes, tuple already does this; but I think it is less of a problem for tuples, since the list API is a superset of the tuple API except for hashing)

So, perhaps we could ship an optional serializer (under which form?) accepting any sequence type (and perhaps any mapping type?), but it shouldn't be the default.
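The tuple behaviour mentioned above is easy to check: a tuple is encoded as a JSON array and decoded back as a list, so the round-trip silently changes the type:

```python
import json

t = (1, 2, 3)
s = json.dumps(t)      # tuples are encoded as JSON arrays: '[1, 2, 3]'
out = json.loads(s)
print(type(out), out)  # a plain list comes back; the tuple type is lost
```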
msg212275 - (view) Author: Gareth Rees (gdr@garethrees.org) * (Python triager) Date: 2014-02-26 16:43
The JSON implementation uses these tests to determine how to serialize a Python object:

    isinstance(o, (list, tuple))
    isinstance(o, dict)

So any subclasses of list and tuple are serialized as a list, and any subclass of dict is serialized as an object. For example:

    >>> json.dumps(collections.defaultdict())
    '{}'
    >>> json.dumps(collections.OrderedDict())
    '{}'
    >>> json.dumps(collections.namedtuple('mytuple', ())())
    '[]'

When deserialized, you'll get back a plain dictionary or list, so there's no round-trip property here.
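A quick check of that behaviour: an OrderedDict is encoded as a plain JSON object, so decoding yields a regular dict and the subclass type is not preserved:

```python
import collections
import json

od = collections.OrderedDict([("a", 1), ("b", 2)])
s = json.dumps(od)   # dict subclass, so it is serialized as a JSON object
out = json.loads(s)
print(type(out))     # plain dict; the OrderedDict type is not preserved
```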

The tests could perhaps be changed to:

    isinstance(o, collections.abc.Sequence)
    isinstance(o, collections.abc.Mapping)

I'm not a JSON expert, so I have no informed opinion on whether this is a good idea or not, but in any case, this change wouldn't help with deques, as a deque is not a Sequence. That's because deques don't have an index method (see issue10059 and issue12543).
msg288996 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-05 02:45
See also the same feature request from Tarek Ziadé in http://bugs.python.org/issue29663: "collections.deque could be serialized in JSON as a simple array. The only thing we can lose in the process is the maxlen value, but I think it's a decent behaviour to ignore it when encoding and to set it to None when decoding."

+1 from me as well.  This isn't really different from how we handle tuples, and I can see that it would be useful to be able to dump a deque into JSON.  I concur that it is reasonable to ignore maxlen because that is primarily a convenience feature (auto-popping on overflow) rather than something that is intrinsic to the semantics of the data itself.

For now, just adding deque support is reasonable.  We can't simply accept all sequences because string- and bytearray-like objects would need to be excluded.
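The caveat about strings is easy to verify: str and bytearray both pass the `collections.abc.Sequence` check, so a blanket "serialize every Sequence as an array" rule would wrongly apply to them:

```python
from collections.abc import Sequence

# Both pass the Sequence check, but neither should become a JSON array.
print(isinstance("abc", Sequence))              # True
print(isinstance(bytearray(b"abc"), Sequence))  # True
```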
msg289000 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-05 06:01
See issue27362 for more general approach.
msg289005 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-05 08:50
For now, just hardcoding deque support is fine. 

Support for a __json__ attribute or a JSON array registry is a topic for another day.  Even then, I don't think that responsibility for JSONification within the standard library should be shifted outside of the json module itself.
msg289013 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-03-05 14:58
I disagree; I think a __json__ protocol is sensible.  But this is why it needs to be discussed on python-dev or python-ideas first :)  In the meantime, adding deque support like we added enum support is reasonable, but IMO we shouldn't go too crazy adding support for non-base types before talking about a __json__ protocol.
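No __json__ protocol was ever adopted; purely as an illustration, the idea can be emulated today with the existing `default` hook (the `__json__` method name and the `protocol_default` helper below are hypothetical):

```python
import json
from collections import deque

def protocol_default(obj):
    # Hypothetical dispatch: ask the object how it wants to be encoded.
    method = getattr(obj, "__json__", None)
    if method is not None:
        return method()
    raise TypeError(repr(obj) + " is not JSON serializable")

class JsonDeque(deque):
    def __json__(self):
        return list(self)

print(json.dumps(JsonDeque([1, 2, 3]), default=protocol_default))
# → [1, 2, 3]
```

Note the cost Raymond points out below: the hook converts the deque to an intermediate list before the encoder ever sees it.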
msg289022 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-05 16:35
There is a difference.  An __json__ attribute would have to convert to a list first.  Adding support directly to the json module would allow the deque to be read directly.

I think you all are leaning towards premature generalization and making this harder than it needs to be.  Chris and Tarek's proposal is reasonable and straightforward, but it is now being pushed towards PEP territory, and I think Guido would need to opine on whether to enshrine yet another dunder method that would infest the library and privilege the json serialization format over all other formats.
msg289024 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-05 16:52
FWIW, one of the design goals for deques was to make them easily substitutable for lists when needed.  This feature request is a nice-to-have that moves us a little closer.

That said, I think a __json__ attribute is too big of a hammer for this simple proposal.

Also, please add Bob Ippolito to all JSON issues.  He has excellent design sensibilities and considerable contact with users of the json module.
msg290555 - (view) Author: Lisa Roach (lisroach) * (Python committer) Date: 2017-03-27 01:43
I made PR 830 for this issue; it seems like a nice feature to have, in my opinion.

Let me know if I should add some unit tests :)
msg290561 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-27 03:24
Thanks Lisa.
msg290565 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-27 06:54
It seems there are reference leaks. And I'm afraid that importing a module for every serialized object can significantly hurt performance. Can you run some benchmarks?

> An __json__ attribute would have to convert to a list first.  Adding support directly to the json module would allow the deque to be read directly.

With PR 830 the deque is converted to a list by json encoder.
msg337279 - (view) Author: Lisa Roach (lisroach) * (Python committer) Date: 2019-03-06 05:21
Serhiy might be right; it looks significantly worse in the benchmarks:

lisroach$ python3 -m timeit "import json; json.dumps(['test'])"
100000 loops, best of 3: 2.73 usec per loop

lisroach$ ./python.exe -m timeit "import json; json.dumps(['test'])"
10000 loops, best of 5: 21.2 usec per loop

lisroach$ python3 -m timeit "import json; json.dumps(10000)"
100000 loops, best of 3: 2.49 usec per loop

lisroach$ ./python.exe -m timeit "import json; json.dumps(10000)"
20000 loops, best of 5: 16.3 usec per loop
History
Date                 User                Action  Args
2019-03-06 05:21:12  lisroach            set     messages: + msg337279
2017-03-27 06:54:25  serhiy.storchaka    set     messages: + msg290565
2017-03-27 03:24:55  rhettinger          set     messages: + msg290561
2017-03-27 01:43:22  lisroach            set     nosy: + lisroach; messages: + msg290555; pull_requests: + pull_request734
2017-03-05 16:52:47  rhettinger          set     nosy: + bob.ippolito; messages: + msg289024
2017-03-05 16:35:54  rhettinger          set     messages: + msg289022
2017-03-05 14:58:55  r.david.murray      set     messages: + msg289013
2017-03-05 08:50:10  rhettinger          set     messages: + msg289005
2017-03-05 06:01:16  serhiy.storchaka    set     messages: + msg289000
2017-03-05 02:45:50  rhettinger          set     messages: + msg288996; versions: + Python 3.7, - Python 3.5
2017-02-27 11:08:31  serhiy.storchaka    link    issue29663 superseder
2014-02-26 16:43:49  gdr@garethrees.org  set     nosy: + gdr@garethrees.org; messages: + msg212275
2014-02-26 15:34:48  pitrou              set     nosy: + serhiy.storchaka
2014-02-26 15:34:43  pitrou              set     versions: + Python 3.5, - Python 2.7, Python 3.3; nosy: + rhettinger, pitrou, ezio.melotti; messages: + msg212264; type: enhancement
2014-02-26 02:22:34  r.david.murray      set     nosy: + r.david.murray; messages: + msg212235
2014-02-25 21:15:50  acdha               create