classification
Title: json iterencode can not handle general iterators
Type: enhancement Stage: needs patch
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: Aaron.Staley, Zectbumo, eric.araujo, ezio.melotti, ned.deily, pitrou, rhettinger, serhiy.storchaka, vlcinsky
Priority: low Keywords:

Created on 2012-04-13 22:10 by Aaron.Staley, last changed 2018-04-14 16:05 by serhiy.storchaka.

Messages (10)
msg158239 - Author: Aaron Staley (Aaron.Staley) Date: 2012-04-13 22:10
The json library's encoder includes a method called 'iterencode', which streams the encoding: tokens are yielded as they are produced. The encoded object can therefore be written to a file, sent over a socket, etc. without holding all of it in memory.
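
As a rough illustration of the streaming described above, the sketch below writes each chunk from iterencode() to a file-like object as soon as it is produced. The helper name stream_json and the in-memory buffer are assumptions made for the example; they are not part of the json module.

```python
import io
import json


def stream_json(obj, fp):
    """Write obj to fp as JSON, one encoder chunk at a time (hypothetical helper)."""
    encoder = json.JSONEncoder()
    for chunk in encoder.iterencode(obj):
        # Each chunk is a small piece of the final document, so the
        # complete JSON string never has to exist in memory at once.
        fp.write(chunk)


# An in-memory buffer stands in for a real file or socket here.
buffer = io.StringIO()
stream_json({"ids": [1, 2, 3]}, buffer)
```

In real use fp would be an open file or a socket wrapper rather than a StringIO.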

Unfortunately, iterencode cannot encode general iterators.  This significantly limits the usefulness of the function.  For my use case I want to convert a large stream (iterator) of objects into JSON.  Currently I have to do one of the following:

A. Bring all the objects into memory by encasing the iterator in a list()
B. Make a hack where I subclass list and make that object's __iter__ method return my desired iterator.
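
Workaround B might look roughly like the sketch below. The class name StreamList is hypothetical, and the trick depends on an implementation detail: the pure-Python encoder behind iterencode() checks isinstance(value, (list, tuple)), tests truthiness, and then iterates. This sketch does not rely on any other code path, such as the C-accelerated one that json.dumps() may use.

```python
import itertools
import json


class StreamList(list):
    """A list subclass whose contents come lazily from an iterator (hypothetical)."""

    def __init__(self, iterable):
        super().__init__()  # the underlying list itself stays empty
        self._items = iter(iterable)
        # Peek at the first item so that truthiness is accurate: the
        # encoder emits "[]" for falsy lists without iterating them.
        try:
            first = next(self._items)
        except StopIteration:
            self._empty = True
        else:
            self._empty = False
            self._items = itertools.chain([first], self._items)

    def __bool__(self):
        return not self._empty

    def __iter__(self):
        return self._items


squares = StreamList(n * n for n in range(4))
encoded = "".join(json.JSONEncoder().iterencode(squares))
empty = "".join(json.JSONEncoder().iterencode(StreamList(iter(()))))
```

The wrapper can be consumed only once, like the iterator it wraps.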

The problem is that the json library explicitly checks for something being a list:

                if isinstance(value, (list, tuple)):
                    chunks = _iterencode_list(value, _current_indent_level)

It would work just as well (and be more Pythonic) to see if the value supports the iterator protocol:
                if isinstance(value, collections.abc.Iterable):
                    chunks = _iterencode_list(value, _current_indent_level)


Erroring example:

>>> import json
>>> e = json.JSONEncoder()
>>> r = xrange(20)
>>> gen = e.iterencode(r)
>>> gen
<generator object _iterencode at 0x14a5460>
>>> next(gen)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/json/encoder.py", line 419, in _iterencode
    o = _default(o)
  File "/usr/lib/python3.2/json/encoder.py", line 170, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: xrange(0, 20) is not JSON serializable
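
For readers on Python 3, the same failure can be reproduced with a plain iterator, as in this minimal sketch. The exact wording of the error message differs between versions, so only the exception type is captured here.

```python
import json

encoder = json.JSONEncoder()

# Creating the generator succeeds; nothing is encoded yet.
gen = encoder.iterencode(iter(range(20)))

try:
    next(gen)  # the first step is where the iterator gets rejected
except TypeError as exc:
    error = exc
else:
    error = None
```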
msg158334 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-04-15 14:21
That's more of a feature request than a bug. By definition JSON can only represent a small subset of Python's types.
Also, if you encode an iterator as a JSON list, you will get back a Python list when decoding the JSON representation, so it won't round-trip.
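
The round-trip point can be seen with the existing default hook, which already lets a caller opt in to encoding an iterator as a JSON array. This is only an illustration of the asymmetry, not a proposed API; note that default=list materializes the whole iterator in memory.

```python
import json

source = iter(range(3))

# default= is called for objects the encoder does not know; passing
# list turns the iterator into a list before it is encoded.
encoded = json.dumps(source, default=list)

# Decoding gives back a list, not an iterator.
decoded = json.loads(encoded)
```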
msg158906 - Author: Éric Araujo (eric.araujo) (Python committer) Date: 2012-04-21 01:49
Agreed with Antoine; I think that if this is added, it should be opt-in, not default.  Also, it is not clear if the request is about iterators or iterables.
msg228166 - Author: Alfred Morgan (Zectbumo) Date: 2014-10-02 06:46
Need a patch? Here you go.

    https://github.com/Zectbumo/cpython/compare/master

How to use it:

    encoder = JSONEncoder(stream=True)

When the encoder is constructed with stream=True, iterencode() encodes iterators as lists and file objects as strings, streaming them as it goes.
msg315222 - Author: Jan Vlcinsky (vlcinsky) Date: 2018-04-12 13:32
I found the proposed change very handy (I came here while researching why json does not behave that way).

Taking into account:
- Python shines at handling lists and at using generators and iterators
- The largest group of Python developers builds web apps, where a typical pattern is to iterate over many records and return them as a long list of dictionaries

I think the proposed feature could become very popular.
msg315229 - Author: Ned Deily (ned.deily) (Python committer) Date: 2018-04-12 16:41
If there is still interest in this, perhaps @Zectbumo could rebase the patch as a PR against the master branch of the current python/cpython repo now that we've moved to git and GitHub (https://devguide.python.org/pullrequest/).  There's no guarantee that it will ultimately be accepted but it will make it much easier to review.
msg315249 - Author: Alfred Morgan (Zectbumo) Date: 2018-04-13 16:36
I would love to, but it is a bit late for me now. The JSON encoding has been optimized in C, which falls outside my expertise.
https://github.com/python/cpython/blob/master/Modules/_json.c
msg315251 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2018-04-13 18:07
Things are more complicated. A bytes object is an iterable. I think serializing a bytes object (which can unexpectedly leak into code ported from 2.7) as a list of integers is not the expected behavior in most cases. It is safer to fail by default and provide explicit handling for bytes if it is needed. There are other iterables that are better not serialized by default: mappings that are not dict subclasses, and exhaustible iterators.
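
The pitfalls listed above can be demonstrated with a small sketch. The helper naive_to_list is hypothetical; it mimics what a blanket Iterable check would do and is not code from the json module or from any patch.

```python
from collections.abc import Iterable, Mapping
from types import MappingProxyType


def naive_to_list(value):
    """Treat every iterable as a JSON array (hypothetical, deliberately naive)."""
    if isinstance(value, Iterable):
        return list(value)
    raise TypeError(f"{type(value).__name__} is not iterable")


# bytes is iterable, so it silently becomes a list of integers.
bytes_result = naive_to_list(b"ab")

# A mapping that is not a dict subclass loses its values: only keys remain.
proxy = MappingProxyType({"a": 1})
mapping_result = naive_to_list(proxy)

# An exhaustible iterator yields its items once, then appears empty.
numbers = iter([1, 2])
first_pass = naive_to_list(numbers)
second_pass = naive_to_list(numbers)
```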

I'm working on a large patch for the json module (maybe even a PEP). It will not allow serializing all iterables by default, but will make it easy to switch on serialization for particular types, including iterables.
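
The patch mentioned above is not shown in this thread, so the following is not its design. It is only a sketch of the general idea of per-type opt-in, built on the existing default hook; the factory name make_default is an assumption. Like any default-based approach, it builds a full list in memory instead of streaming.

```python
import json
from collections.abc import Iterator


def make_default(*allowed_types):
    """Build a default= hook that serializes only the listed types as arrays."""

    def default(o):
        if isinstance(o, allowed_types):
            return list(o)
        # Anything not explicitly allowed still fails, as it does today.
        raise TypeError(
            f"Object of type {type(o).__name__} is not JSON serializable"
        )

    return default


# Opt in for iterators and range objects only; bytes stays rejected.
opt_in = make_default(Iterator, range)

encoded = json.dumps({"xs": iter([1, 2]), "r": range(2)}, default=opt_in)

try:
    json.dumps(b"ab", default=opt_in)
except TypeError:
    bytes_rejected = True
else:
    bytes_rejected = False
```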
msg315297 - Author: Alfred Morgan (Zectbumo) Date: 2018-04-14 15:53
@serhiy.storchaka, while you are doing your overhaul, will you please add support for raw JSON values? I often have an already serialized object that I want to include in a response object that I am about to serialize anyway. The implementation should be very simple. Here is my workaround code:

import json

class RawJSON(str):
    """Marker type: a string that already contains serialized JSON."""

# Keep a reference to the original string encoder so it can still be called.
origEnc = json.encoder.encode_basestring_ascii

def rawEnc(obj):
    # Emit RawJSON values verbatim; encode every other string as usual.
    if isinstance(obj, RawJSON):
        return obj
    return origEnc(obj)

json.encoder.encode_basestring_ascii = rawEnc

https://stackoverflow.com/a/48985560/289240
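
Since the workaround above patches the json module globally, one variation is to limit the patch to a block of code. The sketch below assumes the same monkeypatching trick and the default ensure_ascii=True setting; the context manager raw_json_enabled is a hypothetical name, and the approach depends on json internals that may change between Python versions.

```python
import contextlib
import json


class RawJSON(str):
    """Marker type: a string that already contains serialized JSON."""


@contextlib.contextmanager
def raw_json_enabled():
    """Temporarily let RawJSON values pass through the encoder verbatim."""
    original = json.encoder.encode_basestring_ascii

    def encode(obj):
        if isinstance(obj, RawJSON):
            return obj
        return original(obj)

    json.encoder.encode_basestring_ascii = encode
    try:
        yield
    finally:
        # Always restore the original encoder, even if encoding fails.
        json.encoder.encode_basestring_ascii = original


inner = json.dumps({"b": 1})  # an object that was serialized earlier
with raw_json_enabled():
    outer = json.dumps({"a": RawJSON(inner), "note": "plain"})
```

Outside the with block, strings are encoded normally again.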
msg315298 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2018-04-14 16:05
It is included.
History
Date                 User              Action  Args
2018-04-14 16:05:14  serhiy.storchaka  set     messages: + msg315298
2018-04-14 15:53:49  Zectbumo          set     messages: + msg315297
2018-04-13 18:07:05  serhiy.storchaka  set     priority: normal -> low; nosy: + serhiy.storchaka; messages: + msg315251; assignee: serhiy.storchaka
2018-04-13 16:36:10  Zectbumo          set     messages: + msg315249
2018-04-12 16:41:50  ned.deily         set     nosy: + ned.deily; messages: + msg315229; versions: + Python 3.8, - Python 3.5
2018-04-12 13:32:09  vlcinsky          set     nosy: + vlcinsky; messages: + msg315222
2014-10-02 06:46:10  Zectbumo          set     nosy: + Zectbumo; messages: + msg228166; versions: + Python 3.5, - Python 3.3
2012-04-21 01:49:13  eric.araujo       set     nosy: + eric.araujo; messages: + msg158906
2012-04-15 14:21:09  pitrou            set     versions: - Python 2.7, Python 3.2; nosy: + pitrou; messages: + msg158334; type: behavior -> enhancement; stage: test needed -> needs patch
2012-04-13 23:26:41  ezio.melotti      set     nosy: + rhettinger, ezio.melotti; stage: test needed; type: behavior; versions: + Python 3.3, - Python 2.6, Python 3.1
2012-04-13 22:10:33  Aaron.Staley      create