This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Empty iterator with fake __len__ is rendered as a single bracket ] when using json's iterencode
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.7, Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: SilentGhost, altvod, bob.ippolito, ezio.melotti, r.david.murray, rhettinger, serhiy.storchaka, terry.reedy
Priority: normal Keywords:

Created on 2016-07-25 11:42 by altvod, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 1257 closed serhiy.storchaka, 2017-04-22 16:21
Messages (14)
msg271249 - (view) Author: Grigory Statsenko (altvod) Date: 2016-07-25 11:42
JSONEncoder.iterencode doesn't work with empty iterators correctly.
Steps:
1. Define an iterator that is recognized by json as a list (inherit from list and define nonzero __len__).
2. Use json.dump with data containing an empty iterator defined as described in step#1 (but doesn't generate any items)

Expected result: it should be rendered as an empty list: '[]'

Actual result: it is rendered as ']' (only the closing bracket)
interestingly enough this behavior is not reproduced when using the dumps function.
I tried other alternatives to the standard json module: simplejson, ujson, hjson
All of them work as expected in this case (both brackets are rendered).

Here is an example of the code that demonstrates this error (compares the results of the dump and dumps functions):


import json as json
import io

class EmptyIterator(list):
    def __iter__(self):
        while False:
            yield 1
    def __len__(self):
        return 1

def dump_to_str(data):
    return json.dumps(data)

def dump_to_file(data):
    stream = io.StringIO()
    json.dump(data, stream)
    return stream.getvalue()


data = {'it': EmptyIterator()}
print('to str: {0}'.format(dump_to_str(data)))
print('to file: {0}'.format(dump_to_file(data)))



This prints:
to str: {"it": []}
to file: {"it": ]}
msg271250 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-07-25 11:52
Why does your __len__ method returns 1? Shouldn't it be 0 since this is an empty iterator? Changing it to zero seems to fix the "issue" too.
msg271251 - (view) Author: Grigory Statsenko (altvod) Date: 2016-07-25 12:05
I can't do that if I don't know how many entries there will be ahead of time. In my real-life situation I'm fetching the data from a database not knowing how many entries I'll get before I actually get them (in the iterator). In most cases there are huge amounts of entries that take up too much memory - that's why I need to stream it. But sometimes the result set is empty - and that's when everything fails.
msg271252 - (view) Author: Grigory Statsenko (altvod) Date: 2016-07-25 12:16
Actually, it does work with len = 0
even if the iterator is not empty. So, I guess that is a solution.
But still, I think the more correct way would be to make it work with > 0
msg271254 - (view) Author: Grigory Statsenko (altvod) Date: 2016-07-25 12:18
My bad - it doesn't work with non-empty iterators if you set len to 0, so not a solution
msg271255 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-07-25 12:20
The question is why are you defining __len__ if you don't know the size of your final object? Or at least, why are you starting with a potentially wrong initial value? This issue doesn't exist if you either don't define the method or return correct value.
msg271258 - (view) Author: Grigory Statsenko (altvod) Date: 2016-07-25 12:35
If __len__ is not defined, then the iterator is considered empty and is always rendered as [] even if it really isn't empty
msg271259 - (view) Author: Grigory Statsenko (altvod) Date: 2016-07-25 12:44
With streaming you never know the real length before you're done iterating.

Anyway, the fix really shouldn't be that complicated:
In _iterencode_list just yield  the '[' instead of saving it to buf
msg271272 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-25 13:48
If you break the invariants (in this case: a list has an accurate len) code that expects lists is only going to work by accident.

What you really want to do is define your own json encoder for your type.  If that isn't possible for a streamed sequence of undefined length, then enhancing json's extension machinery to allow it would be a good feature request.

That said, could json's ability to handle this be improved?  Possibly.  I think we would accept a patch if it doesn't make the code more complicated.  Ideally handling certain cases as "don't care" makes the code simpler, but that may or may not be the case here.  Or it may come out as a consequence of the enhancement.
msg271641 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-07-29 18:03
I am surprised that dumping to a string and to a file give different answers.  This strikes me as a bug in any case.  I would expect the the main difference would be file.write(chunk) versus temlist.append(chunk).
msg292124 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-22 16:20
> I am surprised that dumping to a string and to a file give different answers.

This is a difference between Python and C implementations.

An iterable with fake __len__ looks breaking the invariants, but if a collection with overridden __bool__() is considered as more legitimate, the proposed patch fixes Python implementation of json for that case.
msg292172 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-04-23 16:31
Per PEP-8, the Python preferred-style is, "For sequences, (strings, lists, tuples), use the fact that empty sequences are false."

    Yes: if not seq:
        if seq:

    No: if len(seq):
        if not len(seq):

The Python libraries are not obliged to defend themselves against non-sensical types (i.e. defining an empty iterator as a subclass of list and returning a non-zero len).

I recommend leaving the code as-is and closing a "not a bug".  In a way, this report is no more interesting than observing that a __hash__ that returns a random value on each call doesn't work well in a dictionary.
msg292173 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-23 16:41
This looks as a strong argument to me. Thanks Raymond.
msg292185 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-04-23 20:45
Thank you Raymond.  What I missed before is a) the OP's misnamed EmptyIterator is an iterable (possibly non-empty) but not an iterator, empty or otherwise, and b) a sequence __len__ that lies is just a bug.  (So is a iterator that does not yield the contents of a collection.)  A non-0 length(lst) is a promise that lst[0] exists.  Depending on this is routine.  If json.encoder line 296, 'for value in lst:' were replaced by the following, which should be equivalent,
        for i in range(len(list)):
            value = lst[i]
the encoding would die with IndexError.

The intention that buggy code should not cause a crash was met in this case.
History
Date User Action Args
2022-04-11 14:58:34adminsetgithub: 71800
2018-06-13 13:12:52serhiy.storchakalinkissue33850 superseder
2017-04-23 20:45:03terry.reedysetmessages: + msg292185
2017-04-23 16:42:00serhiy.storchakasetstatus: open -> closed
resolution: not a bug
messages: + msg292173

stage: patch review -> resolved
2017-04-23 16:31:20rhettingersetmessages: + msg292172
2017-04-22 16:21:38serhiy.storchakasetpull_requests: + pull_request1370
2017-04-22 16:20:41serhiy.storchakasetversions: + Python 2.7, Python 3.6, Python 3.7
nosy: + rhettinger, ezio.melotti, bob.ippolito, serhiy.storchaka

messages: + msg292124

components: + Library (Lib)
stage: patch review
2016-07-29 21:31:20ppperrysettitle: Empty iterator is rendered as a single bracket ] when using json's iterencode -> Empty iterator with fake __len__ is rendered as a single bracket ] when using json's iterencode
2016-07-29 18:03:20terry.reedysetnosy: + terry.reedy
messages: + msg271641
2016-07-25 13:48:42r.david.murraysetnosy: + r.david.murray
messages: + msg271272
2016-07-25 12:44:56altvodsetmessages: + msg271259
2016-07-25 12:35:58altvodsetmessages: + msg271258
2016-07-25 12:20:58SilentGhostsetmessages: + msg271255
2016-07-25 12:18:13altvodsetmessages: + msg271254
2016-07-25 12:16:28altvodsetmessages: + msg271252
2016-07-25 12:05:22altvodsetmessages: + msg271251
2016-07-25 11:52:43SilentGhostsetnosy: + SilentGhost
messages: + msg271250
2016-07-25 11:42:11altvodcreate