msg271249 - (view) |
Author: Grigory Statsenko (altvod) |
Date: 2016-07-25 11:42 |
JSONEncoder.iterencode doesn't work with empty iterators correctly.
Steps:
1. Define an iterator that is recognized by json as a list (inherit from list and define nonzero __len__).
2. Use json.dump with data containing an empty iterator defined as described in step#1 (but doesn't generate any items)
Expected result: it should be rendered as an empty list: '[]'
Actual result: it is rendered as ']' (only the closing bracket)
interestingly enough this behavior is not reproduced when using the dumps function.
I tried other alternatives to the standard json module: simplejson, ujson, hjson
All of them work as expected in this case (both brackets are rendered).
Here is an example of the code that demonstrates this error (compares the results of the dump and dumps functions):
import json as json
import io
class EmptyIterator(list):
def __iter__(self):
while False:
yield 1
def __len__(self):
return 1
def dump_to_str(data):
return json.dumps(data)
def dump_to_file(data):
stream = io.StringIO()
json.dump(data, stream)
return stream.getvalue()
data = {'it': EmptyIterator()}
print('to str: {0}'.format(dump_to_str(data)))
print('to file: {0}'.format(dump_to_file(data)))
This prints:
to str: {"it": []}
to file: {"it": ]}
|
msg271250 - (view) |
Author: SilentGhost (SilentGhost) * |
Date: 2016-07-25 11:52 |
Why does your __len__ method returns 1? Shouldn't it be 0 since this is an empty iterator? Changing it to zero seems to fix the "issue" too.
|
msg271251 - (view) |
Author: Grigory Statsenko (altvod) |
Date: 2016-07-25 12:05 |
I can't do that if I don't know how many entries there will be ahead of time. In my real-life situation I'm fetching the data from a database not knowing how many entries I'll get before I actually get them (in the iterator). In most cases there are huge amounts of entries that take up too much memory - that's why I need to stream it. But sometimes the result set is empty - and that's when everything fails.
|
msg271252 - (view) |
Author: Grigory Statsenko (altvod) |
Date: 2016-07-25 12:16 |
Actually, it does work with len = 0
even if the iterator is not empty. So, I guess that is a solution.
But still, I think the more correct way would be to make it work with > 0
|
msg271254 - (view) |
Author: Grigory Statsenko (altvod) |
Date: 2016-07-25 12:18 |
My bad - it doesn't work with non-empty iterators if you set len to 0, so not a solution
|
msg271255 - (view) |
Author: SilentGhost (SilentGhost) * |
Date: 2016-07-25 12:20 |
The question is why are you defining __len__ if you don't know the size of your final object? Or at least, why are you starting with a potentially wrong initial value? This issue doesn't exist if you either don't define the method or return correct value.
|
msg271258 - (view) |
Author: Grigory Statsenko (altvod) |
Date: 2016-07-25 12:35 |
If __len__ is not defined, then the iterator is considered empty and is always rendered as [] even if it really isn't empty
|
msg271259 - (view) |
Author: Grigory Statsenko (altvod) |
Date: 2016-07-25 12:44 |
With streaming you never know the real length before you're done iterating.
Anyway, the fix really shouldn't be that complicated:
In _iterencode_list just yield the '[' instead of saving it to buf
|
msg271272 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2016-07-25 13:48 |
If you break the invariants (in this case: a list has an accurate len) code that expects lists is only going to work by accident.
What you really want to do is define your own json encoder for your type. If that isn't possible for a streamed sequence of undefined length, then enhancing json's extension machinery to allow it would be a good feature request.
That said, could json's ability to handle this be improved? Possibly. I think we would accept a patch if it doesn't make the code more complicated. Ideally handling certain cases as "don't care" makes the code simpler, but that may or may not be the case here. Or it may come out as a consequence of the enhancement.
|
msg271641 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2016-07-29 18:03 |
I am surprised that dumping to a string and to a file give different answers. This strikes me as a bug in any case. I would expect the the main difference would be file.write(chunk) versus temlist.append(chunk).
|
msg292124 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-04-22 16:20 |
> I am surprised that dumping to a string and to a file give different answers.
This is a difference between Python and C implementations.
An iterable with fake __len__ looks breaking the invariants, but if a collection with overridden __bool__() is considered as more legitimate, the proposed patch fixes Python implementation of json for that case.
|
msg292172 - (view) |
Author: Raymond Hettinger (rhettinger) * |
Date: 2017-04-23 16:31 |
Per PEP-8, the Python preferred-style is, "For sequences, (strings, lists, tuples), use the fact that empty sequences are false."
Yes: if not seq:
if seq:
No: if len(seq):
if not len(seq):
The Python libraries are not obliged to defend themselves against non-sensical types (i.e. defining an empty iterator as a subclass of list and returning a non-zero len).
I recommend leaving the code as-is and closing a "not a bug". In a way, this report is no more interesting than observing that a __hash__ that returns a random value on each call doesn't work well in a dictionary.
|
msg292173 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-04-23 16:41 |
This looks as a strong argument to me. Thanks Raymond.
|
msg292185 - (view) |
Author: Terry J. Reedy (terry.reedy) * |
Date: 2017-04-23 20:45 |
Thank you Raymond. What I missed before is a) the OP's misnamed EmptyIterator is an iterable (possibly non-empty) but not an iterator, empty or otherwise, and b) a sequence __len__ that lies is just a bug. (So is a iterator that does not yield the contents of a collection.) A non-0 length(lst) is a promise that lst[0] exists. Depending on this is routine. If json.encoder line 296, 'for value in lst:' were replaced by the following, which should be equivalent,
for i in range(len(list)):
value = lst[i]
the encoding would die with IndexError.
The intention that buggy code should not cause a crash was met in this case.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:34 | admin | set | github: 71800 |
2018-06-13 13:12:52 | serhiy.storchaka | link | issue33850 superseder |
2017-04-23 20:45:03 | terry.reedy | set | messages:
+ msg292185 |
2017-04-23 16:42:00 | serhiy.storchaka | set | status: open -> closed resolution: not a bug messages:
+ msg292173
stage: patch review -> resolved |
2017-04-23 16:31:20 | rhettinger | set | messages:
+ msg292172 |
2017-04-22 16:21:38 | serhiy.storchaka | set | pull_requests:
+ pull_request1370 |
2017-04-22 16:20:41 | serhiy.storchaka | set | versions:
+ Python 2.7, Python 3.6, Python 3.7 nosy:
+ rhettinger, ezio.melotti, bob.ippolito, serhiy.storchaka
messages:
+ msg292124
components:
+ Library (Lib) stage: patch review |
2016-07-29 21:31:20 | ppperry | set | title: Empty iterator is rendered as a single bracket ] when using json's iterencode -> Empty iterator with fake __len__ is rendered as a single bracket ] when using json's iterencode |
2016-07-29 18:03:20 | terry.reedy | set | nosy:
+ terry.reedy messages:
+ msg271641
|
2016-07-25 13:48:42 | r.david.murray | set | nosy:
+ r.david.murray messages:
+ msg271272
|
2016-07-25 12:44:56 | altvod | set | messages:
+ msg271259 |
2016-07-25 12:35:58 | altvod | set | messages:
+ msg271258 |
2016-07-25 12:20:58 | SilentGhost | set | messages:
+ msg271255 |
2016-07-25 12:18:13 | altvod | set | messages:
+ msg271254 |
2016-07-25 12:16:28 | altvod | set | messages:
+ msg271252 |
2016-07-25 12:05:22 | altvod | set | messages:
+ msg271251 |
2016-07-25 11:52:43 | SilentGhost | set | nosy:
+ SilentGhost messages:
+ msg271250
|
2016-07-25 11:42:11 | altvod | create | |