This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: document that json.load/dump can’t be used twice on the same stream
Type: behavior Stage: resolved
Components: Documentation Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: ezio.melotti Nosy List: beazley, bob.ippolito, docs@python, eric.araujo, ezio.melotti, louiscipher, python-dev
Priority: low Keywords: easy, needs review, patch

Created on 2008-12-30 17:16 by beazley, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name       Uploaded                        Description
issue4783.diff  ezio.melotti, 2011-04-13 08:24  Patch to add a note against 2.7
Messages (10)
msg78547 - Author: David M. Beazley (beazley) Date: 2008-12-30 17:16
The json module is described as having an interface similar to pickle:
  
    json.dump()
    json.dumps()
    json.load()
    json.loads()

I think it would be a WISE idea to add a huge warning message to the 
documentation that these functions should *NOT* be used to serialize or 
unserialize multiple objects on the same file stream like pickle. For 
example:

    f = open("stuff","w")
    json.dump(obj1, f)
    json.dump(obj2, f)        # NO!  FLAMING DEATH!

    f = open("stuff","r")
    obj1 = json.load(f)  
    obj2 = json.load(f)       # NO!  EXTRA CRISPY FLAMING DEATH!

For one, it doesn't work. load() actually reads the whole file into a 
big string and tries to parse it as a single object.  If there are 
multiple objects in the file, you get a nasty exception.  Second, I'm 
not even sure this is technically allowed by the JSON spec.
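
A minimal sketch of the failure described above, assuming Python 3.5 or later (where json.JSONDecodeError exists; older versions raise a plain ValueError). An in-memory io.StringIO stands in for the file "stuff":

```python
import io
import json

buf = io.StringIO()            # in-memory stand-in for open("stuff", "w")
json.dump({"a": 1}, buf)
json.dump({"b": 2}, buf)       # no separator: buffer now holds '{"a": 1}{"b": 2}'

buf.seek(0)
error_msg = None
try:
    json.load(buf)             # reads everything, then parses it as ONE document
except json.JSONDecodeError as exc:
    error_msg = exc.msg        # the parser stops after the first object

print(error_msg)               # Extra data
```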

As far as I can tell, concatenating JSON objects together in the same 
file falls into the same category as concatenating two HTML documents 
together in the same file (something you just don't do).
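
For completeness: the lower-level json.JSONDecoder.raw_decode() returns the decoded object together with the index where parsing stopped, so back-to-back documents held in a string can still be pulled apart. A sketch assuming the whole text is already in memory; the helper name iter_concatenated is hypothetical, not part of the json module:

```python
import json

JSON_WHITESPACE = " \t\n\r"  # the four whitespace characters JSON allows

def iter_concatenated(text):
    """Hypothetical helper: yield each JSON document from back-to-back text."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while True:
        # raw_decode() does not skip leading whitespace, so do it by hand
        while pos < end and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == end:
            return
        obj, pos = decoder.raw_decode(text, pos)  # pos moves past this document
        yield obj

docs = list(iter_concatenated('{"a": 1}{"b": 2}\n[3, 4]'))
print(docs)  # [{'a': 1}, {'b': 2}, [3, 4]]
```

The missing framing still bites: the number 1 written directly after the number 2 produces the text 21, which reads back as a single value, so a delimiter or container remains the safer design.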

Related: json.load() should probably not be used on any streaming input 
source such as a file wrapped around a socket.  The first thing it does 
is consume the entire input by calling f.read(), which is probably not 
what someone is expecting (and it might even cause the whole program to 
hang).
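
The f.read() behaviour can be observed directly. A sketch assuming Python 3.5+, with a hypothetical RecordingReader subclass of io.StringIO that logs the size argument of every read() call:

```python
import io
import json

class RecordingReader(io.StringIO):
    """Hypothetical helper: a StringIO that records the size passed to read()."""

    def __init__(self, text):
        super().__init__(text)
        self.read_calls = []

    def read(self, size=-1):
        self.read_calls.append(size)
        return super().read(size)

stream = RecordingReader('{"id": 1}\n{"id": 2}\n')
try:
    json.load(stream)
except json.JSONDecodeError:
    pass  # two documents in one stream fail with "Extra data", as expected

position = stream.tell()             # where the stream was left afterwards
print(stream.read_calls, position)   # [-1] 20
```

A size of -1 means "read until EOF", and the stream ends up positioned after all 20 characters. On a file wrapped around a socket, that single call would block until the peer closes the connection.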
msg78555 - Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2008-12-30 18:42
You're the first person to ever raise any of these issues in the slightly 
more than 3 years that the package has been around (by other names), so 
I'm not sure such a warning needs to be that big.

JSON doesn't really have any framing, so serializing more than one 
document to or from the same place doesn't work so well. It's not even 
talked about in the spec, and I've never seen someone try it before.
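
One common way to supply the missing framing, which the json module does not do for you, is to write each document on its own line, a convention often called JSON Lines or newline-delimited JSON. A sketch with hypothetical helpers dump_lines() and load_lines(); reading line by line also avoids waiting for EOF the way json.load() does:

```python
import io
import json

def dump_lines(objs, f):
    """Hypothetical helper: write one compact JSON document per line."""
    for obj in objs:
        # Without indent=, json.dumps() emits no raw newlines and escapes any
        # newline inside a string as \n, so "\n" is safe to use as a delimiter.
        f.write(json.dumps(obj) + "\n")

def load_lines(f):
    """Hypothetical helper: yield one object per non-blank line, lazily."""
    for line in f:
        if line.strip():
            yield json.loads(line)

buf = io.StringIO()
dump_lines([{"a": 1}, {"msg": "two\nlines"}], buf)
buf.seek(0)
records = list(load_lines(buf))
print(records)  # [{'a': 1}, {'msg': 'two\nlines'}]
```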
msg78556 - Author: David M. Beazley (beazley) Date: 2008-12-30 19:02
Just consider me to be an impartial outside reviewer.  Hypothetically, 
let's say I'm a Python programmer who knows a thing or two about 
standard library modules (like pickle), but I'm new to JSON so I come 
looking at the json module documentation.   The documentation tells me 
it uses the same interface as pickle and marshal (even naming those two 
modules right off the bat).   So, right away, I'm thinking the module 
probably does all of the usual things that pickle and marshal can do.  
For instance, serializing multiple objects to the same stream.   
However, it doesn't work this way and the only way to find out that it 
doesn't work is to either try it and get an error, or to read the source 
code and figure it out.

I'm not reporting this as an end-user of the json module, but as a 
Python book author who is trying to get things right and to be precise.
I think if you're going to keep the pickle and marshal references, I would 
add the warning message.  Otherwise, I wouldn't mention pickle or 
marshal at all.
msg78557 - Author: Bob Ippolito (bob.ippolito) * (Python committer) Date: 2008-12-30 20:30
Ok, I've added some notes to the trunk of simplejson's documentation. Not 
sure when/if that'll hit the Python trunks; I've been having a hard time 
getting my other patches to sync up with simplejson (see http://bugs.python.org/issue4136).
msg78559 - Author: David M. Beazley (beazley) Date: 2008-12-30 20:49
Thanks!  Hopefully I'm not giving you too much work to do :-).


Cheers,
Dave
msg114850 - Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-24 23:22
Bob, what is the status of this bug?
msg133650 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-13 08:24
Attached patch adds a note about the effects of using dump several times on the same file.
msg133688 - Author: Bryce Verdier (louiscipher) Date: 2011-04-13 20:25
Not to nitpick, but what about the wording used in the simplejson documentation that Bob wrote almost 3 years ago? 

Note
JSON is not a framed protocol so unlike pickle or marshal it does not make sense to serialize more than one JSON document without some container protocol to delimit them

I also feel that it sounds a little bit cleaner.

http://simplejson.github.com/simplejson/#simplejson.dump
msg133705 - Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-04-14 03:01
I saw that and found it unclear; that's why I rephrased it.
To understand it, one has to know what a "framed protocol" is, what counts as a "JSON document" (is a single object a JSON document, or does it need to be serialized first?), and what a "container protocol" is (can I use one? where can I find it? is there a default one for JSON?).

I think it's clearer to just say that you can't do json.dump(obj1, f); json.dump(obj2, f).
I also omitted the note on `load`, because if you can't add more objects to the same file using json.dump you won't even try to use json.load to extract them one by one.
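
Read as plainly as possible, the simplest "container" for several objects is a single top-level JSON array, so one dump/load pair carries all of them. A minimal sketch (one interpretation of the simplejson note, not something it spells out), with io.StringIO standing in for a real file:

```python
import io
import json

obj1 = {"name": "first"}
obj2 = {"name": "second"}

f = io.StringIO()
json.dump([obj1, obj2], f)     # one document: a JSON array holding both objects
f.seek(0)
first, second = json.load(f)   # one load reads the whole array back

print(first, second)           # {'name': 'first'} {'name': 'second'}
```

The trade-off is that the array is written and parsed as one unit, so every object has to fit in memory at once.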
msg133784 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-04-15 04:40
New changeset 8dbf072556b9 by Ezio Melotti in branch '2.7':
#4783: document that is not possible to use json.dump twice on the same stream.
http://hg.python.org/cpython/rev/8dbf072556b9

New changeset 2ec08aa2c566 by Ezio Melotti in branch '3.1':
#4783: document that is not possible to use json.dump twice on the same stream.
http://hg.python.org/cpython/rev/2ec08aa2c566

New changeset 1e315794ac8c by Ezio Melotti in branch '3.2':
#4783: Merge with 3.1.
http://hg.python.org/cpython/rev/1e315794ac8c

New changeset 91881e304e13 by Ezio Melotti in branch 'default':
#4783: Merge with 3.2.
http://hg.python.org/cpython/rev/91881e304e13
History
Date                 User              Action  Args
2022-04-11 14:56:43  admin             set     github: 49033
2011-04-15 04:41:56  ezio.melotti      set     status: open -> closed
                                               assignee: bob.ippolito -> ezio.melotti
                                               resolution: fixed
                                               stage: patch review -> resolved
2011-04-15 04:40:55  python-dev        set     nosy: + python-dev
                                               messages: + msg133784
2011-04-14 03:01:34  ezio.melotti      set     messages: + msg133705
2011-04-13 20:25:42  louiscipher       set     nosy: + louiscipher
                                               messages: + msg133688
2011-04-13 08:24:17  ezio.melotti      set     files: + issue4783.diff
                                               nosy: + ezio.melotti
                                               messages: + msg133650
                                               keywords: + patch, easy, needs review
                                               stage: patch review
2010-08-24 23:22:06  eric.araujo       set     nosy: + docs@python, eric.araujo, - georg.brandl
                                               title: json documentation needs a BAWM (Big A** Warning Message) -> document that json.load/dump can’t be used twice on the same stream
                                               messages: + msg114850
                                               versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6, Python 3.0
2008-12-30 20:49:56  beazley           set     messages: + msg78559
2008-12-30 20:30:36  bob.ippolito      set     messages: + msg78557
2008-12-30 19:02:16  beazley           set     messages: + msg78556
2008-12-30 18:42:33  bob.ippolito      set     priority: low
                                               messages: + msg78555
2008-12-30 17:23:10  benjamin.peterson set     assignee: georg.brandl -> bob.ippolito
                                               nosy: + bob.ippolito
2008-12-30 17:16:49  beazley           create