
Author pitrou
Recipients ezio.melotti, pitrou, rhettinger
Date 2011-08-18.15:23:15
Content
On an 8GB RAM box (more than 6GB free), serializing many small objects can eat all memory, even though the end result would only take around 600MB on a UCS-2 build:

$ LANG=C time opt/python -c "import json; l = [1] * (100*1024*1024); encoded = json.dumps(l)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/antoine/cpython/opt/Lib/json/__init__.py", line 224, in dumps
    return _default_encoder.encode(obj)
  File "/home/antoine/cpython/opt/Lib/json/encoder.py", line 188, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/home/antoine/cpython/opt/Lib/json/encoder.py", line 246, in iterencode
    return _iterencode(o, 0)
MemoryError
Command exited with non-zero status 1
11.25user 2.43system 0:13.72elapsed 99%CPU (0avgtext+0avgdata 27820320maxresident)k
2920inputs+0outputs (12major+1261388minor)pagefaults 0swaps


I suppose the encoder internally builds a large list of very small unicode objects and only joins them at the end. We could probably join them in chunks to avoid this behaviour.
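
A minimal sketch of that idea, using only the public JSONEncoder.iterencode() API (the dumps_chunked helper name and the chunk size are made up for illustration, not an existing function): merging the fragments in bounded batches keeps the per-object overhead of millions of tiny strings from dominating peak memory, even though the final join produces the same result.

import json

def dumps_chunked(obj, chunk_size=2**16):
    # Hypothetical helper: instead of letting iterencode's output pile up
    # as one huge list of tiny strings, merge it into larger strings every
    # ~chunk_size characters so the small fragments can be freed early.
    encoder = json.JSONEncoder()
    merged = []      # already-joined batches (few, large strings)
    buf = []         # pending small fragments for the current batch
    buf_len = 0
    for fragment in encoder.iterencode(obj):
        buf.append(fragment)
        buf_len += len(fragment)
        if buf_len >= chunk_size:
            merged.append(''.join(buf))
            buf = []
            buf_len = 0
    if buf:
        merged.append(''.join(buf))
    return ''.join(merged)

# e.g. encoded = dumps_chunked([1] * (100*1024*1024))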