
Author pitrou
Recipients ezio.melotti, pitrou, rhettinger
Date 2011-08-18.15:23:15
Content
On an 8GB RAM box (more than 6GB free), serializing many small objects can eat all memory, even though the end result would only take around 600MB on a UCS-2 build:

$ LANG=C time opt/python -c "import json; l = [1] * (100*1024*1024); encoded = json.dumps(l)"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/antoine/cpython/opt/Lib/json/__init__.py", line 224, in dumps
    return _default_encoder.encode(obj)
  File "/home/antoine/cpython/opt/Lib/json/encoder.py", line 188, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/home/antoine/cpython/opt/Lib/json/encoder.py", line 246, in iterencode
    return _iterencode(o, 0)
MemoryError
Command exited with non-zero status 1
11.25user 2.43system 0:13.72elapsed 99%CPU (0avgtext+0avgdata 27820320maxresident)k
2920inputs+0outputs (12major+1261388minor)pagefaults 0swaps


I suppose the encoder internally builds a large list of very small unicode objects and only joins them at the end. We could probably join them in chunks to avoid this behaviour.
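
A minimal sketch of that idea, using only the public JSONEncoder.iterencode() API (the dumps_chunked helper name and the chunk size are made up for illustration, not an existing function): merging the fragments in bounded batches keeps the per-object overhead of millions of tiny strings from dominating peak memory, even though the final join produces the same result.

import json

def dumps_chunked(obj, chunk_size=2**16):
    # Hypothetical helper: instead of letting iterencode's output pile up
    # as one huge list of tiny strings, merge it into larger strings every
    # ~chunk_size characters so the small fragments can be freed early.
    encoder = json.JSONEncoder()
    merged = []      # already-joined batches (few, large strings)
    buf = []         # pending small fragments for the current batch
    buf_len = 0
    for fragment in encoder.iterencode(obj):
        buf.append(fragment)
        buf_len += len(fragment)
        if buf_len >= chunk_size:
            merged.append(''.join(buf))
            buf = []
            buf_len = 0
    if buf:
        merged.append(''.join(buf))
    return ''.join(merged)

# e.g. encoded = dumps_chunked([1] * (100*1024*1024))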