Issue 5557: Byte-code compilation uses excessive memory


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/49807

classification

Title:	Byte-code compilation uses excessive memory
Type:	performance	Stage:
Components:	Interpreter Core	Versions:	Python 2.6

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	Zhiping.Deng, collinwinter, georg.brandl, goddard, loewis, pitrou, vstinner
Priority:	low	Keywords:

Created on 2009-03-24 19:48 by goddard, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (7)
msg84108 - (view)	Author: Tom Goddard (goddard)	Date: 2009-03-24 19:48
Bytecode compiling large Python files uses an unexpectedly large amount of memory. For example, compiling a file containing a list of 5 million integers uses about 2 Gbytes of memory, while the Python file size is about 40 Mbytes: the memory used is 50 times the file size. The resulting list in Python consumes about 400 Mbytes of memory, so compiling the byte codes uses about 5 times the memory of the list object. Can the byte-code compilation be made more memory efficient?

The application that creates similarly large Python files is a molecular graphics program called UCSF Chimera that my lab develops. It writes session files which are Python code. Sessions of reasonable size for Chimera for a given amount of physical memory cannot be byte-compiled without thrashing, crippling the interactivity of all software running on the machine.

Here is Python code to produce the test file test.py containing a list of 5 million integers:

    print >>open('test.py','w'), 'x = ', repr(range(5000000))

I tried importing the test.py file with Python 2.5, 2.6.1 and 3.0.1 on Mac OS 10.5.6. In each case, when the test.pyc file is not present, the python process as monitored by the unix "top" command took about 1.7 Gb RSS and 2.2 Gb VSZ on a MacBook Pro which has 2 Gb of memory.
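
The one-liner above is Python 2 syntax. A minimal sketch of the same experiment for Python 3 follows, assuming a much smaller list so it runs quickly. The helper names `write_test_file` and `peak_compile_memory` are hypothetical and do not appear in the report. The peak figure comes from `tracemalloc`, which only counts allocations made through Python's own memory allocators, so it is not directly comparable to the RSS and VSZ numbers read from "top".

```python
import os
import tempfile
import tracemalloc


def write_test_file(path, n):
    """Write 'x =  [0, 1, ..., n-1]' as Python source (hypothetical helper)."""
    with open(path, "w") as f:
        # Python 3 form of: print >>open(path, 'w'), 'x = ', repr(range(n))
        print("x = ", repr(list(range(n))), file=f)


def peak_compile_memory(path):
    """Compile the file and return (code object, peak traced bytes)."""
    with open(path) as f:
        source = f.read()
    tracemalloc.start()
    try:
        code = compile(source, path, "exec")
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return code, peak


# Assumption: 100,000 integers instead of the 5 million used in the report.
path = os.path.join(tempfile.mkdtemp(), "test.py")
write_test_file(path, 100_000)
code, peak = peak_compile_memory(path)

namespace = {}
exec(code, namespace)
print("items:", len(namespace["x"]))
print("source bytes:", os.path.getsize(path))
print("peak traced bytes during compile:", peak)
```

Comparing the last two printed numbers gives a rough memory-to-file-size ratio for the interpreter in use. That ratio may differ substantially from the 50x reported here for Python 2.5 through 3.0.1.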
msg84110 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2009-03-24 20:19
It might be possible to make it more efficient. However, the primary purpose of source code is to support hand-written code, and such code should never run into such problems. So lowering the priority. If you want this resolved, it might be best if you provide a patch.
msg84116 - (view)	Author: STINNER Victor (vstinner) *	Date: 2009-03-24 22:09
Python uses an inefficient memory structure for integers. You should use a third-party library like numpy to manipulate large integer vectors.
msg84133 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-03-25 00:25
When compiling a source file to bytecode, Python first builds a syntax tree in memory. It is very likely that the memory consumption you observe is due to the size of the syntax tree. It is also unlikely that anyone other than you will want to modify the parsing code to accommodate such an extreme usage scenario :-)

For persistence of large data structures, I suggest using cPickle or a similar mechanism. You can even embed the pickles in literal strings if you still need your sessions to be Python source code:

    >>> import cPickle
    >>> f = open("test.py", "w")
    >>> f.write("import cPickle\n")
    >>> f.write("x = cPickle.loads(%s)" % repr(cPickle.dumps(range(5000000), protocol=-1)))
    >>> f.close()
    >>> import test
    >>> len(test.x)
    5000000
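
In Python 3 the `cPickle` module no longer exists under that name; `pickle` takes its place and `dumps` returns `bytes`. A minimal sketch of the same embedding idea under those assumptions is shown here, with a smaller list and with the generated source executed directly rather than imported as a module named `test`. The variable names are illustrative only.

```python
import pickle

# Assumption: 100,000 integers instead of the 5 million in the example above.
data = list(range(100_000))

# repr() of a bytes object is an ASCII-only bytes literal (b'...'),
# so it can be written into a text source file unchanged.
payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
source = "import pickle\nx = pickle.loads(%r)\n" % payload

# Compile and run the generated source, as an import would do.
namespace = {}
exec(compile(source, "<generated test.py>", "exec"), namespace)
print(len(namespace["x"]))
```

The generated source holds a single bytes literal instead of one literal per list element, so the compiler sees one constant rather than one syntax tree node per integer.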
msg84144 - (view)	Author: Tom Goddard (goddard)	Date: 2009-03-25 07:02
I agree that having such large Python code files is a rare circumstance and optimizing the byte-code compiler for that should be a low priority.

Thanks for the cPickle suggestion. The Chimera session file Python code is mostly large nested dictionaries and sequences. I tested cPickle and repr() to embed data structures in the Python code, which gave a rather larger file size because each 8-bit character became 4 bytes in the text file string (e.g. "\xe8"). Using cPickle and base64 encoding dropped the file size by about a factor of 2.5, and cPickle with bzip2 or zlib compression plus base64 dropped the size by another factor of 2. The big win is that the byte code compilation used 150 Mbytes and 5 seconds instead of 2 Gbytes and 15 minutes of thrashing for a 40 Mbyte python file.

I think our reason for not using pickled data originally in the session files was that we like users to be able to look at and edit the session files in a text editor. (This is research software where such hacks sometimes are handy.) But the especially large data structures in the sessions can't reasonably be meddled with by users, so pickling should be fine.

Pickling adds about 15% to the session save time, and reduces session opening time by about the same amount. Compression slows the save down another 15% and probably is not worth the factor of 2 reduction in file size in our case.
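
The pickle, compression and base64 combination described above can be sketched as follows, assuming Python 3 and zlib as the compressor. The function names `to_source_literal` and `session_source` and the sample `session` dictionary are hypothetical; this is not Chimera's actual session writer.

```python
import base64
import pickle
import zlib


def to_source_literal(obj, compress=True):
    """Pickle obj, optionally zlib-compress it, and return base64 text."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    if compress:
        payload = zlib.compress(payload)
    # base64 output uses only ASCII letters, digits, '+', '/' and '=',
    # so no byte expands into a 4-character escape such as "\xe8".
    return base64.b64encode(payload).decode("ascii")


def session_source(name, obj, compress=True):
    """Return Python source that rebuilds obj and binds it to name."""
    decode = "base64.b64decode(%r)" % to_source_literal(obj, compress)
    if compress:
        decode = "zlib.decompress(%s)" % decode
    return "import base64, pickle, zlib\n%s = pickle.loads(%s)\n" % (name, decode)


# Hypothetical session data: nested dictionaries and sequences.
session = {"atoms": list(range(1000)), "names": ["CA", "CB"], "raw": b"\xe8\x00\xff"}

source = session_source("session", session)
namespace = {}
exec(compile(source, "<session file>", "exec"), namespace)
restored = namespace["session"]
print(restored == session, len(source))
```

Passing `compress=False` gives the pickle-plus-base64 variant, which trades a larger file for a faster save, the trade-off weighed in the last paragraph of the message.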
msg84156 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2009-03-25 10:37
If you want editable data, you could use json instead of pickle. The simplejson library has very fast encoding/decoding (faster than cPickle according to its author).
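
A minimal sketch of the JSON route, assuming the standard library `json` module (which offers the same basic `dump`/`load` interface as simplejson) and a hypothetical `session` dictionary. Note that JSON has no tuple type, so tuples come back as lists, and dictionary keys are always strings after a round trip.

```python
import json
import os
import tempfile

# Hypothetical session data; the keys and values are illustrative only.
session = {
    "model": "1abc",
    "colors": [[1.0, 0.5, 0.0], [0.2, 0.2, 0.8]],
    "selected": (3, 7),
}

path = os.path.join(tempfile.mkdtemp(), "session.json")

# indent and sort_keys keep the file readable and stable for hand editing.
with open(path, "w") as f:
    json.dump(session, f, indent=2, sort_keys=True)

with open(path) as f:
    restored = json.load(f)

print(restored["selected"])
```

Because the file is data rather than Python source, loading it does not involve the byte-code compiler at all.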
msg199737 - (view)	Author: Georg Brandl (georg.brandl) *	Date: 2013-10-13 17:53
Closing, as without a specific issue to fix it is unlikely that this will change.

History
Date	User	Action	Args
2022-04-11 14:56:46	admin	set	github: 49807
2013-10-13 17:53:07	georg.brandl	set	status: pending -> closed nosy: + georg.brandl messages: + msg199737 resolution: wont fix
2013-05-18 19:27:18	serhiy.storchaka	set	status: open -> pending
2012-05-08 03:31:21	Zhiping.Deng	set	nosy: + Zhiping.Deng
2009-03-27 05:53:38	collinwinter	set	nosy: + collinwinter
2009-03-25 10:38:00	pitrou	set	messages: + msg84156
2009-03-25 07:02:48	goddard	set	messages: + msg84144
2009-03-25 00:26:00	pitrou	set	nosy: + pitrou messages: + msg84133
2009-03-24 22:09:23	vstinner	set	nosy: + vstinner messages: + msg84116
2009-03-24 20:19:32	loewis	set	priority: low nosy: + loewis messages: + msg84110
2009-03-24 19:48:43	goddard	create