
Byte-code compilation uses excessive memory #49807

Closed
goddard mannequin opened this issue Mar 24, 2009 · 7 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

goddard mannequin commented Mar 24, 2009

BPO 5557
Nosy @loewis, @birkenfeld, @pitrou, @vstinner

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2013-10-13.17:53:07.033>
created_at = <Date 2009-03-24.19:48:43.446>
labels = ['interpreter-core', 'performance']
title = 'Byte-code compilation uses excessive memory'
updated_at = <Date 2013-10-13.17:53:07.031>
user = 'https://bugs.python.org/goddard'

bugs.python.org fields:

activity = <Date 2013-10-13.17:53:07.031>
actor = 'georg.brandl'
assignee = 'none'
closed = True
closed_date = <Date 2013-10-13.17:53:07.033>
closer = 'georg.brandl'
components = ['Interpreter Core']
creation = <Date 2009-03-24.19:48:43.446>
creator = 'goddard'
dependencies = []
files = []
hgrepos = []
issue_num = 5557
keywords = []
message_count = 7.0
messages = ['84108', '84110', '84116', '84133', '84144', '84156', '199737']
nosy_count = 7.0
nosy_names = ['loewis', 'georg.brandl', 'collinwinter', 'pitrou', 'vstinner', 'goddard', 'Zhiping.Deng']
pr_nums = []
priority = 'low'
resolution = 'wont fix'
stage = None
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue5557'
versions = ['Python 2.6']

goddard mannequin commented Mar 24, 2009

Bytecode compiling large Python files uses an unexpectedly large amount
of memory. For example, compiling a file containing a list of 5 million
integers uses about 2 Gbytes of memory while the Python file size is
about 40 Mbytes. The memory used is 50 times the file size. The
resulting list in Python consumes about 400 Mbytes of memory, so
compiling the byte codes uses about 5 times the memory of the list
object. Can the byte-code compilation be made more memory efficient?

The application that creates similarly large Python files is a
molecular graphics program called UCSF Chimera that my lab develops. It
writes session files which are Python code. Sessions of reasonable size
for Chimera for a given amount of physical memory cannot be
byte-compiled without thrashing, crippling the interactivity of all
software running on the machine.

Here is Python code to produce the test file test.py containing a list
of 5 million integers:

print >>open('test.py','w'), 'x = ', repr(range(5000000))
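
An equivalent that also works on Python 3 (a sketch, not from the original report; Python 3 removed the print >> syntax and its range no longer returns a list):

# Write test.py containing a 5-million-element list literal.
# list(range(...)) gives a real list on both Python 2 and 3.
f = open('test.py', 'w')
f.write('x = ' + repr(list(range(5000000))) + '\n')
f.close()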

I tried importing the test.py file with Python 2.5, 2.6.1 and 3.0.1 on
Mac OS 10.5.6. In each case, when the test.pyc file is not present, the
Python process as monitored by the Unix "top" command took about 1.7 Gb
RSS and 2.2 Gb VSZ on a MacBook Pro which has 2 Gb of memory.

goddard mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) and performance (Performance or resource usage) labels on Mar 24, 2009
loewis mannequin commented Mar 24, 2009

It might be possible to make it more efficient. However, the primary
purpose of source code is to support hand-written code, and such code
should never run into such problems.

So lowering the priority. If you want this resolved, it might be best if
you provide a patch.

vstinner commented:

Python uses an inefficient memory structure for integers. You should use a
third-party library like numpy to manipulate large integer vectors.
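
For example, a minimal sketch of the numpy approach (the dtype and filename are illustrative):

import numpy as np

# 5 million int32 values occupy ~20 Mbytes as one contiguous array,
# versus hundreds of Mbytes as a list of separate Python int objects.
x = np.arange(5000000, dtype=np.int32)
np.save('session_x.npy', x)   # binary save/load: no parsing, no byte-code compilation
x2 = np.load('session_x.npy')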

pitrou commented Mar 25, 2009

When compiling a source file to bytecode, Python first builds a syntax
tree in memory. It is very likely that the memory consumption you
observe is due to the size of that syntax tree. It is also unlikely that
anyone other than you will want to modify the parsing code to
accommodate such an extreme usage scenario :-)
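
The spike can be reproduced without importing, since compile() alone builds the tree and byte-code (a sketch; the resource module is Unix-only, and ru_maxrss units vary by platform):

import resource

source = open('test.py').read()
code = compile(source, 'test.py', 'exec')   # parse + compile: memory peaks here
print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss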

For persistence of large data structures, I suggest using cPickle or a
similar mechanism. You can even embed the pickles in literal strings if
you still need your sessions to be Python source code:

>>> import cPickle
>>> f = open("test.py", "w")
>>> f.write("import cPickle\n")
>>> f.write("x = cPickle.loads(%s)" % repr(cPickle.dumps(range(5000000),
protocol=-1)))
>>> f.close()
>>> import test
>>> len(test.x)
5000000

goddard mannequin commented Mar 25, 2009

I agree that having such large Python code files is a rare circumstance,
and that optimizing the byte-code compiler for it should be a low priority.

Thanks for the cPickle suggestion. The Chimera session file Python code
is mostly large nested dictionaries and sequences. I tested using cPickle
and repr() to embed data structures in the Python code and got a rather
larger file size, because each 8-bit character became 4 bytes in the
text-file string (e.g. "\xe8"). Using cPickle plus base64 encoding dropped
the file size by about a factor of 2.5, and cPickle plus bzip2 or zlib
compression plus base64 dropped the size another factor of 2. The big win
is that the byte-code compilation used 150 Mbytes and 5 seconds instead of
2 Gbytes and 15 minutes of thrashing for a 40 Mbyte Python file.

I think our original reason for not using pickled data in the session
files was that we like users to be able to look at and edit the session
files in a text editor. (This is research software where such hacks
sometimes come in handy.) But the especially large data structures in the
sessions can't reasonably be meddled with by users, so pickling should be
fine. Pickling adds about 15% to the session save time and reduces session
opening time by about the same amount. Compression slows the save down
another 15% and is probably not worth the factor of 2 reduction in file
size in our case.
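
A minimal sketch of that encode-and-embed pipeline (the function name is illustrative, not Chimera's actual code):

import cPickle, zlib, base64

def write_session(path, data):
    # pickle -> zlib-compress -> base64, embedded as a string literal
    # in importable Python source, keeping compilation cheap.
    blob = base64.b64encode(zlib.compress(cPickle.dumps(data, protocol=-1)))
    f = open(path, 'w')
    f.write("import cPickle, zlib, base64\n")
    f.write("x = cPickle.loads(zlib.decompress(base64.b64decode(%r)))\n" % blob)
    f.close()

write_session('test.py', range(5000000))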

pitrou commented Mar 25, 2009

If you want editable data, you could use json instead of pickle. The
simplejson library has very fast encoding/decoding (faster than cPickle
according to its author).
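
A sketch of the json variant (on Python 2.5 simplejson is a separate package; it became the standard json module in 2.6):

import json   # or: import simplejson as json

f = open("test.py", "w")
f.write("import json\n")
f.write("x = json.loads(%r)\n" % json.dumps(range(5000000)))
f.close()

Unlike a binary pickle, the embedded string stays human-readable and editable.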

birkenfeld commented:

Closing; without a specific issue to fix, it is unlikely that this will change.
