
Byte-code compilation uses excessive memory #49807

Closed
goddard mannequin opened this issue Mar 24, 2009 · 7 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage

Comments

goddard mannequin commented Mar 24, 2009

BPO 5557
Nosy @loewis, @birkenfeld, @pitrou, @vstinner

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2013-10-13.17:53:07.033>
created_at = <Date 2009-03-24.19:48:43.446>
labels = ['interpreter-core', 'performance']
title = 'Byte-code compilation uses excessive memory'
updated_at = <Date 2013-10-13.17:53:07.031>
user = 'https://bugs.python.org/goddard'

bugs.python.org fields:

activity = <Date 2013-10-13.17:53:07.031>
actor = 'georg.brandl'
assignee = 'none'
closed = True
closed_date = <Date 2013-10-13.17:53:07.033>
closer = 'georg.brandl'
components = ['Interpreter Core']
creation = <Date 2009-03-24.19:48:43.446>
creator = 'goddard'
dependencies = []
files = []
hgrepos = []
issue_num = 5557
keywords = []
message_count = 7.0
messages = ['84108', '84110', '84116', '84133', '84144', '84156', '199737']
nosy_count = 7.0
nosy_names = ['loewis', 'georg.brandl', 'collinwinter', 'pitrou', 'vstinner', 'goddard', 'Zhiping.Deng']
pr_nums = []
priority = 'low'
resolution = 'wont fix'
stage = None
status = 'closed'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue5557'
versions = ['Python 2.6']

goddard mannequin commented Mar 24, 2009

Bytecode compiling large Python files uses an unexpectedly large amount
of memory. For example, compiling a file containing a list of 5 million
integers uses about 2 Gbytes of memory while the Python file size is
about 40 Mbytes. The memory used is 50 times the file size. The
resulting list in Python consumes about 400 Mbytes of memory, so
compiling the byte codes uses about 5 times the memory of the list
object. Can the byte-code compilation be made more memory efficient?

The application that creates similarly large Python files is a
molecular graphics program called UCSF Chimera that my lab develops. It
writes session files which are Python code. Sessions of reasonable size
for Chimera for a given amount of physical memory cannot be
byte-compiled without thrashing, crippling the interactivity of all
software running on the machine.

Here is Python code to produce the test file test.py containing a list
of 5 million integers:

print >>open('test.py','w'), 'x = ', repr(range(5000000))
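
An equivalent that also works on Python 3 (a sketch, not from the original report; Python 3 removed the print >> syntax and its range no longer returns a list):

# Write test.py containing a 5-million-element list literal.
# list(range(...)) gives a real list on both Python 2 and 3.
f = open('test.py', 'w')
f.write('x = ' + repr(list(range(5000000))) + '\n')
f.close()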

I tried importing the test.py file with Python 2.5, 2.6.1 and 3.0.1 on
Mac OS 10.5.6. In each case, when the test.pyc file is not present, the
Python process as monitored by the Unix "top" command took about 1.7 Gb
RSS and 2.2 Gb VSZ on a MacBook Pro which has 2 Gb of memory.

goddard mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) and performance (Performance or resource usage) labels on Mar 24, 2009
loewis mannequin commented Mar 24, 2009

It might be possible to make it more efficient. However, the primary
purpose of source code is to support hand-written code, and such code
should never run into such problems.

So lowering the priority. If you want this resolved, it might be best if
you provide a patch.

vstinner commented:

Python uses an inefficient memory structure for integers. You should use a
third-party library like numpy to manipulate large integer vectors.
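
For example, a minimal sketch of the numpy approach (the dtype and filename are illustrative):

import numpy as np

# 5 million int32 values occupy ~20 Mbytes as one contiguous array,
# versus hundreds of Mbytes as a list of separate Python int objects.
x = np.arange(5000000, dtype=np.int32)
np.save('session_x.npy', x)   # binary save/load: no parsing, no byte-code compilation
x2 = np.load('session_x.npy')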

pitrou commented Mar 25, 2009

When compiling a source file to bytecode, Python first builds a syntax
tree in memory. It is very likely that the memory consumption you
observe is due to the size of that syntax tree. It is also unlikely that
anyone other than you will want to modify the parsing code to
accommodate such an extreme usage scenario :-)
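
The spike can be reproduced without importing, since compile() alone builds the tree and byte-code (a sketch; the resource module is Unix-only, and ru_maxrss units vary by platform):

import resource

source = open('test.py').read()
code = compile(source, 'test.py', 'exec')   # parse + compile: memory peaks here
print resource.getrusage(resource.RUSAGE_SELF).ru_maxrss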

For persistence of large data structures, I suggest using cPickle or a
similar mechanism. You can even embed the pickles in literal strings if
you still need your sessions to be Python source code:

>>> import cPickle
>>> f = open("test.py", "w")
>>> f.write("import cPickle\n")
>>> f.write("x = cPickle.loads(%s)" % repr(cPickle.dumps(range(5000000),
protocol=-1)))
>>> f.close()
>>> import test
>>> len(test.x)
5000000

goddard mannequin commented Mar 25, 2009

I agree that having such large Python code files is a rare circumstance,
and that optimizing the byte-code compiler for it should be a low priority.

Thanks for the cPickle suggestion. The Chimera session file Python code
is mostly large nested dictionaries and sequences. I tested using cPickle
and repr() to embed data structures in the Python code and got a rather
larger file size, because each 8-bit character became 4 bytes in the
text-file string (e.g. "\xe8"). Using cPickle plus base64 encoding dropped
the file size by about a factor of 2.5, and cPickle plus bzip2 or zlib
compression plus base64 dropped the size another factor of 2. The big win
is that the byte-code compilation used 150 Mbytes and 5 seconds instead of
2 Gbytes and 15 minutes of thrashing for a 40 Mbyte Python file.

I think our original reason for not using pickled data in the session
files was that we like users to be able to look at and edit the session
files in a text editor. (This is research software where such hacks
sometimes come in handy.) But the especially large data structures in the
sessions can't reasonably be meddled with by users, so pickling should be
fine. Pickling adds about 15% to the session save time and reduces session
opening time by about the same amount. Compression slows the save down
another 15% and is probably not worth the factor of 2 reduction in file
size in our case.
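
A minimal sketch of that encode-and-embed pipeline (the function name is illustrative, not Chimera's actual code):

import cPickle, zlib, base64

def write_session(path, data):
    # pickle -> zlib-compress -> base64, embedded as a string literal
    # in importable Python source, keeping compilation cheap.
    blob = base64.b64encode(zlib.compress(cPickle.dumps(data, protocol=-1)))
    f = open(path, 'w')
    f.write("import cPickle, zlib, base64\n")
    f.write("x = cPickle.loads(zlib.decompress(base64.b64decode(%r)))\n" % blob)
    f.close()

write_session('test.py', range(5000000))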

pitrou commented Mar 25, 2009

If you want editable data, you could use json instead of pickle. The
simplejson library has very fast encoding/decoding (faster than cPickle
according to its author).
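
A sketch of the json variant (on Python 2.5 simplejson is a separate package; it became the standard json module in 2.6):

import json   # or: import simplejson as json

f = open("test.py", "w")
f.write("import json\n")
f.write("x = json.loads(%r)\n" % json.dumps(range(5000000)))
f.close()

Unlike a binary pickle, the embedded string stays human-readable and editable.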

birkenfeld commented:

Closing; without a specific issue to fix, it is unlikely that this will change.
