classification
Title: Add memory footprint query
Type: enhancement
Stage:
Components: Interpreter Core
Versions: Python 2.6
process
Status: closed
Resolution: accepted
Dependencies:
Superseder:
Assigned To: schuppenies
Nosy List: MrJean1, facundobatista, georg.brandl, gvanrossum, jcea, loewis, rhettinger, schuppenies
Priority: normal
Keywords: patch

Created on 2008-05-17 10:44 by schuppenies, last changed 2008-10-13 19:48 by jcea. This issue is now closed.

Files
File name | Uploaded | Description
footprint.patch | schuppenies, 2008-05-17 10:44 | Patch against 2.6 trunk, revision 63363
sizeof.patch | schuppenies, 2008-05-29 13:09 | Patch against 2.6 trunk, revision 63363
Messages (21)
msg66989 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-17 10:44
I propose a patch that makes it possible to query the memory footprint of an
object. Calling 'footprint(o)', a python developer can retrieve the
size of any python object. Only the size of the object itself will be
returned, the size of any referenced objects will be ignored.
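
The "object itself only" behaviour can be sketched as follows. This is a minimal illustration, assuming CPython and the sys.getsizeof() name that the functionality receives later in this thread; the variable names are made up for the example and are not part of the patch.

```python
import sys

# Two one-element lists: one references a tiny string, the other a large one.
small_payload = "x"
large_payload = "x" * 1_000_000

holder_small = [small_payload]
holder_large = [large_payload]

# The reported size covers only the list object itself, not what it references.
shallow_small = sys.getsizeof(holder_small)
shallow_large = sys.getsizeof(holder_large)
same_shallow_size = shallow_small == shallow_large

# The referenced string has to be measured separately.
payload_size = sys.getsizeof(large_payload)

print(shallow_small, shallow_large, payload_size)
```

Summing the sizes of referenced objects is left to the caller, which matches the intent stated above.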

The patch implements a generic function to compute the object
size. This works in all but a few cases. One of these exceptions is
the dictionary with its particular table implementation. Such cases
can be handled by implementing an optional method in C. This would
also be the case for third-party implementations with unusual type
definitions.

One advantage with this approach is that the object size can be
computed at the level an object is allocated, not requiring complex
computations and considerations on higher levels.

I am not completely happy with the name 'footprint', but think using
'sizeof' would be confused with plain 'size', and 'memory_usage' was
somewhat too long to be typed conveniently.

The current tests pass on linux32 and linux64, but the test suite is not
yet complete.

This patch is part of my Google Summer of Code project on Python
memory profiling
(http://code.google.com/soc/2008/psf/appinfo.html?csaid=13F0E9C8B6E064EF).
Also, this is my first patch, so please let me know where I missed
something, did not follow coding conventions, or made wrong
assumptions.
msg66990 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-17 10:49
Can't you write this as a simple Python function using
type.__basicsize__ and type.__itemsize__?

In any case, if this is added somewhere it should not be a builtin. This
operation is nowhere near useful enough to be one.
msg66991 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-17 11:00
> Can't you write this as a simple Python function using
> type.__basicsize__ and type.__itemsize__?

Yes, it would be possible and has been done, e.g.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/546530. The
problem, though, is that it requires handling of all special cases
externally. Any changes need to be addressed separately and unknown type
definitions cannot be addressed at all. Also I figured the programmer
implementing a type would know best about its size. Another point is
different architectures which result in different object sizes.

> In any case, if this is added somewhere it should not be a builtin.

What place would you consider to be appropriate?
msg66992 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-17 11:02
Such implementation-specific things usually go into the sys module.
msg66994 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-17 13:46
It's actually not possible, in general, to compute the memory
consumption of an object using basicsize and itemsize. An example is the
dictionary, where there is no way to find out how many slots are
currently allocated.

Even for things such as lists, where the formula
basicsize+len*itemsize would be correct in principle, it may fail: a list reports
its itemsize as zero, even though each list item consumes four bytes (on
a 32-bit system).
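
The list case can be checked from Python itself. The sketch below assumes CPython; naive_size is a hypothetical helper written only for this illustration, and sys.getsizeof() is the name the functionality receives later in this thread.

```python
import sys


def naive_size(obj):
    """Hypothetical helper: basicsize + len * itemsize, as in the formula above."""
    tp = type(obj)
    try:
        count = len(obj)
    except TypeError:
        count = 0
    return tp.__basicsize__ + count * tp.__itemsize__


items = list(range(100))

# CPython reports an itemsize of 0 for list, because the item pointers
# live in a separately allocated array rather than inline in the object.
list_itemsize = list.__itemsize__

naive = naive_size(items)
reported = sys.getsizeof(items)
print(list_itemsize, naive, reported)
```

With an itemsize of zero the naive result never grows with the list, whereas the size reported by the list's own implementation does.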

I don't really see a problem with calling it sizeof, so I would then
propose sys.sizeof as the appropriate location.
msg66995 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-17 13:54
Proposals like this have been rejected in the past.  Memory consumption 
is an evasive concept.  Lists over-allocate space, there are freelists, 
there are immortal objects, the python memory allocator may hang-on to 
space thought to be available, the packing and alignment of structures 
varies across implementations, the system memory allocator may assign 
much larger chunks than are needed for a single object, and the memory 
may not be freed back to the system.  Because of these issues, it is 
not that meaningful to say the object x consumes y bytes.
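
The over-allocation of lists mentioned above is observable from Python. This is a small sketch, assuming CPython and sys.getsizeof() as named later in this thread; the exact step sizes depend on the interpreter version and platform.

```python
import sys

growing = []
sizes = [sys.getsizeof(growing)]

# Append one item at a time and record the reported size after each append.
for i in range(64):
    growing.append(i)
    sizes.append(sys.getsizeof(growing))

# Over-allocation means the size changes in occasional steps, not on every append.
distinct_sizes = sorted(set(sizes))
print(distinct_sizes)
```

There are far fewer distinct sizes than appends, because each resize reserves room for several future items.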
msg66996 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-17 14:10
> Proposals like this have been rejected in the past.  Memory consumption 
> is an evasive concept.  Lists over-allocate space

That issue is addressed in this patch.

> there are freelists, 

but they allocate just an upper bound.

> there are immortal objects, the python memory allocator may hang-on to 
> space thought to be available

These issues are orthogonal to the memory consumption of a single
object.

> the packing and alignment of structures 
> varies across implementations

This is addressed in the current patch.

> the system memory allocator may assign 
> much larger chunks than are needed for a single object

While true in general, this is not true in practice - in particular,
when objects get allocated through pymalloc.

> and the memory 
> may not be freed back to the system.  Because of these issues, it is 
> not that meaningful to say the object x consumes y bytes.

This is not true. It is meaningful to say that (and many of the issues you
noted are independent of such a statement, as they concern
the whole interpreter, not an individual object).

The patch meets a real need, and is the minimum amount of code that
actually *has* to be implemented in the virtual machine, to get
a reasonable analysis of the total memory consumption. Please be
practical here, not puristic.
msg67009 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-17 18:52
Lists will need a custom tp_footprint then, too. Or, if we call it
sizeof, the slot should be tp_sizeof. BTW, is a new slot necessary, or
can it just be a type method called __sizeof__?
msg67011 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-17 19:18
> Lists will need a custom tp_footprint then, too.

True.

> BTW, is a new slot necessary, or
> can it just be a type method called __sizeof__?

It wouldn't be a type method, but a regular method on the specific type,
right?

I think that would work as well.
msg67016 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-05-17 21:04
Guido, recently you've been opposed to adding more slots.  Any opinions 
on this one?  Also, is this something you want an additional builtin 
for?
msg67063 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-05-19 15:09
I'm torn about the extra slot; I'd rather not add one, but I can't see
how to make this flexible enough without one.

It should definitely not be a built-in; the sys module is fine though
(e.g. sys.getrefcount() lives there too).
msg67075 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-19 20:48
> I'm torn about the extra slot; I'd rather not add one, but I can't see
> how to make this flexible enough without one.

I think adding a default __sizeof__ implementation into object
(__basicsize__ + len()*__itemsize__), plus overriding that in
subclasses, should do the trick.
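
The override scheme can be sketched in pure Python. This assumes CPython with the default __sizeof__ on object and sys.getsizeof() as they exist after this issue was resolved; the Buffer class and its _nbytes attribute are hypothetical and stand in for a type that owns memory the default formula cannot see.

```python
import sys


class Buffer(object):
    """Hypothetical type that owns extra memory outside the object itself."""

    def __init__(self, nbytes):
        # Pretend this many bytes are allocated elsewhere on our behalf.
        self._nbytes = nbytes

    def __sizeof__(self):
        # Start from the default inherited from object, then add the extra.
        return super().__sizeof__() + self._nbytes


plain = Buffer(0)
big = Buffer(4096)

# sys.getsizeof() calls __sizeof__, so the extra bytes show up in the result.
extra_reported = sys.getsizeof(big) - sys.getsizeof(plain)
print(extra_reported)
```

Only types whose layout deviates from the default need to override the method; everything else inherits it from object.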

Not adding the default into object would cause an exception to be
raised whenever sys.sizeof checks for __sizeof__, which is fairly
expensive.

Having to look __sizeof__ up in the class dictionary, and
creating an argument list, is still fairly expensive (given that the
application we have in mind will apply sizeof to all objects,
repeatedly), however, this being a debugging facility, this overhead is
probably ok.
msg67438 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-28 07:35
I tried to implement a magic method __sizeof__() for the type object 
which should be callable for type objects and type itself.
But calling __sizeof__ results in an error message

>>> type.__sizeof__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__sizeof__' of 'type' object needs an argument

Debugging it I found that type_getattro will (1) look for the
attribute in the metatype, (2) look in tp_dict of this type, and (3)
use the descriptor from the metatype.

I actually want it to perform (3), but since type is its own
metatype (2) will be triggered. This then results in the need for an
argument. The same behavior occurs for all type instances, i.e. classes.

Is my understanding correct? How would it be possible to invoke
__sizeof__() on the type 'type' and not on the object 'type'?
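
The lookup behaviour described above can be reproduced in a few lines. This sketch assumes a current CPython 3 rather than the 2.6 trunk discussed here, so the error message may differ from the one quoted, and the variable names are made up for the illustration.

```python
import sys

# Looking up __sizeof__ on the class 'type' itself yields the unbound
# descriptor, so calling it without an argument fails.
try:
    type.__sizeof__()
    unbound_call_failed = False
except TypeError:
    unbound_call_failed = True

# Passing the object explicitly works.
size_of_int_type = type.__sizeof__(int)

# sys.getsizeof() looks the method up on type(obj), so it also works for 'type'.
size_of_type_type = sys.getsizeof(type)

print(unbound_call_failed, size_of_int_type, size_of_type_type)
```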


My first approach did the same for object, that is a magic __sizeof__() 
method linked to object, but it gets ignored when invoked on classes or
types.
Now from my understanding everything is an object, thus also
classes and types. isinstance seems to agree with me

>>> isinstance(int, object)
True


Any suggestions on that?

thanks,
robert
msg67440 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2008-05-28 07:48
You probably just need to make the method a class method -- see METH_CLASS.
msg67442 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-28 08:13
thanks, that did the trick.
msg67481 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-29 08:11
The attached patch implements the sizeof functionality as a sys module
function. __sizeof__ is implemented by object as an instance method, by
type as a class method, as well as by types whose size cannot be
computed from basicsize, itemsize and ob_size.
sys.getsizeof() has some work-arounds to deal with type instances and
old-style classes.
msg67489 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-05-29 13:09
Nick Coghlan helped me to clear my 'metaclass confusion' so here is a
patch without an additional __sizeof__ for type objects.
msg67570 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-05-31 10:06
The patch looks fine to me, please apply. Don't forget to add a
Misc/NEWS entry.
msg67595 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-01 16:22
Applied in r63856.
msg68372 - (view) Author: Jean Brouwers (MrJean1) Date: 2008-06-18 19:49
Three questions on the sizeof.patch:

1) In the first line of function  dict_sizeof()

+	res = sizeof(PyDictObject) + sizeof(mp->ma_table);

is the  sizeof(mp->ma_table) counted twice?


2) Since functions  list_sizeof and  dict_sizeof return the allocated 
size, including the over-allocation, should function  string_sizeof not 
include the sentinel null character?


3) Are tuples left out on purpose?  If not, here is an implementation 
for Objects/tupleobject.c:

....
static PyObject *
tuple_sizeof(PyTupleObject *v)
{
	Py_ssize_t res;

	res = _PyObject_SIZE(&PyTuple_Type) + Py_SIZE(v) * sizeof(void*);
	return PyInt_FromSsize_t(res);
}

PyDoc_STRVAR(sizeof_doc,
"T.__sizeof__() -- size of T in bytes");

....
static PyMethodDef tuple_methods[] = {
	{"__getnewargs__",	(PyCFunction)tuple_getnewargs,	METH_NOARGS},
	{"__sizeof__",  (PyCFunction)tuple_sizeof, METH_NOARGS, sizeof_doc},
....

/Jean Brouwers
msg68377 - (view) Author: Robert Schuppenies (schuppenies) * (Python committer) Date: 2008-06-18 22:10
Jean Brouwers wrote:
> 1) In the first line of function  dict_sizeof()
> +	res = sizeof(PyDictObject) + sizeof(mp->ma_table);
> is the  sizeof(mp->ma_table) counted twice?

Yes, you are right. I'll fix this. 

> 2) Since functions  list_sizeof and  dict_sizeof return the allocated 
> size, including the over-allocation, should function  string_sizeof not 
> include the sentinel null character?

Isn't this addressed by taking PyStringObject.ob_sval into account? It
is allocated with a length of 1 char and is thus always included. If I
understand the creation of strings correctly, the corresponding memory
is always allocated with

PyObject_MALLOC(sizeof(PyStringObject) + size)

which should mean that the space for the null terminating character is
included in the sizeof(PyStringObject).
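
A rough way to check this accounting from Python is sketched below. It assumes CPython 3, where bytes is the closest counterpart to the PyStringObject discussed here, and it assumes that the type's basic size is where the terminating null is accounted for, mirroring the reasoning above rather than proving it.

```python
import sys

empty = b""
ten = b"x" * 10

# Each payload byte adds exactly one byte to the reported size.
growth = sys.getsizeof(ten) - sys.getsizeof(empty)

# The empty object already reports at least the type's basic size, which is
# where the slot for the terminating null would be accounted for.
empty_size = sys.getsizeof(empty)
basic_size = bytes.__basicsize__

print(growth, empty_size, basic_size)
```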

> 
> 
> 3) Are tuples left out on purpose?  

No, that slipped through in the initial patch. I corrected it in r64230.

> ....
> static PyObject *
> tuple_sizeof(PyTupleObject *v)
> {
> 	Py_ssize_t res;
> 
> 	res = _PyObject_SIZE(&PyTuple_Type) + Py_SIZE(v) * sizeof(void*);
> 	return PyInt_FromSsize_t(res);
> }
> ....

Your implementation matches the changes I applied, with one
difference. The basicsize of a tuple is defined as
"sizeof(PyTupleObject) - sizeof(PyObject *)"

When a tuple's memory is allocated, the required space is computed
roughly like this

(typeobj)->tp_basicsize + (nitems)*(typeobj)->tp_itemsize

Thus, I understand the memory allocated by a tuple to be

res = PyTuple_Type.tp_basicsize + Py_SIZE(v) * sizeof(PyObject *);
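
The same formula can be checked from Python, where tp_basicsize and tp_itemsize are exposed as __basicsize__ and __itemsize__. This is a sketch assuming CPython; the variable names are illustrative only.

```python
import sys

point = (1, 2, 3)

# Mirror of: tp_basicsize + Py_SIZE(v) * sizeof(PyObject *)
by_formula = tuple.__basicsize__ + len(point) * tuple.__itemsize__
by_method = point.__sizeof__()

# sys.getsizeof() may add garbage-collector overhead on top of __sizeof__().
by_getsizeof = sys.getsizeof(point)

print(by_formula, by_method, by_getsizeof)
```

For tuples the items are stored inline, so the generic formula and the value returned by __sizeof__() agree.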
History
Date | User | Action | Args
2008-10-13 19:48:15 | jcea | set | nosy: gvanrossum, loewis, georg.brandl, rhettinger, facundobatista, jcea, MrJean1, schuppenies
2008-06-18 22:10:51 | schuppenies | set | messages: + msg68377
2008-06-18 19:49:19 | MrJean1 | set | nosy: + MrJean1; messages: + msg68372
2008-06-01 16:22:53 | schuppenies | set | status: open -> closed; messages: + msg67595
2008-05-31 10:07:04 | loewis | set | assignee: gvanrossum -> schuppenies; resolution: accepted; messages: + msg67570
2008-05-29 13:09:45 | schuppenies | set | files: + sizeof.patch; messages: + msg67489
2008-05-29 13:08:31 | schuppenies | set | files: - sizeof.patch
2008-05-29 08:11:26 | schuppenies | set | files: + sizeof.patch; messages: + msg67481
2008-05-28 16:45:12 | jcea | set | nosy: + jcea
2008-05-28 08:13:54 | schuppenies | set | messages: + msg67442
2008-05-28 07:48:16 | georg.brandl | set | messages: + msg67440
2008-05-28 07:35:57 | schuppenies | set | messages: + msg67438
2008-05-21 01:48:07 | facundobatista | set | nosy: + facundobatista
2008-05-19 20:48:11 | loewis | set | messages: + msg67075
2008-05-19 15:09:37 | gvanrossum | set | messages: + msg67063
2008-05-17 21:04:17 | rhettinger | set | assignee: gvanrossum; messages: + msg67016; nosy: + gvanrossum
2008-05-17 19:18:13 | loewis | set | messages: + msg67011
2008-05-17 18:52:30 | georg.brandl | set | messages: + msg67009
2008-05-17 14:11:32 | loewis | set | messages: + msg66996
2008-05-17 13:55:04 | rhettinger | set | nosy: + rhettinger; messages: + msg66995
2008-05-17 13:46:48 | loewis | set | nosy: + loewis; messages: + msg66994
2008-05-17 11:02:43 | georg.brandl | set | messages: + msg66992
2008-05-17 11:00:27 | schuppenies | set | messages: + msg66991
2008-05-17 10:50:22 | georg.brandl | set | nosy: + georg.brandl; messages: + msg66990
2008-05-17 10:44:29 | schuppenies | create |