msg66989 - (view) |
Author: Robert Schuppenies (schuppenies) *  |
Date: 2008-05-17 10:44 |
I propose a patch that allows querying the memory footprint of an
object. By calling 'footprint(o)', a Python developer can retrieve the
size of any Python object. Only the size of the object itself is
returned; the size of any referenced objects is ignored.
The patch implements a generic function to compute the object
size. This works in all but a few cases. One of these exceptions is
the dictionary with its particular table implementation. Such cases
can be handled by implementing an optional method in C. This would
also be the case for third-party implementations with unusual type
definitions.
One advantage with this approach is that the object size can be
computed at the level an object is allocated, not requiring complex
computations and considerations on higher levels.
I am not completely happy with the name 'footprint', but I think
'sizeof' would be confused with plain 'size', and 'memory_usage' is
somewhat too long to type conveniently.
The current tests pass on linux32 and linux64, but the test suite is
not yet complete.
This patch is part of my Google Summer of Code project on Python
memory profiling
(http://code.google.com/soc/2008/psf/appinfo.html?csaid=13F0E9C8B6E064EF).
Also, this is my first patch, so please let me know where I missed
something, did not follow coding conventions, or made wrong
assumptions.
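For illustration only (not part of the original message): later messages in this thread show the function being committed as sys.getsizeof() rather than footprint(). A minimal sketch of the "object itself only" behavior described above, assuming CPython; the variable names are made up for the example.

```python
import sys

# Two one-element lists: one refers to a small int, the other to a
# large string. All names here are illustrative only.
small_holder = [0]
payload = "x" * 10_000
big_holder = [payload]

own_small = sys.getsizeof(small_holder)
own_big = sys.getsizeof(big_holder)
payload_size = sys.getsizeof(payload)

# Each list is charged only for its own header and pointer array,
# not for the object it refers to.
print(own_small, own_big, payload_size)
```

Both lists should report the same size even though one of them refers to a string of roughly 10 KB; the referenced object has to be measured separately.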
|
msg66990 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2008-05-17 10:49 |
Can't you write this as a simple Python function using
type.__basicsize__ and type.__itemsize__?
In any case, if this is added somewhere it should not be a builtin. This
operation is nowhere near useful enough to be one.
|
msg66991 - (view) |
Author: Robert Schuppenies (schuppenies) *  |
Date: 2008-05-17 11:00 |
> Can't you write this as a simple Python function using
> type.__basicsize__ and type.__itemsize__?
Yes, it would be possible and has been done, e.g.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/546530. The
problem, though, is that it requires handling all special cases
externally. Any changes need to be addressed separately, and unknown type
definitions cannot be addressed at all. Also, I figured the programmer
implementing a type would know best about its size. Another point is
that different architectures result in different object sizes.
> In any case, if this is added somewhere it should not be a builtin.
What place would you consider to be appropriate?
|
msg66992 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2008-05-17 11:02 |
Such implementation-specific things usually went into the sys module.
|
msg66994 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-05-17 13:46 |
It's actually not possible, in general, to compute the memory
consumption of an object using basicsize and itemsize. An example is the
dictionary, where there is no way to find out how many slots are
currently allocated.
Even for types such as lists, where the formula
basicsize+len*itemsize should be correct, it may fail: a list reports
its itemsize as zero, even though each list item consumes four bytes (on
a 32-bit system).
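As a hedged illustration of the two points above (added for clarity, not part of the original message), the following sketch compares the naive formula with what a current CPython reports. It assumes CPython's behavior that deleting keys does not shrink a dict's table; the variable names are illustrative.

```python
import struct
import sys

pointer_size = struct.calcsize("P")  # size of a C pointer on this build

items = list(range(100))

# The naive formula from the discussion, applied to a list. Because
# list.__itemsize__ is 0, the element pointers are not counted at all.
naive_list = list.__basicsize__ + len(items) * list.__itemsize__
actual_list = sys.getsizeof(items)

# Two dicts with len() == 0. One of them used to hold 1000 keys, and
# (in CPython) deleting keys does not shrink the underlying table.
fresh = {}
shrunk = {i: None for i in range(1000)}
for key in list(shrunk):
    del shrunk[key]

print(naive_list, actual_list)
print(sys.getsizeof(fresh), sys.getsizeof(shrunk))
```

Neither len() nor the type's basicsize/itemsize reveals the difference between the two dicts, which is why the type itself has to report its size.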
I don't really see a problem with calling it sizeof, so I would then
propose sys.sizeof as the appropriate location.
|
msg66995 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2008-05-17 13:54 |
Proposals like this have been rejected in the past. Memory consumption
is an evasive concept. Lists over-allocate space, there are freelists,
there are immortal objects, the python memory allocator may hang-on to
space thought to be available, the packing and alignment of structures
varies across implementations, the system memory allocator may assign
much larger chunks than are needed for a single object, and the memory
may not be freed back to the system. Because of these issues, it is
not that meaningful to say the object x consumes y bytes.
|
msg66996 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-05-17 14:10 |
> Proposals like this have been rejected in the past. Memory consumption
> is an evasive concept. Lists over-allocate space
That issue is addressed in this patch.
> there are freelists,
but they allocate just an upper bound.
> there are immortal objects, the python memory allocator may hang-on to
> space thought to be available
These issues are orthogonal to the memory consumption of a single
object.
> the packing and alignment of structures
> varies across implementations
This is addressed in the current patch.
> the system memory allocator may assign
> much larger chunks than are needed for a single object
While true in general, this is not true in practice - in particular,
when objects get allocated through pymalloc.
> and the memory
> may not be freed back to the system. Because of these issues, it is
> not that meaningful to say the object x consumes y bytes.
This is not true. It is meaningful to say that (and many of the issues
you noted are independent of such a statement, as they concern
the whole interpreter, not an individual object).
The patch meets a real need, and is the minimum amount of code that
actually *has* to be implemented in the virtual machine, to get
a reasonable analysis of the total memory consumption. Please be
practical here, not purist.
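To make the over-allocation point concrete (an added sketch, not part of the original message): assuming CPython and the function as it was eventually committed, sys.getsizeof(), the reported size of a list grows in steps rather than with every append, because the spare capacity is included.

```python
import sys

values = []
sizes = []
for i in range(64):
    values.append(i)
    # Record the size reported after each append.
    sizes.append(sys.getsizeof(values))

# Far fewer distinct sizes than appends: the list grows in chunks and
# the reported size covers the currently allocated capacity.
distinct_sizes = sorted(set(sizes))
print(distinct_sizes)
```

The exact step sizes are an implementation detail and differ between versions and platforms.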
|
msg67009 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2008-05-17 18:52 |
Lists will need a custom tp_footprint then, too. Or, if we call it
sizeof, the slot should be tp_sizeof. BTW, is a new slot necessary, or
can it just be a type method called __sizeof__?
|
msg67011 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-05-17 19:18 |
> Lists will need a custom tp_footprint then, too.
True.
> BTW, is a new slot necessary, or
> can it just be a type method called __sizeof__?
It wouldn't be a type method, but a regular method on the specific type,
right?
I think that would work as well.
|
msg67016 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2008-05-17 21:04 |
Guido, recently you've been opposed to adding more slots. Any opinions
on this one? Also, is this something you want an additional builtin
for?
|
msg67063 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2008-05-19 15:09 |
I'm torn about the extra slot; I'd rather not add one, but I can't see
how to make this flexible enough without one.
It should definitely not be a built-in; the sys module is fine though
(e.g. sys.getrefcount() lives there too).
|
msg67075 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-05-19 20:48 |
> I'm torn about the extra slot; I'd rather not add one, but I can't see
> how to make this flexible enough without one.
I think adding a default __sizeof__ implementation into object
(__basicsize__ + len()*__itemsize__), plus overriding that in
subclasses, should do the trick.
Not adding the default into object would cause an exception to be
raised whenever sys.sizeof checks for __sizeof__, which is fairly
expensive.
Having to look __sizeof__ up in the class dictionary, and
creating an argument list, is still fairly expensive (given that the
application we have in mind will apply sizeof to all objects,
repeatedly), however, this being a debugging facility, this overhead is
probably ok.
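A rough Python-level sketch of this scheme (added for illustration, not part of the original message): object supplies a default __sizeof__, and a subclass overrides it. The Buffer class and its _data attribute are hypothetical, and the sketch uses sys.getsizeof(), the name under which the function was eventually committed.

```python
import sys


class Buffer:
    """Hypothetical type that owns an internal byte buffer."""

    def __init__(self, n):
        self._data = bytearray(n)

    def __sizeof__(self):
        # Start from the default provided by object, then add the
        # buffer this instance owns exclusively.
        return super().__sizeof__() + sys.getsizeof(self._data)


b = Buffer(1000)
default_size = object.__sizeof__(b)   # what the default would report
own_size = b.__sizeof__()             # what the override reports
reported = sys.getsizeof(b)           # may add garbage-collector overhead
print(default_size, own_size, reported)
```

Whether an owned buffer should be counted is a design decision for the type's author; the sketch only shows where such a decision would live.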
|
msg67438 - (view) |
Author: Robert Schuppenies (schuppenies) *  |
Date: 2008-05-28 07:35 |
I tried to implement a magic method __sizeof__() for the type object
which should be callable for type objects and type itself.
But calling __sizeof__ results in an error message
>>> type.__sizeof__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__sizeof__' of 'type' object needs an argument
Debugging it I found that type_getattro will (1) look for the
attribute in the metatype, (2) look in tp_dict of this type, and (3)
use the descriptor from the metatype.
I actually want it to perform (3), but since type is its own
metatype (2) will be triggered. This then results in the need for an
argument. The same behavior occurs for all type instances, i.e. classes.
Is my understanding correct? How would it be possible to invoke
__sizeof__() on the type 'type' and not on the object 'type'?
My first approach did the same for object, that is a magic __sizeof__()
method linked to object, but it gets ignored when invoked on classes or
types.
Now from my understanding everything is an object, thus also
classes and types. isinstance seems to agree with me
>>> isinstance(int, object)
True
Any suggestions on that?
thanks,
robert
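The lookup behavior described in this message can still be observed from Python on a current CPython (an added sketch, not part of the original message; the exact error text differs between versions, so only the exception type is checked):

```python
import sys

# Looking the method up on the class itself yields an unbound
# descriptor, so calling it without an argument fails.
try:
    type.__sizeof__()
except TypeError as exc:
    error = exc
else:
    error = None

# Passing the object explicitly works: this asks "how big is the type
# object int?" using the method found on its metatype.
explicit = type(int).__sizeof__(int)

# sys.getsizeof() performs the lookup on the type of its argument, so
# it does not run into the problem above.
via_sys = sys.getsizeof(int)
print(repr(error), explicit, via_sys)
```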
|
msg67440 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2008-05-28 07:48 |
You probably just need to make the method a class method -- see METH_CLASS.
|
msg67442 - (view) |
Author: Robert Schuppenies (schuppenies) *  |
Date: 2008-05-28 08:13 |
thanks, that did the trick.
|
msg67481 - (view) |
Author: Robert Schuppenies (schuppenies) *  |
Date: 2008-05-29 08:11 |
The attached patch implements the sizeof functionality as a sys module
function. __sizeof__ is implemented by object as an instance method, by
type as a class method, and by types whose size cannot be
computed from basicsize, itemsize and ob_size.
sys.getsizeof() has some work-arounds to deal with type instances and
old-style classes.
|
msg67489 - (view) |
Author: Robert Schuppenies (schuppenies) *  |
Date: 2008-05-29 13:09 |
Nick Coghlan helped me to clear my 'metaclass confusion' so here is a
patch without an additional __sizeof__ for type objects.
|
msg67570 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-05-31 10:06 |
The patch looks fine to me, please apply. Don't forget to add a
Misc/NEWS entry.
|
msg67595 - (view) |
Author: Robert Schuppenies (schuppenies) *  |
Date: 2008-06-01 16:22 |
Applied in r63856.
|
msg68372 - (view) |
Author: Jean Brouwers (MrJean1) |
Date: 2008-06-18 19:49 |
Three questions on the sizeof.patch:
1) In the first line of function dict_sizeof()
+ res = sizeof(PyDictObject) + sizeof(mp->ma_table);
is the sizeof(mp->ma_table) counted twice?
2) Since functions list_sizeof and dict_sizeof return the allocated
size, including the over-allocation, should function string_sizeof not
include the sentinel null character?
3) Are tuples left out on purpose? If not, here is an implementation
for Objects/tupleobject.c:
....
static PyObject *
tuple_sizeof(PyTupleObject *v)
{
    Py_ssize_t res;

    res = _PyObject_SIZE(&PyTuple_Type) + Py_SIZE(v) * sizeof(void*);
    return PyInt_FromSsize_t(res);
}

PyDoc_STRVAR(sizeof_doc,
"T.__sizeof__() -- size of T in bytes");
....
static PyMethodDef tuple_methods[] = {
    {"__getnewargs__", (PyCFunction)tuple_getnewargs, METH_NOARGS},
    {"__sizeof__", (PyCFunction)tuple_sizeof, METH_NOARGS, sizeof_doc},
....
/Jean Brouwers
|
msg68377 - (view) |
Author: Robert Schuppenies (schuppenies) *  |
Date: 2008-06-18 22:10 |
Jean Brouwers wrote:
> 1) In the first line of function dict_sizeof()
> + res = sizeof(PyDictObject) + sizeof(mp->ma_table);
> is the sizeof(mp->ma_table) counted twice?
Yes, you are right. I'll fix this.
> 2) Since functions list_sizeof and dict_sizeof return the allocated
> size, including the over-allocation, should function string_sizeof not
> include the sentinel null character?
Isn't this addressed by taking PyStringObject.ob_sval into account? It
is declared with a length of 1 char and is thus always included. If I
understand the creation of strings correctly, the corresponding memory
is always allocated with
    PyObject_MALLOC(sizeof(PyStringObject) + size)
which should mean that the space for the terminating null character is
included in sizeof(PyStringObject).
>
>
> 3) Are tuples left out on purpose?
No, that slipped through the initial patch. I corrected it in r64230.
> ....
> static PyObject *
> tuple_sizeof(PyTupleObject *v)
> {
>     Py_ssize_t res;
>
>     res = _PyObject_SIZE(&PyTuple_Type) + Py_SIZE(v) * sizeof(void*);
>     return PyInt_FromSsize_t(res);
> }
> ....
Your implementation matches the changes I applied, with one
difference. The basicsize of a tuple is defined as
"sizeof(PyTupleObject) - sizeof(PyObject *)"
When a tuple's memory is allocated, the required space is computed
roughly like this
(typeobj)->tp_basicsize + (nitems)*(typeobj)->tp_itemsize
Thus, I understand the memory allocated by a tuple to be
res = PyTuple_Type.tp_basicsize + Py_SIZE(v) * sizeof(PyObject *);
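The same formula can be checked from Python (an added sketch, not part of the original message), assuming a CPython in which tuple relies on the default basicsize + len * itemsize computation:

```python
import struct
import sys

pointer_size = struct.calcsize("P")  # sizeof(PyObject *) on this build

t = (1, 2, 3)

# tp_basicsize + Py_SIZE(v) * sizeof(PyObject *), expressed in Python.
expected = tuple.__basicsize__ + len(t) * tuple.__itemsize__
own = t.__sizeof__()

print(tuple.__basicsize__, tuple.__itemsize__, own, sys.getsizeof(t))
```

sys.getsizeof() may report more than __sizeof__() because it can add garbage-collector bookkeeping overhead on top of the object's own size.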
|
|
Date | User | Action | Args
2022-04-11 14:56:34 | admin | set | github: 47147
2008-10-13 19:48:15 | jcea | set | nosy: gvanrossum, loewis, georg.brandl, rhettinger, facundobatista, jcea, MrJean1, schuppenies
2008-06-18 22:10:51 | schuppenies | set | messages: + msg68377
2008-06-18 19:49:19 | MrJean1 | set | nosy: + MrJean1; messages: + msg68372
2008-06-01 16:22:53 | schuppenies | set | status: open -> closed; messages: + msg67595
2008-05-31 10:07:04 | loewis | set | assignee: gvanrossum -> schuppenies; resolution: accepted; messages: + msg67570
2008-05-29 13:09:45 | schuppenies | set | files: + sizeof.patch; messages: + msg67489
2008-05-29 13:08:31 | schuppenies | set | files: - sizeof.patch
2008-05-29 08:11:26 | schuppenies | set | files: + sizeof.patch; messages: + msg67481
2008-05-28 16:45:12 | jcea | set | nosy: + jcea
2008-05-28 08:13:54 | schuppenies | set | messages: + msg67442
2008-05-28 07:48:16 | georg.brandl | set | messages: + msg67440
2008-05-28 07:35:57 | schuppenies | set | messages: + msg67438
2008-05-21 01:48:07 | facundobatista | set | nosy: + facundobatista
2008-05-19 20:48:11 | loewis | set | messages: + msg67075
2008-05-19 15:09:37 | gvanrossum | set | messages: + msg67063
2008-05-17 21:04:17 | rhettinger | set | assignee: gvanrossum; messages: + msg67016; nosy: + gvanrossum
2008-05-17 19:18:13 | loewis | set | messages: + msg67011
2008-05-17 18:52:30 | georg.brandl | set | messages: + msg67009
2008-05-17 14:11:32 | loewis | set | messages: + msg66996
2008-05-17 13:55:04 | rhettinger | set | nosy: + rhettinger; messages: + msg66995
2008-05-17 13:46:48 | loewis | set | nosy: + loewis; messages: + msg66994
2008-05-17 11:02:43 | georg.brandl | set | messages: + msg66992
2008-05-17 11:00:27 | schuppenies | set | messages: + msg66991
2008-05-17 10:50:22 | georg.brandl | set | nosy: + georg.brandl; messages: + msg66990
2008-05-17 10:44:29 | schuppenies | create |