Add memory footprint query #47147

Closed
schuppenies mannequin opened this issue May 17, 2008 · 21 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@schuppenies
Mannequin

schuppenies mannequin commented May 17, 2008

BPO 2898
Nosy @gvanrossum, @loewis, @birkenfeld, @rhettinger, @facundobatista, @jcea
Files
  • footprint.patch: Patch against 2.6 trunk, revision 63363
  • sizeof.patch: Patch against 2.6 trunk, revision 63363
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2008-06-01.16:22:53.669>
    created_at = <Date 2008-05-17.10:44:29.078>
    labels = ['interpreter-core', 'type-feature']
    title = 'Add memory footprint query'
    updated_at = <Date 2008-10-13.19:48:15.963>
    user = 'https://bugs.python.org/schuppenies'

    bugs.python.org fields:

    activity = <Date 2008-10-13.19:48:15.963>
    actor = 'jcea'
    assignee = 'schuppenies'
    closed = True
    closed_date = <Date 2008-06-01.16:22:53.669>
    closer = 'schuppenies'
    components = ['Interpreter Core']
    creation = <Date 2008-05-17.10:44:29.078>
    creator = 'schuppenies'
    dependencies = []
    files = ['10353', '10465']
    hgrepos = []
    issue_num = 2898
    keywords = ['patch']
    message_count = 21.0
    messages = ['66989', '66990', '66991', '66992', '66994', '66995', '66996', '67009', '67011', '67016', '67063', '67075', '67438', '67440', '67442', '67481', '67489', '67570', '67595', '68372', '68377']
    nosy_count = 8.0
    nosy_names = ['gvanrossum', 'loewis', 'georg.brandl', 'rhettinger', 'facundobatista', 'jcea', 'MrJean1', 'schuppenies']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue2898'
    versions = ['Python 2.6']

    @schuppenies
    Mannequin Author

    schuppenies mannequin commented May 17, 2008

    I propose a patch that makes it possible to query the memory footprint
    of an object. By calling 'footprint(o)', a Python developer can retrieve
    the size of any Python object. Only the size of the object itself is
    returned; the size of any referenced objects is ignored.
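
A minimal sketch of the shallow-size behaviour described above. It assumes CPython and uses `sys.getsizeof`, the name the feature ends up with later in this thread, instead of the proposed `footprint`; the variable names are illustrative only.

```python
import sys

# Two outer lists, each holding exactly one reference.
outer_small = [[]]
outer_big = [list(range(1000))]

# The reported size covers the outer list only, so it does not depend
# on how large the referenced inner list is.
shallow_small = sys.getsizeof(outer_small)
shallow_big = sys.getsizeof(outer_big)

# The inner lists themselves do differ in size.
inner_small = sys.getsizeof(outer_small[0])
inner_big = sys.getsizeof(outer_big[0])

print(shallow_small == shallow_big, inner_small < inner_big)
```

A total that includes referenced objects would have to be computed on top of this by walking the references.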

    The patch implements a generic function to compute the object
    size. This works in all but a few cases. One of these exceptions is
    the dictionary with its particular table implementation. Such cases
    can be handled by implementing an optional method in C. This would
    also be the case for third-party implementations with unusual type
    definitions.

    One advantage with this approach is that the object size can be
    computed at the level an object is allocated, not requiring complex
    computations and considerations on higher levels.

    I am not completely happy with the name 'footprint', but think using
    'sizeof' would be confused with plain 'size', and 'memory_usage' was
    somewhat too long to be typed conveniently.

    Current tests pass on linux32 and linux64, but the test suite is not
    yet complete.

    This patch is part of my Google Summer of Code project on Python
    memory profiling
    (http://code.google.com/soc/2008/psf/appinfo.html?csaid=13F0E9C8B6E064EF).
    Also, this is my first patch, so please let me know where I missed
    something, did not follow coding conventions, or made wrong
    assumptions.

    @schuppenies schuppenies mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement labels May 17, 2008
    @birkenfeld
    Member

    Can't you write this as a simple Python function using
    type.__basicsize__ and type.__itemsize__?

    In any case, if this is added somewhere it should not be a builtin. This
    operation is nowhere near useful enough to be one.

    @schuppenies
    Mannequin Author

    schuppenies mannequin commented May 17, 2008

    > Can't you write this as a simple Python function using
    > type.__basicsize__ and type.__itemsize__?

    Yes, it would be possible and has been done, e.g.
    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/546530. The
    problem, though, is that it requires handling all special cases
    externally. Any changes need to be addressed separately, and unknown type
    definitions cannot be addressed at all. Also, I figured the programmer
    implementing a type would know best about its size. Another point is
    that different architectures result in different object sizes.

    > In any case, if this is added somewhere it should not be a builtin.

    What place would you consider to be appropriate?

    @birkenfeld
    Member

    Such implementation-specific things usually go into the sys module.

    @loewis
    Mannequin

    loewis mannequin commented May 17, 2008

    It's actually not possible, in general, to compute the memory
    consumption of an object using basicsize and itemsize. An example is the
    dictionary, where there is no way to find out how many slots are
    currently allocated.

    Even for things such as lists, where the formula
    basicsize+len*itemsize would seem to be correct, it may fail: a list
    reports its itemsize as zero, even though each list item consumes four
    bytes (on a 32-bit system).
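
The two failure cases named above can be checked with a short sketch. It assumes a current CPython build; `naive_size` is a hypothetical helper written for this illustration, and `__sizeof__` (the method this thread later settles on) is used as the reference value.

```python
def naive_size(obj):
    """Estimate size as basicsize + len * itemsize (the formula under discussion)."""
    tp = type(obj)
    try:
        n = len(obj)
    except TypeError:
        n = 0  # objects without a length contribute no items
    return tp.__basicsize__ + n * tp.__itemsize__

items = list(range(10))
table = {i: None for i in range(100)}

# Both types report an itemsize of zero, so the formula ignores their contents.
list_itemsize = list.__itemsize__
dict_itemsize = dict.__itemsize__

naive_list, actual_list = naive_size(items), items.__sizeof__()
naive_dict, actual_dict = naive_size(table), table.__sizeof__()

print(naive_list, actual_list, naive_dict, actual_dict)
```

The exact numbers depend on the platform and interpreter version; the point is only that the naive estimate stays at the basic size while the type-specific answer grows with the contents.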

    I don't really see a problem with calling it sizeof, so I would then
    propose sys.sizeof as the appropriate location.

    @rhettinger
    Contributor

    Proposals like this have been rejected in the past. Memory consumption
    is an evasive concept. Lists over-allocate space, there are freelists,
    there are immortal objects, the python memory allocator may hang-on to
    space thought to be available, the packing and alignment of structures
    varies across implementations, the system memory allocator may assign
    much larger chunks than are needed for a single object, and the memory
    may not be freed back to the system. Because of these issues, it is
    not that meaningful to say the object x consumes y bytes.

    @loewis
    Mannequin

    loewis mannequin commented May 17, 2008

    > Proposals like this have been rejected in the past. Memory consumption
    > is an evasive concept. Lists over-allocate space

    That issue is addressed in this patch.

    > there are freelists,

    but they allocate just an upper bound.

    > there are immortal objects, the python memory allocator may hang-on to
    > space thought to be available

    These issues are orthogonal to the memory consumption of a single
    object.

    > the packing and alignment of structures
    > varies across implementations

    This is addressed in the current patch.

    > the system memory allocator may assign
    > much larger chunks than are needed for a single object

    While true in general, this is not true in practice - in particular,
    when objects get allocated through pymalloc.

    > and the memory
    > may not be freed back to the system. Because of these issues, it is
    > not that meaningful to say the object x consumes y bytes.

    This is not true. It is meaningful to say that (and many of the issues
    you noted are independent of such a statement, as they concern
    the whole interpreter, not an individual object).

    The patch meets a real need, and is the minimum amount of code that
    actually *has* to be implemented in the virtual machine, to get
    a reasonable analysis of the total memory consumption. Please be
    practical here, not puristic.
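
The over-allocation point can be observed from Python. This is a rough sketch assuming CPython, where the reported list size is based on allocated capacity and not on the current length; the loop bound of 64 is arbitrary.

```python
import sys

grown = []
sizes = []
for i in range(64):
    grown.append(i)
    sizes.append(sys.getsizeof(grown))

# While appending, the size never shrinks, and it changes in steps rather
# than on every append, because spare capacity is counted as well.
distinct_sizes = sorted(set(sizes))
print(len(sizes), "appends,", len(distinct_sizes), "distinct sizes")
```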

    @birkenfeld
    Member

    Lists will need a custom tp_footprint then, too. Or, if we call it
    sizeof, the slot should be tp_sizeof. BTW, is a new slot necessary, or
    can it just be a type method called __sizeof__?

    @loewis
    Mannequin

    loewis mannequin commented May 17, 2008

    > Lists will need a custom tp_footprint then, too.

    True.

    > BTW, is a new slot necessary, or
    > can it just be a type method called __sizeof__?

    It wouldn't be a type method, but a regular method on the specific type,
    right?

    I think that would work as well.

    @rhettinger
    Contributor

    Guido, recently you've been opposed to adding more slots. Any opinions
    on this one? Also, is this something you want an additional builtin
    for?

    @gvanrossum
    Member

    I'm torn about the extra slot; I'd rather not add one, but I can't see
    how to make this flexible enough without one.

    It should definitely not be a built-in; the sys module is fine though
    (e.g. sys.getrefcount() lives there too).

    @loewis
    Mannequin

    loewis mannequin commented May 19, 2008

    > I'm torn about the extra slot; I'd rather not add one, but I can't see
    > how to make this flexible enough without one.

    I think adding a default __sizeof__ implementation into object
    (basicsize + len()*itemsize), plus overriding that in
    subclasses, should do the trick.

    Not adding the default into object would cause an exception to be
    raised whenever sys.sizeof checks for __sizeof__, which is fairly
    expensive.

    Having to look __sizeof__ up in the class dictionary and
    create an argument list is still fairly expensive (given that the
    application we have in mind will apply sizeof to all objects,
    repeatedly). However, since this is a debugging facility, the overhead
    is probably ok.
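
The design sketched above (a default `__sizeof__` on object, overridden where the default is wrong) can be illustrated from pure Python. This assumes current CPython, where the sys function is named `sys.getsizeof`; `Plain`, `WithBuffer` and `EXTRA_BYTES` are hypothetical names invented for the example.

```python
import sys

class Plain:
    pass

class WithBuffer:
    """Hypothetical type that owns memory the default formula cannot see."""
    EXTRA_BYTES = 1000  # pretend size of an externally allocated buffer

    def __sizeof__(self):
        # Start from the default inherited from object and add the hidden part.
        return object.__sizeof__(self) + self.EXTRA_BYTES

plain = Plain()
buffered = WithBuffer()

default_size = plain.__sizeof__()        # default implementation from object
overridden_size = buffered.__sizeof__()  # uses the override
reported = sys.getsizeof(buffered)       # looks up and calls __sizeof__

print(default_size, overridden_size, reported)
```

In CPython the value returned by `sys.getsizeof` may be larger than what `__sizeof__` returns, because garbage-collector bookkeeping is added for tracked objects.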

    @schuppenies
    Mannequin Author

    schuppenies mannequin commented May 28, 2008

    I tried to implement a magic method __sizeof__() for the type object
    which should be callable for type objects and type itself.
    But calling __sizeof__ results in an error message

    >>> type.__sizeof__()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: descriptor '__sizeof__' of 'type' object needs an argument

    Debugging it I found that type_getattro will (1) look for the
    attribute in the metatype, (2) look in tp_dict of this type, and (3)
    use the descriptor from the metatype.

    I actually want it to perform (3), but since type is its own
    metatype (2) will be triggered. This then results in the need for an
    argument. The same behavior occurs for all type instances, i.e. classes.

    Is my understanding correct? How would it be possible to invoke
    __sizeof__() on the type 'type' and not on the object 'type'?

    My first approach did the same for object, that is a magic __sizeof__()
    method linked to object, but it gets ignored when invoked on classes or
    types.
    Now from my understanding everything is an object, thus also
    classes and types. isinstance seems to agree with me

    >>> isinstance(int, object)
    True

    Any suggestions on that?

    thanks,
    robert
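
The lookup behaviour described in this comment can still be reproduced. The sketch below assumes current CPython, where `type` has its own `__sizeof__` and `sys.getsizeof` performs the lookup on `type(obj)`; it is not the code from the patch.

```python
import sys

# Looking __sizeof__ up on a class finds the unbound method that the class
# stores for its instances, so calling it without an argument fails.
try:
    int.__sizeof__()
    unbound_call_failed = False
except TypeError:
    unbound_call_failed = True

# Taking the method from the metatype and passing the class explicitly
# asks for the size of the class object itself.
size_of_int_class = type.__sizeof__(int)

# sys.getsizeof looks the method up on type(obj), so classes work as well.
reported_for_class = sys.getsizeof(int)
reported_for_type = sys.getsizeof(type)

print(unbound_call_failed, size_of_int_class, reported_for_class, reported_for_type)
```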

    @birkenfeld
    Member

    You probably just need to make the method a class method -- see METH_CLASS.

    @schuppenies
    Mannequin Author

    schuppenies mannequin commented May 28, 2008

    thanks, that did the trick.

    @schuppenies
    Mannequin Author

    schuppenies mannequin commented May 29, 2008

    The attached patch implements the sizeof functionality as a sys module
    function. __sizeof__ is implemented by object as an instance method, by
    type as a class method, and by types whose size cannot be
    computed from basicsize, itemsize and ob_size.
    sys.getsizeof() has some work-arounds to deal with type instances and
    old-style classes.

    @schuppenies
    Mannequin Author

    schuppenies mannequin commented May 29, 2008

    Nick Coghlan helped me to clear my 'metaclass confusion' so here is a
    patch without an additional __sizeof__ for type objects.

    @loewis
    Mannequin

    loewis mannequin commented May 31, 2008

    The patch looks fine to me, please apply. Don't forget to add a
    Misc/NEWS entry.

    @loewis loewis mannequin assigned schuppenies and unassigned gvanrossum May 31, 2008
    @schuppenies
    Mannequin Author

    schuppenies mannequin commented Jun 1, 2008

    Applied in r63856.

    @schuppenies schuppenies mannequin closed this as completed Jun 1, 2008
    @MrJean1
    Mannequin

    MrJean1 mannequin commented Jun 18, 2008

    Three questions on the sizeof.patch:

    1. In the first line of function dict_sizeof()

       + res = sizeof(PyDictObject) + sizeof(mp->ma_table);

       is the sizeof(mp->ma_table) counted twice?

    2. Since functions list_sizeof and dict_sizeof return the allocated
       size, including the over-allocation, should function string_sizeof not
       include the sentinel null character?

    3. Are tuples left out on purpose? If not, here is an implementation
       for Objects/tupleobject.c:

    ....
    static PyObject *
    tuple_sizeof(PyTupleObject *v)
    {
    	Py_ssize_t res;

    	res = _PyObject_SIZE(&PyTuple_Type) + Py_SIZE(v) * sizeof(void*);
    	return PyInt_FromSsize_t(res);
    }

    PyDoc_STRVAR(sizeof_doc,
    "T.__sizeof__() -- size of T in bytes");

    ....
    static PyMethodDef tuple_methods[] = {
    	{"__getnewargs__", (PyCFunction)tuple_getnewargs, METH_NOARGS},
    	{"__sizeof__", (PyCFunction)tuple_sizeof, METH_NOARGS, sizeof_doc},
    ....

    /Jean Brouwers

    @schuppenies
    Mannequin Author

    schuppenies mannequin commented Jun 18, 2008

    Jean Brouwers wrote:

    > 1. In the first line of function dict_sizeof()
    >    + res = sizeof(PyDictObject) + sizeof(mp->ma_table);
    >    is the sizeof(mp->ma_table) counted twice?

    Yes, you are right. I'll fix this.

    > 2. Since functions list_sizeof and dict_sizeof return the allocated
    >    size, including the over-allocation, should function string_sizeof not
    >    include the sentinel null character?

    Isn't this addressed by taking PyStringObject.ob_sval into account? It
    is allocated with 1 char length and thus always included. If I
    understand the creation of strings correctly, the corresponding memory
    is always allocated with

    PyObject_MALLOC(sizeof(PyStringObject) + size)

    which should mean that the space for the null terminating character is
    included in the sizeof(PyStringObject).
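
The argument about the terminating null byte can be sanity-checked on a current interpreter. This sketch uses Python 3 `bytes` as a stand-in for the 2.x string object discussed here, which is an assumption of the example and not something the patch itself covers.

```python
# Python 3 'bytes' is used here in place of the 2.x PyStringObject.
empty = b""
text = b"hello"

basic = bytes.__basicsize__    # fixed part, which already reserves one payload byte
per_item = bytes.__itemsize__  # storage per additional character

empty_size = empty.__sizeof__()
text_size = text.__sizeof__()

# An empty object costs exactly the basic size; each character adds one byte.
print(basic, per_item, empty_size, text_size)
```

If the fixed part did not already reserve room for the terminator, the empty object would need to report one byte more than the basic size.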

    > 3. Are tuples left out on purpose?

    No, that slipped through in the initial patch. I corrected it in r64230.

    > ....
    > static PyObject *
    > tuple_sizeof(PyTupleObject *v)
    > {
    >     Py_ssize_t res;
    >
    >     res = _PyObject_SIZE(&PyTuple_Type) + Py_SIZE(v) * sizeof(void*);
    >     return PyInt_FromSsize_t(res);
    > }
    > ....

    Your implementation matches the changes I applied, with one
    difference. The basicsize of a tuple is defined as
    "sizeof(PyTupleObject) - sizeof(PyObject *)"

    When a tuple's memory is allocated, the required space is computed
    roughly like this

    (typeobj)->tp_basicsize + (nitems)*(typeobj)->tp_itemsize

    Thus, I understand the memory allocated by a tuple to be

    res = PyTuple_Type.tp_basicsize + Py_SIZE(v) * sizeof(PyObject *);
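
The tuple formula above can be compared against a running interpreter. A small sketch, assuming CPython and using `struct.calcsize("P")` as the size of `PyObject *` on the current build; `sample` is an arbitrary example value.

```python
import struct

pointer_size = struct.calcsize("P")  # size of a C pointer on this build

sample = (1, 2, 3)

# tp_basicsize + nitems * sizeof(PyObject *), as in the formula above.
expected = tuple.__basicsize__ + len(sample) * pointer_size
actual = sample.__sizeof__()

print(tuple.__itemsize__, pointer_size, expected, actual)
```

Note that `sys.getsizeof(sample)` can report more than this, since CPython adds garbage-collector overhead for tracked objects.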

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022