classification
Title: Need way to expose incremental size of key sharing dicts
Type: Stage:
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: rhettinger, serhiy.storchaka, xiang.zhang
Priority: normal Keywords:

Created on 2016-10-22 17:35 by rhettinger, last changed 2016-10-22 23:15 by rhettinger. This issue is now closed.

Messages (8)
msg279207 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-10-22 17:35
In many Python programs, much of the memory utilization is due to having many instances of the same object.  We have key-sharing dicts that reduce the cost by storing only the incremental values.  It would be nice to have visibility into the savings.

One possible way to do this is to have sys.getsizeof(d) report only the incremental space.  That would let users make reasonable memory estimates in the form of n_instances * sizeof(vars(inst)).
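The proposed estimate could be sketched as follows (a hypothetical illustration; the `Point` class is invented here, and it assumes `sys.getsizeof` reports only the values portion for split dicts, as proposed):

```python
import sys

class Point:
    # Setting the same attributes in __init__ lets all instances
    # share one keys table (CPython's key-sharing dicts, PEP 412).
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = [Point(i, -i) for i in range(1000)]

# Every instance dict reports the same size, since each one holds
# only its own values array on top of the shared keys table.
sizes = {sys.getsizeof(vars(p)) for p in points}
assert len(sizes) == 1

# The estimate described above: n_instances * sizeof(vars(inst)).
estimate = len(points) * sys.getsizeof(vars(points[0]))
```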
msg279208 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-22 18:04
Isn't this already implemented?
msg279209 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-22 18:06
>>> class C:
...     def __init__(self):
...         for i in range(682):
...             setattr(self, 'a%d'%i, None)
... 
>>> sys.getsizeof(C().__dict__) / len(C().__dict__)
4.058651026392962
msg279211 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-10-22 18:25
> Isn't this already implemented?

I had the same question. dict.__sizeof__ can identify shared dicts.
msg279227 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-10-22 22:16
> Isn't this already implemented?

No.

    >>> class A:
            pass

    >>> d = dict.fromkeys('abcdefghi')
    >>> a = A()
    >>> a.__dict__.update(d)
    >>> b = A()
    >>> b.__dict__.update(d)
    >>> import sys
    >>> [sys.getsizeof(m) for m in [d, vars(a), vars(b)]]
    [368, 648, 648]
    >>> c = A()
    >>> c.__dict__.update(d)
    >>> [sys.getsizeof(m) for m in [d, vars(a), vars(b), vars(c)]]
    [368, 648, 648, 648]

There is no benefit reported for key-sharing.  Even if you make a thousand of these instances, the size reported is the same.  Here is the relevant code:

    Py_ssize_t
    _PyDict_SizeOf(PyDictObject *mp)
    {
        Py_ssize_t size, usable, res;

        size = DK_SIZE(mp->ma_keys);
        usable = USABLE_FRACTION(size);

        res = _PyObject_SIZE(Py_TYPE(mp));
        if (mp->ma_values)
            res += usable * sizeof(PyObject*);
        /* If the dictionary is split, the keys portion is accounted-for
           in the type object. */
        if (mp->ma_keys->dk_refcnt == 1)
            res += (sizeof(PyDictKeysObject)
                    - Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)
                    + DK_IXSIZE(mp->ma_keys) * size
                    + sizeof(PyDictKeyEntry) * usable);
        return res;
    }

It looks like the fixed overhead is included for every instance of a split dictionary.  Instead, it might make sense to divide the fixed overhead by the number of instances sharing the keys (averaging the overhead across the sharing instances):

     res = _PyObject_SIZE(Py_TYPE(mp)) / num_instances;

Perhaps use ceiling division:

     res = -(- _PyObject_SIZE(Py_TYPE(mp)) / num_instances);
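For reference, the C expression above uses the negate-divide-negate trick, which yields ceiling division because C integer division truncates toward zero.  The Python analogue, using floor division, is:

```python
def ceil_div(a, n):
    # Ceiling division for non-negative a and positive n:
    # negate, floor-divide, negate again.
    return -(-a // n)

# e.g. spreading a 100-byte fixed overhead across 3 instances
assert ceil_div(100, 3) == 34
assert ceil_div(99, 3) == 33
```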
msg279229 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-10-22 23:00
Hmm, it seems no dict here is a shared-key dict.
msg279230 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-10-22 23:13
> Hmm, it seems no dict here is a shared-key dict.

Yes.  That seems to be the case.  Apparently, doing an update() on the instance dict causes it to recombine.
msg279231 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-10-22 23:15
>>> from sys import getsizeof
>>> class A:
	def __init__(self, a, b, c, d, e, f):
		self.a = a
		self.b = b
		self.c = c
		self.d = d
		self.e = e
		self.f = f
		
>>> a = A(10, 20, 30, 40, 50, 60)
>>> b = A(10, 20, 30, 40, 50, 60)
>>> c = A(10, 20, 30, 40, 50, 60)
>>> d = A(10, 20, 30, 40, 50, 60)
>>> [getsizeof(vars(inst)) for inst in [a, b, c, d]]
[152, 152, 152, 152]
>>> [getsizeof(dict(vars(inst))) for inst in [a, b, c, d]]
[368, 368, 368, 368]
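The savings shown in this message can be checked directly: copying an instance dict into a regular dict forces a combined table that carries its own keys, so it reports a larger size.  A minimal sketch (exact byte counts vary by CPython version and platform, so only the inequality is relied on):

```python
import sys

class A:
    def __init__(self, a, b, c, d, e, f):
        self.a, self.b, self.c = a, b, c
        self.d, self.e, self.f = d, e, f

a = A(10, 20, 30, 40, 50, 60)

# The split (key-sharing) instance dict reports only its values
# portion; a plain dict copy owns a keys table as well.
shared_size = sys.getsizeof(vars(a))
combined_size = sys.getsizeof(dict(vars(a)))
assert shared_size < combined_size
```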
History
Date                 User              Action  Args
2016-10-22 23:15:16  rhettinger        set     messages: + msg279231
2016-10-22 23:13:20  rhettinger        set     status: open -> closed; resolution: not a bug; messages: + msg279230
2016-10-22 23:00:11  serhiy.storchaka  set     messages: + msg279229
2016-10-22 22:16:58  rhettinger        set     messages: + msg279227
2016-10-22 18:25:43  xiang.zhang       set     nosy: + xiang.zhang; messages: + msg279211
2016-10-22 18:06:57  serhiy.storchaka  set     messages: + msg279209
2016-10-22 18:04:53  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg279208
2016-10-22 17:35:42  rhettinger        create