Message279227
> Isn't this already implemented?
No.
>>> class A:
...     pass
>>> d = dict.fromkeys('abcdefghi')
>>> a = A()
>>> a.__dict__.update(d)
>>> b = A()
>>> b.__dict__.update(d)
>>> import sys
>>> [sys.getsizeof(m) for m in [d, vars(a), vars(b)]]
[368, 648, 648]
>>> c = A()
>>> c.__dict__.update(d)
>>> [sys.getsizeof(m) for m in [d, vars(a), vars(b), vars(c)]]
[368, 648, 648, 648]
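The pattern holds at any scale. A quick sketch (exact byte counts vary across CPython versions and platforms, so it only checks that every instance dict reports an identical size):

```python
import sys

class A:
    pass

d = dict.fromkeys('abcdefghi')
instances = []
for _ in range(1000):
    obj = A()
    obj.__dict__.update(d)   # populate each instance dict with the same keys
    instances.append(obj)

# sys.getsizeof() reports the full size for every instance dict,
# with no discount for any key-sharing between them.
sizes = {sys.getsizeof(vars(obj)) for obj in instances}
assert len(sizes) == 1
```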
No benefit is reported for key-sharing: even if you make a thousand of these instances, the size reported for each one is the same. Here is the relevant code:
Py_ssize_t
_PyDict_SizeOf(PyDictObject *mp)
{
    Py_ssize_t size, usable, res;

    size = DK_SIZE(mp->ma_keys);
    usable = USABLE_FRACTION(size);

    res = _PyObject_SIZE(Py_TYPE(mp));
    if (mp->ma_values)
        res += usable * sizeof(PyObject*);
    /* If the dictionary is split, the keys portion is accounted-for
       in the type object. */
    if (mp->ma_keys->dk_refcnt == 1)
        res += (sizeof(PyDictKeysObject)
                - Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)
                + DK_IXSIZE(mp->ma_keys) * size
                + sizeof(PyDictKeyEntry) * usable);
    return res;
}
It looks like the fixed overhead is included for every instance of a split dictionary. Instead, it might make sense to divide the fixed overhead by the number of instances sharing the keys, averaging the overhead across the shared instances:
res = _PyObject_SIZE(Py_TYPE(mp)) / num_instances;
Perhaps use ceiling division:
res = (_PyObject_SIZE(Py_TYPE(mp)) + num_instances - 1) / num_instances;
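Ceiling division rounds each amortized share up, so the per-instance figures never sum to less than the true total. In Python, where `//` floors, the compact spelling is double negation; in C, where `/` truncates toward zero, the portable form is `(a + n - 1) / n`. A quick Python illustration:

```python
import math

def ceil_div(a, n):
    # Python's // floors, so negating before and after the division rounds up.
    return -(-a // n)

assert ceil_div(100, 3) == math.ceil(100 / 3) == 34
assert ceil_div(9, 3) == 3   # exact division is unchanged
```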
Date                | User       | Action | Args
--------------------+------------+--------+-----
2016-10-22 22:16:58 | rhettinger | set    | recipients: + rhettinger, serhiy.storchaka, xiang.zhang
2016-10-22 22:16:58 | rhettinger | set    | messageid: <1477174618.9.0.903755499844.issue28508@psf.upfronthosting.co.za>
2016-10-22 22:16:58 | rhettinger | link   | issue28508 messages
2016-10-22 22:16:58 | rhettinger | create |