Message279227
> Isn't this already implemented?
No.
>>> class A:
...     pass
>>> d = dict.fromkeys('abcdefghi')
>>> a = A()
>>> a.__dict__.update(d)
>>> b = A()
>>> b.__dict__.update(d)
>>> import sys
>>> [sys.getsizeof(m) for m in [d, vars(a), vars(b)]]
[368, 648, 648]
>>> c = A()
>>> c.__dict__.update(d)
>>> [sys.getsizeof(m) for m in [d, vars(a), vars(b), vars(c)]]
[368, 648, 648, 648]
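The pattern holds at any scale. A quick sketch (exact byte counts vary across CPython versions and platforms, so it only checks that every instance dict reports an identical size):

```python
import sys

class A:
    pass

d = dict.fromkeys('abcdefghi')
instances = []
for _ in range(1000):
    obj = A()
    obj.__dict__.update(d)   # populate each instance dict with the same keys
    instances.append(obj)

# sys.getsizeof() reports the full size for every instance dict,
# with no discount for any key-sharing between them.
sizes = {sys.getsizeof(vars(obj)) for obj in instances}
assert len(sizes) == 1
```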
No benefit is reported for key-sharing: even if you make a thousand of these instances, the size reported for each one is the same. Here is the relevant code:
Py_ssize_t
_PyDict_SizeOf(PyDictObject *mp)
{
    Py_ssize_t size, usable, res;

    size = DK_SIZE(mp->ma_keys);
    usable = USABLE_FRACTION(size);

    res = _PyObject_SIZE(Py_TYPE(mp));
    if (mp->ma_values)
        res += usable * sizeof(PyObject*);
    /* If the dictionary is split, the keys portion is accounted-for
       in the type object. */
    if (mp->ma_keys->dk_refcnt == 1)
        res += (sizeof(PyDictKeysObject)
                - Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)
                + DK_IXSIZE(mp->ma_keys) * size
                + sizeof(PyDictKeyEntry) * usable);
    return res;
}
It looks like the fixed overhead is included for every instance of a split dictionary. Instead, it might make sense to divide the fixed overhead by the number of instances sharing the keys, averaging the overhead across the shared instances:
res = _PyObject_SIZE(Py_TYPE(mp)) / num_instances;
Perhaps use ceiling division:
res = (_PyObject_SIZE(Py_TYPE(mp)) + num_instances - 1) / num_instances;
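Ceiling division rounds each amortized share up, so the per-instance figures never sum to less than the true total. In Python, where `//` floors, the compact spelling is double negation; in C, where `/` truncates toward zero, the portable form is `(a + n - 1) / n`. A quick Python illustration:

```python
import math

def ceil_div(a, n):
    # Python's // floors, so negating before and after the division rounds up.
    return -(-a // n)

assert ceil_div(100, 3) == math.ceil(100 / 3) == 34
assert ceil_div(9, 3) == 3   # exact division is unchanged
```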
Date                | User       | Action | Args
--------------------+------------+--------+-----
2016-10-22 22:16:58 | rhettinger | set    | recipients: + rhettinger, serhiy.storchaka, xiang.zhang
2016-10-22 22:16:58 | rhettinger | set    | messageid: <1477174618.9.0.903755499844.issue28508@psf.upfronthosting.co.za>
2016-10-22 22:16:58 | rhettinger | link   | issue28508 messages
2016-10-22 22:16:58 | rhettinger | create |