Author David Su2
Recipients David Su2
Date 2016-07-19.17:51:47
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1468950707.74.0.640405379684.issue27575@psf.upfronthosting.co.za>
In-reply-to
Content
In dictobject.c, the function dictviews_and performs a very expensive set creation operation on the keys of the self dict object:

From Python 2.7:
...
static PyObject*
dictviews_and(PyObject* self, PyObject *other)
{
    PyObject *result = PySet_New(self);
...
    tmp = PyObject_CallMethod(result, "intersection_update", "(O)", other);
...

Similar logic can be found in the Python 3 code as well.


For large dicts (~10 000 000 keys), it takes a significant amount of time to copy the keys into a new set, and also wastes a lot of memory. This issue is amplified when the intersection operation is performed many times.

This implementation uses O(len(self)) time and space, while it is expected to use O(min(len(self), len(other)) time and space.


Solution 1: Manually copy the keys of the dict into a set, and perform all intersection operations on that set. However, still wastes lots of memory. 

Solution 2 (Recommended): Port the intersection logic from the set class to the dict view class.
History
Date User Action Args
2016-07-19 17:51:47David Su2setrecipients: + David Su2
2016-07-19 17:51:47David Su2setmessageid: <1468950707.74.0.640405379684.issue27575@psf.upfronthosting.co.za>
2016-07-19 17:51:47David Su2linkissue27575 messages
2016-07-19 17:51:47David Su2create