This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author rhettinger
Recipients lemburg, methane, rhettinger, serhiy.storchaka, terry.reedy, tim.peters, twouters, vstinner
Date 2018-02-15.16:53:46
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1518713626.47.0.467229070634.issue32846@psf.upfronthosting.co.za>
In-reply-to
Content
> In former examples (list, ordered dict) objects are iterated and deleted 
>in order of creation. But in the set of strings items are deleted in
> non-predicable random order. I suppose the non-linear effect is 
> related to memory management.

That makes sense.  We're likely seeing an artifact of a favorable correspondence between the container pointer order and the order that the referred-to objects were created in memory in this somewhat non-representative test case.  That is why lists lose the favorable timing when the data is shuffled.

If the test bench creates all the referred-to objects in consecutive memory locations, then any container accessing those objects consecutively will benefit from spatial cache locality.  (i.e. If there is more than one datum per cache line, the second read is virtually free.  Likewise, memory controller read-ahead makes a long series of consecutive accesses cheaper.)

It would be nice is the timing were run on strings instead of integers.  Ints are somewhat weird in that the hashes of consecutive integers are themselves consecutive, you can fit two ints in one cache line. 

I'm not sure whether it applies here, but there may also be a branch-prediction effect as well:  https://stackoverflow.com/questions/11227809/why-is-it-faster-to-process-a-sorted-array-than-an-unsorted-array

FWIW, the deallocation routine for sets isn't doing anything special.  It just calls Py_DECREF in a loop:

static void
set_dealloc(PySetObject *so)
{
    setentry *entry;
    Py_ssize_t used = so->used;

    /* bpo-31095: UnTrack is needed before calling any callbacks */
    PyObject_GC_UnTrack(so);
    Py_TRASHCAN_SAFE_BEGIN(so)
    if (so->weakreflist != NULL)
        PyObject_ClearWeakRefs((PyObject *) so);

    for (entry = so->table; used > 0; entry++) {
        if (entry->key && entry->key != dummy) {
                used--;
                Py_DECREF(entry->key);
        }
    }
    if (so->table != so->smalltable)
        PyMem_DEL(so->table);
    Py_TYPE(so)->tp_free(so);
    Py_TRASHCAN_SAFE_END(so)
}
History
Date User Action Args
2018-02-15 16:53:46rhettingersetrecipients: + rhettinger, lemburg, tim.peters, twouters, terry.reedy, vstinner, methane, serhiy.storchaka
2018-02-15 16:53:46rhettingersetmessageid: <1518713626.47.0.467229070634.issue32846@psf.upfronthosting.co.za>
2018-02-15 16:53:46rhettingerlinkissue32846 messages
2018-02-15 16:53:46rhettingercreate