Author tim.peters
Recipients lemburg, luis@luispedro.org, methane, rhettinger, serhiy.storchaka, terry.reedy, tim.peters, twouters, vstinner
Date 2018-02-17.02:22:25
Message-id <1518834146.48.0.467229070634.issue32846@psf.upfronthosting.co.za>
Content
> Surprisingly, deleting a very large set takes much longer than creating it.

Luis, that's not surprising ;-)  When you create it, it's mostly the case that there's a vast chunk of raw memory from which many pieces are passed out in address order (to hold all the newly created Python objects).  Memory access is thus mostly sequential.  But when you delete it, that vast chunk of once-raw memory is visited in essentially random order (string hashes impose a pseudo-random order on where (pointers to) string objects are stored within a set's vector), defeating all the hardware features that greatly benefit from sequential access.
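
You can glimpse this from Python itself with the sketch below. It relies on a CPython implementation detail: `id()` returns the object's memory address. The helper `increasing_fraction` is a hypothetical name used only for this illustration. It reports how often each object sits at a higher address than the one visited just before it. It does this first in creation order and then in set-iteration order. Exact numbers depend on the interpreter version, the allocator, and per-process string hash randomization, so treat the output as an illustration rather than a benchmark.

```python
def increasing_fraction(objs):
    """Fraction of consecutive pairs whose id() increases.

    In CPython, id() is the object's memory address (implementation detail),
    so this roughly measures how "ascending" the visiting order is in memory.
    """
    addrs = [id(o) for o in objs]
    ups = sum(1 for a, b in zip(addrs, addrs[1:]) if b > a)
    return ups / (len(addrs) - 1)

# Keep the strings in a list so their creation order is remembered.
strings = [f"key{i}" for i in range(200_000)]
s = set(strings)  # the set holds references to the very same string objects

# Creation order: memory tends to be handed out in address order.
print("creation order:", increasing_fraction(strings))
# Set order: slot positions come from string hashes, unrelated to addresses.
print("set order:     ", increasing_fraction(s))
```

On a typical CPython build, the first fraction tends to be close to 1. The second hovers around 0.5, which is what you would expect if set order is effectively a random shuffle of creation order.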

More precisely, the set's internal vector is visited sequentially during deletion, but the string objects the pointers point _at_ are all over the place.  Even if nothing is swapped to disk, it's likely that visiting a string object during deletion will miss on all cache levels, falling back to (much slower) RAM.  Note that all the string objects must be visited during set deletion, in order to decrement their reference counts.
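
The two-level access pattern can be modeled without depending on any real allocator. The table itself is scanned sequentially, while the objects it points at are scattered. The toy below is hypothetical throughout. `OBJ_SIZE`, `fake_hash` and the simplified linear probing are illustrative assumptions, not CPython's actual set implementation, whose hashing and probe sequence differ.

In the model, objects receive consecutive fake addresses when they are created. "Deletion" scans the table slot by slot and decrements each pointee's refcount. `sequential_fraction` counts how often a touch lands on the object immediately after the previously touched one, which is the pattern hardware prefetchers reward.

```python
import hashlib

OBJ_SIZE = 64  # hypothetical object size in bytes

def fake_hash(key: str) -> int:
    # Deterministic stand-in for str hashing (CPython randomizes str hashes per process).
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "little")

def build_table(keys):
    """Give objects consecutive addresses, then insert them into a linear-probing table."""
    size = 8
    while size < 2 * len(keys):
        size *= 2
    table = [None] * size              # slot -> address of the object (or None)
    for i, key in enumerate(keys):
        addr = i * OBJ_SIZE            # creation: addresses handed out in order
        slot = fake_hash(key) & (size - 1)
        while table[slot] is not None:  # simplified linear probing
            slot = (slot + 1) & (size - 1)
        table[slot] = addr
    return table

def delete_table(table, refcounts):
    """Walk the table in slot order, decrementing each pointee's refcount."""
    touched = []
    for addr in table:                 # the vector itself is scanned sequentially...
        if addr is not None:
            refcounts[addr // OBJ_SIZE] -= 1  # ...but every pointee must be visited
            touched.append(addr)
    return touched

def sequential_fraction(addrs):
    """Fraction of touches landing on the object right after the previous one."""
    hits = sum(1 for a, b in zip(addrs, addrs[1:]) if b == a + OBJ_SIZE)
    return hits / (len(addrs) - 1)

keys = [f"key{i}" for i in range(100_000)]
creation_order = [i * OBJ_SIZE for i in range(len(keys))]
table = build_table(keys)
refcounts = [1] * len(keys)
deletion_order = delete_table(table, refcounts)

print("creation:", sequential_fraction(creation_order))  # 1.0 by construction
print("deletion:", sequential_fraction(deletion_order))
```

The creation walk is perfectly sequential by construction. The deletion walk is almost never sequential, even though every table slot is read in order and every object is touched exactly once.
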
History
Date                 User        Action  Args
2018-02-17 02:22:26  tim.peters  set     recipients: + tim.peters, lemburg, twouters, rhettinger, terry.reedy, vstinner, luis@luispedro.org, methane, serhiy.storchaka
2018-02-17 02:22:26  tim.peters  set     messageid: <1518834146.48.0.467229070634.issue32846@psf.upfronthosting.co.za>
2018-02-17 02:22:26  tim.peters  link    issue32846 messages
2018-02-17 02:22:25  tim.peters  create