
Author rhettinger
Recipients rhettinger
Date 2014-12-27.10:50:08
Message-id <1419677411.19.0.303130627071.issue23119@psf.upfronthosting.co.za>
In-reply-to
Content
This tracker item records experiments with removing the unicode specialization code from set objects, along with timings run to determine the performance benefits or losses from those specializations.

* Removes the set_lookkey_unicode() function and the attendant so->lookup indirections.  That saves 60 lines of code.  On each lookup, it saves one indirection for the lookup dispatch, but in the case of unicode-only tables, it costs an additional indirection through the abstract API for PyObject_RichCompareBool.

* Removes the specialization code in the add, discard, and contains functions that checks for a unicode key with an already computed hash value.  This saves a type check (cheap), a hash field check, and nine lines of code.  In the case where the hash value would have already been computed, it costs a call to PyObject_Hash (which has an indirection, but otherwise does the same field test that we are doing).  The working hypothesis is that this specialization code saves only a little in cases where it applies and adds a little to all the cases where it does not apply.  (Note, the use cases for sets are less likely than dicts to be looking up strings whose hash value has already been computed.)
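
To make the two specializations concrete, here is a toy model in Python.  It is only a sketch and not the C code under discussion: ToyStr, ToySet, lookup_generic, and lookup_str_only are hypothetical names, the table uses linear probing for brevity, and it never resizes.  ToySet(specialized=True) mimics a per-table lookup pointer plus the cached-hash fast path; ToySet(specialized=False) always goes through the generic hash and comparison.

```python
class ToyStr:
    """Stand-in for a string object that caches its hash (-1 = not computed)."""

    def __init__(self, text):
        self.text = text
        self.cached_hash = -1

    def __hash__(self):
        # The same field test the fast path does, but reached through a call.
        if self.cached_hash == -1:
            self.cached_hash = hash(self.text)
        return self.cached_hash

    def __eq__(self, other):
        return isinstance(other, ToyStr) and self.text == other.text


def lookup_generic(table, key, key_hash):
    """Generic probe: identity check, then stored hash, then a generic ==."""
    slots = table.slots
    mask = len(slots) - 1
    i = key_hash & mask
    while slots[i] is not None:
        entry_hash, entry_key = slots[i]
        if entry_key is key or (entry_hash == key_hash and entry_key == key):
            break
        i = (i + 1) & mask
    return i  # index of the matching entry, or of the first empty slot


def lookup_str_only(table, key, key_hash):
    """Specialized probe: compares the text directly, skipping generic ==."""
    if type(key) is not ToyStr:
        # A non-string key switches the table to the generic probe for good.
        table.lookup = lookup_generic
        return lookup_generic(table, key, key_hash)
    slots = table.slots
    mask = len(slots) - 1
    i = key_hash & mask
    while slots[i] is not None:
        entry_hash, entry_key = slots[i]
        if entry_key is key or (entry_hash == key_hash
                                and entry_key.text == key.text):
            break
        i = (i + 1) & mask
    return i


class ToySet:
    def __init__(self, specialized):
        self.slots = [None] * 8  # fixed size; keep it well under 8 entries
        self.specialized = specialized
        # Models a per-table function pointer chosen at creation time.
        self.lookup = lookup_str_only if specialized else lookup_generic

    def _hash(self, key):
        # Fast path: type check plus hash-field check avoids the hash() call.
        if self.specialized and type(key) is ToyStr and key.cached_hash != -1:
            return key.cached_hash
        return hash(key)

    def add(self, key):
        h = self._hash(key)
        i = self.lookup(self, key, h)
        if self.slots[i] is None:
            self.slots[i] = (h, key)

    def __contains__(self, key):
        h = self._hash(key)
        return self.slots[self.lookup(self, key, h)] is not None
```

Both variants give the same answers for every key; they differ only in how many checks and calls sit on each path, which is what the timings are meant to measure.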

----------------------

Here are some initial timings for the first patch.  They seem to show that intersection benefits slightly and that set creation time is unaffected.

$ ./time_suite.sh 
100000 loops, best of 3: 14.9 usec per loop
100000 loops, best of 3: 15.3 usec per loop
1000000 loops, best of 3: 1.17 usec per loop
1000000 loops, best of 3: 1.13 usec per loop
10000 loops, best of 3: 24.9 usec per loop
10000 loops, best of 3: 24.2 usec per loop

$ ./time_suite.sh 
100000 loops, best of 3: 14.7 usec per loop
100000 loops, best of 3: 14.6 usec per loop
1000000 loops, best of 3: 1.16 usec per loop
1000000 loops, best of 3: 1.07 usec per loop
10000 loops, best of 3: 23.1 usec per loop
10000 loops, best of 3: 23.4 usec per loop

$ ./time_suite.sh 
100000 loops, best of 3: 14.5 usec per loop
100000 loops, best of 3: 14.5 usec per loop
1000000 loops, best of 3: 1.16 usec per loop
1000000 loops, best of 3: 1.17 usec per loop
10000 loops, best of 3: 22.5 usec per loop
10000 loops, best of 3: 22 usec per loop
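
The contents of time_suite.sh are not shown here.  As a rough sketch of how comparable measurements could be taken, the following uses the standard timeit module.  The workloads, the setup string, and the names SETUP, BENCHMARKS, and run_suite are assumptions made for illustration, not the actual suite.

```python
import timeit

# Hypothetical setup: two overlapping sets of short strings.
SETUP = (
    "words = [str(i) for i in range(200)]\n"
    "s = set(words)\n"
    "t = set(words[100:] + [str(i) for i in range(1000, 1100)])\n"
)

# Hypothetical workloads; the message mentions creation and intersection,
# and membership is added here as one more lookup-heavy case.
BENCHMARKS = {
    "creation": "set(words)",
    "membership": "'150' in s",
    "intersection": "s & t",
}


def run_suite(number=10_000, repeat=3):
    """Return the best per-loop time in microseconds for each workload."""
    results = {}
    for name, stmt in BENCHMARKS.items():
        timings = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
        results[name] = min(timings) / number * 1e6
    return results


if __name__ == "__main__":
    for name, usec in run_suite().items():
        print(f"{name:>12}: {usec:.3g} usec per loop")
```

Running the same script against a build with and without the patch, several times each, gives numbers that can be compared in the same way as the runs above.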