
Author lemburg
Recipients ezio.melotti, lemburg, pitrou, rhettinger, serhiy.storchaka, vstinner
Date 2015-01-09.09:04:44
Message-id <54AF99A6.20803@egenix.com>
In-reply-to <1420792400.49.0.0632173196168.issue23119@psf.upfronthosting.co.za>
Content
On 09.01.2015 09:33, Raymond Hettinger wrote:
> 
> I'm withdrawing this one. After more work trying many timings on multiple compilers and various sizes and kinds of datasets, it appears that the unicode specialization is still worth it.  
> 
> The cost of the lookup indirection appears to be completely insignificant (i.e. doesn't harm the non-unicode case), while the unicode specialized lookup does have measurable benefits in the use case of deduping an iterable of strings.

Thanks, Raymond, for the additional testing :-)

I did a grep over the Python C source code, and it seems that the
only mildly performance-relevant use of sets is in Python/symtable.c
(which IIRC is used by the byte code compiler), and those sets
have Unicode strings as members.

The stdlib uses sets with both Unicode strings and integers
as members. From looking at the grep hits, it seems that Unicode
strings are more commonly used than integers in the stdlib
as set members, e.g. for method names, module names and character
sets.
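
As a rough illustration of this kind of survey (not the grep that was actually run), the sketch below uses the `ast` module to count set displays in a piece of Python source by the type of their constant members. The helper name `classify_set_members` and the sample source are made up for the example, and it assumes Python 3.8+ where literals parse to `ast.Constant`. It only sees literal set displays such as `{"a", "b"}`, not `set(...)` or `frozenset(...)` calls, so it would undercount real usage.

```python
import ast
from collections import Counter


def classify_set_members(source):
    """Count set displays in `source` by the type of their constant members.

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Set):
            continue
        kinds = set()
        for elt in node.elts:
            if isinstance(elt, ast.Constant):
                # e.g. "str" for string literals, "int" for integer literals
                kinds.add(type(elt.value).__name__)
            else:
                # names, calls and other non-literal members
                kinds.add("other")
        # A set display always has at least one element, so kinds is non-empty.
        label = kinds.pop() if len(kinds) == 1 else "mixed"
        counts[label] += 1
    return counts


# Made-up sample source, standing in for a stdlib module.
sample = '''
KEYWORDS = {"def", "class", "return"}
DIGITS = {0, 1, 2, 3}
MODULES = {"os", "sys"}
'''
counts = classify_set_members(sample)
print(counts)
```

Running the same function over each file's text in a source tree and summing the counters would give a coarse string-versus-integer tally of the kind described above.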
History
Date                 User     Action  Args
2015-01-09 09:04:45  lemburg  set     recipients: + lemburg, rhettinger, pitrou, vstinner, ezio.melotti, serhiy.storchaka
2015-01-09 09:04:45  lemburg  link    issue23119 messages
2015-01-09 09:04:44  lemburg  create