
Author josh.r
Recipients BreamoreBoy, berker.peksag, cool-RR, eric.araujo, ezio.melotti, josh.r, r.david.murray, rhettinger, scoder, serhiy.storchaka, terry.reedy
Date 2014-07-09.02:16:19
Message-id <1404872180.31.0.951312298453.issue21911@psf.upfronthosting.co.za>
In-reply-to
Content
Looking at a single lookup performed over and over isn't going to get you a very good benchmark. If your keys are constantly reused, most of the losses won't show themselves. A fairer comparison I've used before is the difference between using the bytes object produced by bytes.maketrans as the table for str.translate versus using the dictionary produced by str.maketrans. That gets you dynamically generated lookups that don't benefit from the dict optimizations for repeatedly looking up the same key and don't predictably touch the same memory that never leaves the CPU cache.
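
For concreteness, a minimal sketch of that kind of comparison (the tables and test string below are made up for illustration, not the ones from #21118):

    import timeit

    # bytes.maketrans gives a 256-byte lookup table; str.maketrans gives a
    # dict mapping ordinal -> ordinal.
    bytes_table = bytes.maketrans(b'abcdefgh', b'hgfedcba')
    dict_table = str.maketrans('abcdefgh', 'hgfedcba')

    data = 'the quick brown fox jumps over the lazy dog ' * 1000

    # str.translate only needs __getitem__ with ordinals, so the bytes table
    # works directly for code points < 256 and both tables give the same result.
    assert data.translate(bytes_table) == data.translate(dict_table)

    for name, table in (('bytes', bytes_table), ('dict', dict_table)):
        t = timeit.timeit('data.translate(table)',
                          globals={'data': data, 'table': table}, number=200)
        print(name, t)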

Check the timing data I submitted with #21118: the exact same translation applied to the same input strings, with the only difference being whether the table is a bytes object or a dict, takes nearly twice as long with the dict as with the bytes object. And the bytes object isn't even being used efficiently there; str.translate isn't optimized for the buffer protocol or anything, so every lookup has to retrieve the cached small int objects. A tuple might be even faster by avoiding that minor additional cost.
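
The tuple idea, sketched with the same made-up data: expanding the bytes table into a tuple keeps the same lookup behaviour while handing back the int objects the tuple already holds on every lookup.

    bytes_table = bytes.maketrans(b'abcdefgh', b'hgfedcba')
    tuple_table = tuple(bytes_table)   # 256 pre-built ints, one per byte value

    data = 'the quick brown fox jumps over the lazy dog ' * 1000

    # str.translate indexes the table with each character's ordinal; the tuple
    # returns its stored int objects directly instead of going through bytes
    # item retrieval on each lookup.
    assert data.translate(tuple_table) == data.translate(bytes_table)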