Message 186045 - Python tracker


Author	lemburg
Recipients	eric.snow, lemburg, pitrou, serhiy.storchaka, vstinner
Date	2013-04-04.17:30:09
Message-id	<515DB89B.5000106@egenix.com>
In-reply-to	<1365094813.48.0.21403965853.issue17628@psf.upfronthosting.co.za>

Content
On 04.04.2013 19:00, Eric Snow wrote:
> 
> Eric Snow added the comment:
> 
>> Marc-Andre Lemburg added the comment:
>> Same here. The heuristic may work for short strings that easily fit
>> into the CPU cache, but as soon as you use it on longer strings,
>> this will result in much slower comparisons.
> 
> When testing both, would it help to test the end of the string before the beginning?  I'd expect that to be more likely to leave the beginning in the cache for any subsequent memcmp() call.

Again: this depends a lot on what strings you are dealing with. If
you are comparing strings that only vary in the first few characters,
testing the last character first would not be ideal :-)
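
To make this concrete, here is a small Python model of the two strategies. It is only a sketch, not the CPython implementation: the function names are hypothetical, and it simply records which byte offsets each strategy would have to read before it can decide that two strings differ.

```python
def forward_scan(a: bytes, b: bytes) -> list[int]:
    """Offsets read by a plain front-to-back comparison (a memcmp-like model)."""
    touched = []
    for i in range(min(len(a), len(b))):
        touched.append(i)
        if a[i] != b[i]:
            break  # first mismatch found, stop reading
    return touched


def last_then_forward(a: bytes, b: bytes) -> list[int]:
    """Offsets read when the last byte is tested before the forward scan."""
    n = min(len(a), len(b))
    if n == 0:
        return []
    touched = [n - 1]  # the heuristic reads the final byte first
    if a[n - 1] != b[n - 1]:
        return touched
    # Last bytes match, so fall back to scanning the rest from the front.
    return touched + forward_scan(a[: n - 1], b[: n - 1])


# Two 1000-byte strings that differ only in the first byte.
x = b"A" + b"x" * 999
y = b"B" + b"x" * 999
print(forward_scan(x, y))       # [0]
print(last_then_forward(x, y))  # [999, 0]
```

For strings that differ near the start, the last-byte test adds a read at the far end of the buffer and gains nothing; it only pays off when the strings share a long prefix and differ in the final byte.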

Given that CPUs are optimized to read ahead in memory, it's generally
better to avoid jumping around too much when accessing memory.

http://en.wikipedia.org/wiki/CPU_cache
http://en.wikipedia.org/wiki/Locality_of_reference
http://lwn.net/Articles/252125/

Ideally, you want to stay within a cache line, typically 64 bytes.
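
As a rough illustration of the cache-line arithmetic, the following Python sketch maps byte offsets to cache-line indices. It assumes a 64-byte line and a buffer that starts exactly on a line boundary; both are simplifications, and the helper names are hypothetical.

```python
CACHE_LINE = 64  # bytes; a typical size, assumed here


def cache_line(offset: int, line_size: int = CACHE_LINE) -> int:
    """Index of the cache line holding `offset` (buffer assumed line-aligned)."""
    return offset // line_size


def lines_touched(offsets, line_size: int = CACHE_LINE) -> list[int]:
    """Distinct cache lines in the order they are first accessed."""
    seen = []
    for off in offsets:
        line = cache_line(off, line_size)
        if line not in seen:
            seen.append(line)
    return seen


# Reading the first four bytes stays within a single line.
print(lines_touched([0, 1, 2, 3]))  # [0]
# Reading the last byte of a 1000-byte string, then the first byte,
# touches two lines that are far apart.
print(lines_touched([999, 0]))      # [15, 0]
```

A comparison that reads offset 999 and then offset 0 has to fetch two unrelated lines per string, whereas a sequential scan moves through adjacent lines that the hardware can read ahead.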
