Message 186016 - Python tracker



Author	lemburg
Recipients	lemburg, pitrou, serhiy.storchaka, vstinner
Date	2013-04-04.09:09:35
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<515D4349.4020904@egenix.com>
In-reply-to	<CAMpsgwbhLXPqwXQkWc2m0CevDgkg8sagdQo82iXHpi2XHNv1fw@mail.gmail.com>

Content

On 04.04.2013 10:33, STINNER Victor wrote:
>>> I don't understand why the patch makes the comparison much slower,
>>> since most time is supposed to be spent in memcmp()?
>>
>> Because reading the last character evicts useful data from the CPU cache, just before memcmp() reads it again from memory?
>>
>> In other words, I'm not convinced this is a useful heuristic.

Same here. The heuristic may work for short strings that easily fit
into the CPU cache, but as soon as you use it on longer strings,
this will result in much slower comparisons.

Whether this results in a speedup also depends a lot on the domain
in which you run the comparisons, e.g. if you run the heuristic on
Python's special method names (such as "__init__"), which all end in
the same characters, it won't give you any benefit. OTOH, it's easy
to construct strings that benefit a lot from it :-)

Something that typically works well in practice is to inline
the comparison of the first few characters and then call memcmp()
on the remaining ones. This avoids polluting the CPU cache and saves
a few cycles of memcmp() setup cost for short strings.
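
A rough sketch of that approach, for illustration only: the function
name bytes_equal() and the cutoff INLINE_PREFIX are made up here and
are not taken from the patch or from CPython. The right cutoff would
have to be found by measurement.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cutoff: number of leading bytes compared inline. */
#define INLINE_PREFIX 4

/* Hypothetical helper (not CPython code): returns 1 if the two
   buffers of length len hold the same bytes, 0 otherwise. */
int
bytes_equal(const void *pa, const void *pb, size_t len)
{
    const unsigned char *a = (const unsigned char *)pa;
    const unsigned char *b = (const unsigned char *)pb;
    size_t n = len < INLINE_PREFIX ? len : INLINE_PREFIX;
    size_t i;

    /* Compare the first few bytes inline. Both buffers are read
       front to back, so nothing is fetched out of order. */
    for (i = 0; i < n; i++) {
        if (a[i] != b[i])
            return 0;
    }

    /* Short inputs are fully handled above, without a memcmp() call. */
    if (len == n)
        return 1;

    /* Let memcmp() handle the remaining bytes. */
    return memcmp(a + n, b + n, len - n) == 0;
}
```

With this layout, strings that differ early (the common case for
unequal strings) never reach memcmp(), and strings that differ only
late cost no more than a plain memcmp() plus a few inline compares.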

History
Date	User	Action	Args
2013-04-04 09:09:36	lemburg	set	recipients: + lemburg, pitrou, vstinner, serhiy.storchaka
2013-04-04 09:09:36	lemburg	link	issue17628 messages
2013-04-04 09:09:35	lemburg	create