Author lemburg
Recipients lemburg, pitrou, serhiy.storchaka, vstinner
Date 2013-04-04.09:09:35
Message-id <515D4349.4020904@egenix.com>
In-reply-to <CAMpsgwbhLXPqwXQkWc2m0CevDgkg8sagdQo82iXHpi2XHNv1fw@mail.gmail.com>
Content
On 04.04.2013 10:33, STINNER Victor wrote:
>>> I don't understand why the patch makes the comparison much slower,
>>> since most of the time is supposed to be spent in memcmp()?
>>
>> Because reading the last character evicts useful data from the CPU cache, just before memcmp() reads it again from memory?
>>
>> In other words, I'm not convinced this is a useful heuristic.

Same here. The heuristic may work for short strings that easily fit
into the CPU cache, but as soon as you use it on longer strings,
this will result in much slower comparisons.

Whether this results in a speedup also depends a lot on the domain
in which you run the comparisons. For example, if you run the
heuristic on Python's special method names (such as "__init__"),
which all begin and end with underscores, it won't give you any
benefit. OTOH, it's easy to construct strings that benefit a lot
from it :-)
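
For reference, the kind of heuristic being discussed can be sketched roughly as follows. This is only an illustration under assumptions, not the code from the patch: the function name eq_last_byte_first is hypothetical, and it works on plain byte buffers of equal length rather than on Python string objects.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch under review.
 * Equality test for two buffers of the same length that peeks at
 * the last byte before handing the whole buffer to memcmp(). */
static int
eq_last_byte_first(const char *a, const char *b, size_t len)
{
    if (len == 0)
        return 1;               /* two empty buffers are equal */

    /* The heuristic: a mismatch in the last byte gives an early exit.
     * For long buffers this touches memory far away from the start,
     * which memcmp() then has to read again from the beginning. */
    if (a[len - 1] != b[len - 1])
        return 0;

    return memcmp(a, b, len) == 0;
}
```

With "__init__" and "__iter__" the last bytes are both '_', so the extra check costs a read and memcmp() still has to do all the work.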

Something that typically works well in practice is to inline
the comparison of the first few characters and then call memcmp()
on the remaining ones. This avoids polluting the CPU cache and saves
a few cycles of memcmp() setup cost for short strings.
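
A minimal sketch of that approach, under assumptions: the name eq_inline_prefix and the prefix length of 4 are made up for illustration, the buffers are plain bytes of equal length, and a real implementation would have to tune the prefix length by measurement.

```c
#include <stddef.h>
#include <string.h>

/* Assumed value for illustration; the best prefix length would
 * have to be found by benchmarking. */
#define INLINE_PREFIX 4

/* Hypothetical helper: compare the first few bytes inline, then
 * let memcmp() handle whatever is left. */
static int
eq_inline_prefix(const char *a, const char *b, size_t len)
{
    size_t n = len < INLINE_PREFIX ? len : INLINE_PREFIX;
    size_t i;

    /* Inline comparison of the leading bytes. These are the bytes
     * memcmp() would read first anyway, so no distant memory is
     * touched. */
    for (i = 0; i < n; i++) {
        if (a[i] != b[i])
            return 0;
    }

    /* Short buffers are fully handled above, with no call at all. */
    if (len == n)
        return 1;

    return memcmp(a + n, b + n, len - n) == 0;
}
```

Here "__init__" and "__iter__" are told apart at the fourth byte without ever calling memcmp(), and buffers of up to four bytes never pay the call overhead.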
History
Date User Action Args
2013-04-04 09:09:36  lemburg  set  recipients: + lemburg, pitrou, vstinner, serhiy.storchaka
2013-04-04 09:09:36  lemburg  link  issue17628 messages
2013-04-04 09:09:35  lemburg  create