
Author vstinner
Recipients serhiy.storchaka, vstinner
Date 2012-10-19.12:36:05
Message-id <1350650166.87.0.27478619259.issue16286@psf.upfronthosting.co.za>
In-reply-to
Content
The attached patch optimizes the a==b and a!=b operators for the bytes and str types in Python 3.4. For str, memcmp() is now always used, instead of a loop using PyUnicode_READ() (which is slow) for kinds other than 1. For bytes, both the first and the last byte are compared before calling memcmp(), instead of only the first byte. A similar optimization was implemented in Py_UNICODE_MATCH():

changeset:   38242:0de9a789de39
branch:      legacy-trunk
user:        Fredrik Lundh <fredrik@pythonware.com>
date:        Tue May 23 10:10:57 2006 +0000
files:       Include/unicodeobject.h
description:
needforspeed: check first *and* last character before doing a full memcmp
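
The first-and-last-byte check described above can be sketched in C as follows. This is only an illustration, not the code from the attached patch: the name bytes_equal and its signature are hypothetical, and the real implementation works on Python objects rather than raw buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not taken from the patch).
   Returns 1 if the two byte buffers are equal, 0 otherwise. */
static int
bytes_equal(const unsigned char *a, size_t a_len,
            const unsigned char *b, size_t b_len)
{
    assert(a != NULL && b != NULL);

    /* Buffers of different lengths can never be equal. */
    if (a_len != b_len)
        return 0;
    /* Two empty buffers are equal; this also guards a[a_len - 1]. */
    if (a_len == 0)
        return 1;
    /* Cheap rejection: compare the first and the last byte
       before paying for a full memcmp(). */
    if (a[0] != b[0] || a[a_len - 1] != b[a_len - 1])
        return 0;
    return memcmp(a, b, a_len) == 0;
}
```

Checking the last byte as well helps when many values share a common prefix but differ at the end, a case where comparing only the first byte rejects nothing.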

Initially, I wrote the patch only to check the hash values before comparing the content of the strings.

--

I ran some statistical tests. In a fresh Python interpreter, the hash values are known in only 7% of cases (but when hashes are compared, they are almost always different, so the optimization is useful). When running "./python -m test test_os", hashes are known and different in 41.4% of cases. After running 70 tests, hashes are known and different in 80% of cases.
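
The hash check mentioned above could look roughly like the following C sketch. It assumes a simplified string record with a cached hash field where -1 means "not computed yet" (the convention CPython uses for cached hashes). The type sketch_str and the function sketch_str_equal are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified string record (not a CPython type).
   hash == -1 means the hash has not been computed yet. */
typedef struct {
    const char *data;
    size_t length;
    long hash;
} sketch_str;

/* Returns 1 if the two strings have equal content, 0 otherwise. */
static int
sketch_str_equal(const sketch_str *a, const sketch_str *b)
{
    assert(a != NULL && b != NULL);

    if (a->length != b->length)
        return 0;
    /* If both hashes are known and differ, the contents must differ,
       so the byte-by-byte comparison can be skipped entirely. */
    if (a->hash != -1 && b->hash != -1 && a->hash != b->hash)
        return 0;
    /* Equal or unknown hashes prove nothing: compare the content. */
    return memcmp(a->data, b->data, a->length) == 0;
}
```

The shortcut only ever rejects: equal hashes do not prove equality, so the content comparison is still needed in that case. Its payoff depends on how often both hashes are already cached, which is what the percentages above measure.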
History
Date                 User      Action  Args
2012-10-19 12:36:06  vstinner  set     recipients: + vstinner, serhiy.storchaka
2012-10-19 12:36:06  vstinner  set     messageid: <1350650166.87.0.27478619259.issue16286@psf.upfronthosting.co.za>
2012-10-19 12:36:06  vstinner  link    issue16286 messages
2012-10-19 12:36:06  vstinner  create