Message 173332 - Python tracker



Author	vstinner
Recipients	serhiy.storchaka, vstinner
Date	2012-10-19.12:36:05
Message-id	<1350650166.87.0.27478619259.issue16286@psf.upfronthosting.co.za>

Content
The attached patch optimizes the a==b and a!=b operators for the bytes and str types in Python 3.4. For str, memcmp() is now always used, instead of a loop using PyUnicode_READ() (which is slow) for kinds other than 1. For bytes, the first and also the last byte are compared before calling memcmp(), instead of just the first byte. A similar optimization was implemented in Py_UNICODE_MATCH():

changeset:   38242:0de9a789de39
branch:      legacy-trunk
user:        Fredrik Lundh <fredrik@pythonware.com>
date:        Tue May 23 10:10:57 2006 +0000
files:       Include/unicodeobject.h
description:
needforspeed: check first *and* last character before doing a full memcmp
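
To illustrate the order of checks described above for bytes, here is a minimal pure-Python sketch. It is not the code from the patch (which is C): the function name bytes_equal is hypothetical, and the final a == b merely stands in for the memcmp() call.

```python
def bytes_equal(a: bytes, b: bytes) -> bool:
    """Model of the check order only; not CPython's actual C code."""
    # Objects of different lengths can never be equal.
    if len(a) != len(b):
        return False
    # Two empty objects are equal, and have no first or last byte to test.
    if not a:
        return True
    # Cheap rejects: compare the first byte, then the last byte.
    if a[0] != b[0] or a[-1] != b[-1]:
        return False
    # Full comparison; this stands in for memcmp() in the C code.
    return a == b
```

The last-byte test helps when many values share a common prefix and differ only at the end, a case where a first-byte test alone never rejects early.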

Initially, I wrote the patch only to check the hash values before comparing the contents of the strings.

--

I ran some statistical tests. For a fresh Python interpreter, the hash values are known in only 7% of cases (but when hashes are compared, they are almost always different, so the optimization is useful). When running "./python -m test test_os", hashes are known and different in 41.4% of cases. After running 70 tests, hashes are known and different in 80% of cases.
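
The hash check can be modelled roughly as follows. This is only a sketch under assumptions: CachedStr, cached_hash and equal are hypothetical names, and -1 is used as the "hash not computed yet" marker. The shortcut applies only when both hashes are already known, which is what "known" means in the percentages above.

```python
class CachedStr:
    """Hypothetical stand-in for a string object that caches its hash."""

    def __init__(self, value: str) -> None:
        self.value = value
        self.cached_hash = -1  # -1 means "hash not computed yet"

    def hash(self) -> int:
        # Compute the hash on first use, then reuse the cached value.
        if self.cached_hash == -1:
            h = hash(self.value)
            # Keep -1 reserved as the "unknown" marker.
            self.cached_hash = -2 if h == -1 else h
        return self.cached_hash


def equal(a: CachedStr, b: CachedStr) -> bool:
    # Shortcut: if both hashes are known and differ, the contents differ.
    if (a.cached_hash != -1 and b.cached_hash != -1
            and a.cached_hash != b.cached_hash):
        return False
    # Otherwise (a hash is unknown, or the hashes match), compare contents.
    return a.value == b.value
```

Equal hashes prove nothing, so the full comparison still runs in that case; only a mismatch allows an early exit.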

History
Date	User	Action	Args
2012-10-19 12:36:06	vstinner	set	recipients: + vstinner, serhiy.storchaka
2012-10-19 12:36:06	vstinner	set	messageid: <1350650166.87.0.27478619259.issue16286@psf.upfronthosting.co.za>
2012-10-19 12:36:06	vstinner	link	issue16286 messages
2012-10-19 12:36:06	vstinner	create