Message 185956 - Python tracker


Author	vstinner
Recipients	pitrou, serhiy.storchaka, vstinner
Date	2013-04-03.21:29:12
Message-id	<1365024552.84.0.0461498193244.issue17628@psf.upfronthosting.co.za>
In-reply-to

Content
In Python 3.4, str==str is implemented by calling memcmp().

The unicode_eq() function, used by the dict and set types, checks the first byte before calling memcmp(). bytes==bytes uses the same check.

The Py_UNICODE_MATCH macro has checked the first *and* last character before calling memcmp() since this commit:
---
changeset:   38242:0de9a789de39
branch:      legacy-trunk
user:        Fredrik Lundh <fredrik@pythonware.com>
date:        Tue May 23 10:10:57 2006 +0000
files:       Include/unicodeobject.h
description:
needforspeed: check first *and* last character before doing a full memcmp
---

The attached patch changes str==str to check the first and last character before calling memcmp(). It might reduce the overhead of a C function call, but more importantly it is much faster when comparing two different strings of the same length that share a common prefix but have a different suffix.
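
To illustrate the idea, here is a minimal sketch in C. It is not the attached patch and not CPython's actual code: bytes_equal() is a hypothetical helper that works on plain byte buffers, whereas the real str implementation also has to deal with 1, 2 or 4 bytes per character. It only shows the order of the checks: length, first element, last element, then the full memcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a CPython function): return 1 if the two
   byte buffers have the same length and content, 0 otherwise. */
static int
bytes_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);

    /* Buffers of different lengths can never be equal. */
    if (alen != blen)
        return 0;
    /* Two empty buffers are equal; this also protects a[alen - 1] below. */
    if (alen == 0)
        return 1;
    /* Cheap rejection: compare the first and the last byte inline,
       without paying for a function call. A differing last byte catches
       the "common prefix, different suffix" case immediately. */
    if (a[0] != b[0] || a[alen - 1] != b[alen - 1])
        return 0;
    /* Both ends match: fall back to the full comparison. */
    return memcmp(a, b, alen) == 0;
}
```

With this ordering, bytes_equal("prefix_a", 8, "prefix_b", 8) returns 0 after looking at two bytes, where a plain memcmp() would have to walk the whole common prefix first.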

The patch also merges unicode_compare_eq() and unicode_eq() so that str, dict and set use the same code.

We may use the same optimization on byte strings.

See also #16321.

History
Date	User	Action	Args
2013-04-03 21:29:12	vstinner	set	recipients: + vstinner, pitrou, serhiy.storchaka
2013-04-03 21:29:12	vstinner	set	messageid: <1365024552.84.0.0461498193244.issue17628@psf.upfronthosting.co.za>
2013-04-03 21:29:12	vstinner	link	issue17628 messages
2013-04-03 21:29:12	vstinner	create