Issue 13279: Add memcmp into unicode_compare for optimizing comparisons

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/57488

classification

Title:	Add memcmp into unicode_compare for optimizing comparisons
Type:	performance	Stage:	patch review
Components:	Interpreter Core	Versions:	Python 3.3

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:		Nosy List:	RichIsMyName, asmodai, ezio.melotti, loewis, petri.lehtinen, pitrou, rhettinger, scoder, vstinner
Priority:	normal	Keywords:	patch

Created on 2011-10-27 17:52 by RichIsMyName, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
unicode_with_memcmp.patch	RichIsMyName, 2011-10-27 17:58	Patch for unicode_compare changes, plus added tests in test_unicode.py	review
unicode_with_memcmp_and_ucs_specialization.patch	RichIsMyName, 2011-10-31 19:03		review

Messages (12)
msg146508 - (view)	Author: Richard Saunders (RichIsMyName)	Date: 2011-10-27 17:52
In discussions of memcmp performance, (http://www.picklingtools.com/study.pdf) it was noted how well Python 2.7 can take advantage of faster memcmps (indeed, the rich comparisons are all memcmp calls). There have been some discussion on python-dev@python.org as well. With unicode and Python 3.3 (and anyPython 3.x) there are a few places we could call memcmp to make string comparisons faster, but they are not completely trivial. Basically, if the unicode strings are "1 byte kind", then memcmp can be used almost as is. If the unicode strings are the same kind, they can at least use memcmp to compare for equality or inequality. There is also a minor optimization laying in unicode_compare: if you are comparing two strings for equality/inequality, there is no reason to look at the entire string if the lengths are different. These 3 minor optimizations can make unicode_compare faster.
msg146511 - (view)	Author: Richard Saunders (RichIsMyName)	Date: 2011-10-27 17:58
This is a potential patch: I believe it follows the C-style of PEP 7 There is a test as well, testing 1 and 2 byte kinds. I have run it through the python tests and have added no new breakages (there were some tests that failed, but they failed with and without the patch)
msg146540 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-10-28 08:52
I would be nice to have a third path for inegality with kind1==kind2, something like: else if (kind1 == PyUnicode_2BYTE_KIND && kind2 == PyUnicode_2BYTE_KIND) { /* use Py_UCS2* pointers / } else if (kind1 == PyUnicode_4BYTE_KIND && kind2 == PyUnicode_4BYTE_KIND) { / use Py_UCS4* pointers */ } Inegality comparaisons are used to sort Unicode lists for example.
msg146541 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-10-28 08:54
> These 3 minor optimizations can make unicode_compare faster. Can you please try to write a short benchmark script? (or just run a benchmark using ./python -m timeit)
msg146730 - (view)	Author: Richard Saunders (RichIsMyName)	Date: 2011-10-31 18:59
Here's a test demonstrating the memcmp optimization effect: ----------------------------------------------------------- more ~/PickTest5/Python/string_test3.py a = [] b = [] c = [] d = [] for x in range(0,1000) : a.append("the quick brown fox"+str(x)) b.append("the wuick brown fox"+str(x)) c.append("the quick brown fox"+str(x)) d.append("the wuick brown fox"+str(x)) count = 0 for x in range(0,200000) : if a==c : count += 1 if a==c : count += 2 if a==d : count += 3 if b==c : count += 5 if b==d : count += 7 if c==d : count += 11 print(count) --------------------------------------------------------------------- # plain Python 3.3 no memcmp, no UCS specialization % time ./python ~/PickTest5/Python/string_test3.py 2000000 28.918u 0.014s 0:29.02 99.6% 0+0k 0+0io 0pf+0w % setenv CFLAGS -fno-builtin-memcmp # (reconfigure and remake) % time ./python ~/PickTest5/Python/string_test3.py 2000000 29.783u 0.017s 0:29.88 99.6% 0+0k 0+0io 0pf+0w ------------------------------------------------------------------------- # Python 3.3 with memcmp optimizations and UCS2/4 specialization (no CFLAGS) % time ./python ~/PickTest5/Python/string_test3.py 2000000 37.126u 0.046s 0:37.35 99.4% 0+0k 0+0io 0pf+0w % setenv CFLAGS -fno-builtin-memcmp # (reconfigure and remake) % time ./python ~/PickTest5/Python/string_test3.py 2000000 18.621u 0.013s 0:18.69 99.6% 0+0k 0+0io 0pf+0w --------------------------------------------------------------------- Note we only really see the effect if we make sure that gcc isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp is SO important on gcc builds! See http://www.picklingtools.com/study.pdf I am currently working with Bob Arendt (a colleague who works frequently with Fedora) to try to put the -fno-builtin-memcmp in the .spec file for their Python
msg146732 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-10-31 19:03
> I am currently working with Bob Arendt (a colleague who works > frequently with Fedora) to try to put the > -fno-builtin-memcmp in the .spec file for their Python Wouldn't it be better to have it enabled by default in their gcc?
msg146733 - (view)	Author: Richard Saunders (RichIsMyName)	Date: 2011-10-31 19:03
Added branches for specializing for UCS2 and UCS4 types
msg146742 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2011-10-31 20:49
> Note we only really see the effect if we make sure that gcc > isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp > is SO important on gcc builds! I'd rather infer the opposite: given how GCC generates code, this patch is not worthwhile. Given that it actually slows down Python in the default system configuration, I'm -1 on applying it. This is really not a route we should take; it leads to maintenance pain.
msg146744 - (view)	Author: Raymond Hettinger (rhettinger) *	Date: 2011-10-31 20:56
I concur with Martin.
msg146747 - (view)	Author: Richard Saunders (RichIsMyName)	Date: 2011-10-31 22:30
Some more information: Bob Arendt and I have been playing with the Fedora Core .spec file for python on Fedora Core 15: the compile options we found seem to automatically (as we did non invoke this option) invoke '-fno-builtin-memcmp' somehow? We disassembled the Python binary we built for the machine ourselves (via the spec file) code and saw, yes, it was calling memcmp on the system, even though we didn't bypass it explicitly. Our conjecture is that the -m32 or -mtune=atom automatically turns the builtin memcmp off, but we're not sure (we're still playing with it). However, perhaps Fedora builds on a more generic machine: that generic build keeps the 'rep cmbsb'? Frustrating.
msg146755 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2011-10-31 23:47
> > Note we only really see the effect if we make sure that gcc > > isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp > > is SO important on gcc builds! > > I'd rather infer the opposite: given how GCC generates code, this patch > is not worthwhile. Given that it actually slows down Python in the > default system configuration, I'm -1 on applying it. This is really not > a route we should take; it leads to maintenance pain. I agree with Martin. This patch would be very nice if there wasn't the memcmp() perf problem. A possible solution would be to write a simple optimized memcmp()-alike for our own purposes. But I'm not sure it's really worth the hassle: comparing long unicode strings doesn't strike me as a very common operation. Finding a short substring, conatenating strings together, are all much more common.
msg146759 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2011-11-01 06:56
Ok, closing the issue.

History
Date	User	Action	Args
2022-04-11 14:57:23	admin	set	github: 57488
2011-11-01 06:56:36	loewis	set	messages: + msg146759
2011-11-01 06:56:14	loewis	set	status: open -> closed resolution: wont fix
2011-10-31 23:47:43	pitrou	set	messages: + msg146755 title: Add memcmp into unicode_compare for optimizing comparisons -> Add memcmp into unicode_compare for optimizing comparisons
2011-10-31 22:30:03	RichIsMyName	set	messages: + msg146747
2011-10-31 20:56:51	rhettinger	set	nosy: + rhettinger messages: + msg146744
2011-10-31 20:49:33	loewis	set	messages: + msg146742 title: Add memcmp into unicode_compare for optimizing comparisons -> Add memcmp into unicode_compare for optimizing comparisons
2011-10-31 19:03:47	RichIsMyName	set	files: + unicode_with_memcmp_and_ucs_specialization.patch messages: + msg146733
2011-10-31 19:03:37	pitrou	set	messages: + msg146732
2011-10-31 19:00:00	RichIsMyName	set	messages: + msg146730
2011-10-28 08:54:09	vstinner	set	messages: + msg146541
2011-10-28 08:52:45	vstinner	set	messages: + msg146540
2011-10-28 05:31:02	petri.lehtinen	set	nosy: + petri.lehtinen
2011-10-27 22:14:28	pitrou	set	nosy: + vstinner, ezio.melotti stage: patch review versions: - Python 3.4
2011-10-27 17:58:54	RichIsMyName	set	files: + unicode_with_memcmp.patch keywords: + patch messages: + msg146511 title: Add memcmp into unicode_compare for optimizing compares -> Add memcmp into unicode_compare for optimizing comparisons
2011-10-27 17:52:41	RichIsMyName	create