classification
Title: Add memcmp into unicode_compare for optimizing comparisons
Type: performance Stage: patch review
Components: Interpreter Core Versions: Python 3.3
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: RichIsMyName, asmodai, ezio.melotti, loewis, petri.lehtinen, pitrou, rhettinger, scoder, vstinner
Priority: normal Keywords: patch

Created on 2011-10-27 17:52 by RichIsMyName, last changed 2011-11-01 06:56 by loewis. This issue is now closed.

Files
File name Uploaded Description Edit
unicode_with_memcmp.patch RichIsMyName, 2011-10-27 17:58 Patch for unicode_compare changes, plus added tests in test_unicode.py review
unicode_with_memcmp_and_ucs_specialization.patch RichIsMyName, 2011-10-31 19:03 review
Messages (12)
msg146508 - (view) Author: Richard Saunders (RichIsMyName) Date: 2011-10-27 17:52
In discussions of memcmp performance, (http://www.picklingtools.com/study.pdf)
it was noted how well Python 2.7 can take advantage of faster memcmps (indeed, the rich comparisons are all memcmp calls).
There have been some discussion on python-dev@python.org as well.

With unicode and Python 3.3 (and anyPython 3.x) there are a 
few places we could call memcmp to make string comparisons faster, but they are not completely trivial.

Basically, if the unicode strings are "1 byte kind", then memcmp can be used almost as is.  If the unicode strings are the same kind, they can at least use memcmp to compare for equality or inequality.

There is also a minor optimization laying in unicode_compare: if you
are comparing two strings for equality/inequality, there is no reason to look at the entire string if the lengths are different.

These 3 minor optimizations can make unicode_compare faster.
msg146511 - (view) Author: Richard Saunders (RichIsMyName) Date: 2011-10-27 17:58
This is a potential patch: 
I believe it follows the C-style of PEP 7

There is a test as well, testing 1 and 2 byte kinds.

I have run it through the python tests and have added no new breakages
(there were some tests that failed, but they failed with and without the patch)
msg146540 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-28 08:52
I would be nice to have a third path for inegality with kind1==kind2, something like:

else if (kind1 == PyUnicode_2BYTE_KIND && kind2 == PyUnicode_2BYTE_KIND)
{
  /* use Py_UCS2* pointers */
}
else if (kind1 == PyUnicode_4BYTE_KIND && kind2 == PyUnicode_4BYTE_KIND)
{
  /* use Py_UCS4* pointers */
}

Inegality comparaisons are used to sort Unicode lists for example.
msg146541 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-10-28 08:54
> These 3 minor optimizations can make unicode_compare faster.

Can you please try to write a short benchmark script? (or just run a benchmark using ./python -m timeit)
msg146730 - (view) Author: Richard Saunders (RichIsMyName) Date: 2011-10-31 18:59
Here's a test demonstrating the memcmp optimization effect:
-----------------------------------------------------------
more ~/PickTest5/Python/string_test3.py

a = []
b = []
c = []
d = []
for x in range(0,1000) :
    a.append("the quick brown fox"+str(x))
    b.append("the wuick brown fox"+str(x))
    c.append("the quick brown fox"+str(x))
    d.append("the wuick brown fox"+str(x))

count = 0
for x in range(0,200000) :
    if a==c : count += 1
    if a==c : count += 2
    if a==d : count += 3
    if b==c : count += 5
    if b==d : count += 7
    if c==d : count += 11

print(count)



---------------------------------------------------------------------

# plain Python 3.3  no memcmp, no UCS specialization 

% time ./python ~/PickTest5/Python/string_test3.py 
2000000
28.918u 0.014s 0:29.02 99.6%	0+0k 0+0io 0pf+0w


% setenv CFLAGS -fno-builtin-memcmp # (reconfigure and remake)


% time ./python ~/PickTest5/Python/string_test3.py
2000000
29.783u 0.017s 0:29.88 99.6%	0+0k 0+0io 0pf+0w


-------------------------------------------------------------------------
# Python 3.3 with memcmp optimizations and UCS2/4 specialization (no CFLAGS)

% time ./python ~/PickTest5/Python/string_test3.py
2000000
37.126u 0.046s 0:37.35 99.4%	0+0k 0+0io 0pf+0w


% setenv CFLAGS -fno-builtin-memcmp # (reconfigure and remake)

% time ./python ~/PickTest5/Python/string_test3.py
2000000
18.621u 0.013s 0:18.69 99.6%	0+0k 0+0io 0pf+0w



---------------------------------------------------------------------

Note we only really see the effect if we make sure that gcc
isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp
is SO important on gcc builds!
See http://www.picklingtools.com/study.pdf

I am currently working with Bob Arendt (a colleague who works
frequently with Fedora) to try to put the
-fno-builtin-memcmp in the .spec file for their Python
msg146732 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-31 19:03
> I am currently working with Bob Arendt (a colleague who works
> frequently with Fedora) to try to put the
> -fno-builtin-memcmp in the .spec file for their Python

Wouldn't it be better to have it enabled by default in their gcc?
msg146733 - (view) Author: Richard Saunders (RichIsMyName) Date: 2011-10-31 19:03
Added branches for specializing for UCS2 and UCS4 types
msg146742 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-10-31 20:49
> Note we only really see the effect if we make sure that gcc
> isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp
> is SO important on gcc builds!

I'd rather infer the opposite: given how GCC generates code, this patch
is not worthwhile. Given that it actually slows down Python in the
default system configuration, I'm -1 on applying it. This is really not
a route we should take; it leads to maintenance pain.
msg146744 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-10-31 20:56
I concur with Martin.
msg146747 - (view) Author: Richard Saunders (RichIsMyName) Date: 2011-10-31 22:30
Some more information:

Bob Arendt and I have been playing with the Fedora Core .spec file
for python on Fedora Core 15: 
the compile options we found seem to automatically (as we did non invoke
this option) invoke '-fno-builtin-memcmp' somehow?  We disassembled the
Python binary we built for the machine ourselves (via the spec file)
code and saw, yes, it was calling memcmp on the system, even though we
didn't bypass it explicitly.

Our conjecture is that the -m32 or -mtune=atom automatically turns the builtin memcmp off, but we're not sure (we're still playing with it). 

However, perhaps Fedora builds on a more generic machine: that generic build keeps the 'rep cmbsb'?

Frustrating.
msg146755 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-10-31 23:47
> > Note we only really see the effect if we make sure that gcc
> > isn't emitting its "special" memcmp: that's why the -fno-builtin-memcmp
> > is SO important on gcc builds!
> 
> I'd rather infer the opposite: given how GCC generates code, this patch
> is not worthwhile. Given that it actually slows down Python in the
> default system configuration, I'm -1 on applying it. This is really not
> a route we should take; it leads to maintenance pain.

I agree with Martin. This patch would be very nice if there wasn't the
memcmp() perf problem. A possible solution would be to write a simple
optimized memcmp()-alike for our own purposes.
But I'm not sure it's really worth the hassle: comparing long unicode
strings doesn't strike me as a very common operation. Finding a short
substring, conatenating strings together, are all much more common.
msg146759 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2011-11-01 06:56
Ok, closing the issue.
History
Date User Action Args
2011-11-01 06:56:36loewissetmessages: + msg146759
2011-11-01 06:56:14loewissetstatus: open -> closed
resolution: wont fix
2011-10-31 23:47:43pitrousetmessages: + msg146755
title: Add memcmp into unicode_compare for optimizing comparisons -> Add memcmp into unicode_compare for optimizing comparisons
2011-10-31 22:30:03RichIsMyNamesetmessages: + msg146747
2011-10-31 20:56:51rhettingersetnosy: + rhettinger
messages: + msg146744
2011-10-31 20:49:33loewissetmessages: + msg146742
title: Add memcmp into unicode_compare for optimizing comparisons -> Add memcmp into unicode_compare for optimizing comparisons
2011-10-31 19:03:47RichIsMyNamesetfiles: + unicode_with_memcmp_and_ucs_specialization.patch

messages: + msg146733
2011-10-31 19:03:37pitrousetmessages: + msg146732
2011-10-31 19:00:00RichIsMyNamesetmessages: + msg146730
2011-10-28 08:54:09vstinnersetmessages: + msg146541
2011-10-28 08:52:45vstinnersetmessages: + msg146540
2011-10-28 05:31:02petri.lehtinensetnosy: + petri.lehtinen
2011-10-27 22:14:28pitrousetnosy: + vstinner, ezio.melotti
stage: patch review

versions: - Python 3.4
2011-10-27 17:58:54RichIsMyNamesetfiles: + unicode_with_memcmp.patch
keywords: + patch
messages: + msg146511

title: Add memcmp into unicode_compare for optimizing compares -> Add memcmp into unicode_compare for optimizing comparisons
2011-10-27 17:52:41RichIsMyNamecreate