This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: String comparison performance regression
Type: performance Stage: patch review
Components: Interpreter Core, Unicode Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Neil.Hodgson, ethan.furman, ezio.melotti, georg.brandl, loewis, pitrou, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2013-04-02 09:14 by Neil.Hodgson, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
charwidth.py Neil.Hodgson, 2013-04-02 09:14 Program for measuring time taken for comparisons
unicode_compare.asm Neil.Hodgson, 2013-04-03 22:54
specialize_compare.patch vstinner, 2013-04-07 22:24
Messages (27)
msg185824 - Author: Neil Hodgson (Neil.Hodgson) Date: 2013-04-02 09:14
On Windows, non-equal comparisons (<, <=, >, >=) between strings with common prefixes are slower in Python 3.3 than in 3.2. This is true for both 32-bit and 64-bit builds. Performance on Linux has not decreased for the same code. The attached program tests comparisons for strings that have common prefixes.

On a 64-bit build, a 25 character string comparison is around 30% slower and a 100 character comparison averages 85% slower. A user of 32-bit Python builds reported the 25 character case as averaging 70% slower.

Here are two runs of the program using 3.2/3.3 on Windows 7 on an i7 870:

>c:\python32\python -u "charwidth.py"
3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']176
[0.7116295577956576, 0.7055591343157613, 0.7203483026429418]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']176
[0.7664397841378787, 0.7199902325464409, 0.713719289812504]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']176
[0.7341851791817691, 0.6994205901833599, 0.7106807593741005]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']180
[0.7346812372666784, 0.6995411113377914, 0.7064768417728411]

>c:\python33\python -u "charwidth.py"
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']108
[0.9913326076446045, 0.9455845241056282, 0.9459076605341776]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']192
[1.0472289217234318, 1.0362342484091207, 1.0197109728048384]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']192
[1.0439643704533834, 0.9878581050301687, 0.9949265834034335]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']312
[1.0987483965446412, 1.0130257167690004, 1.024832248526499]
msg185831 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-02 12:36
Why do you care? Does it impact a real-world workload?
msg185860 - Author: Ethan Furman (ethan.furman) * (Python committer) Date: 2013-04-02 21:36
As Ian Kelly said on Python-List:

<quote>
Micro-benchmarks like the ones [jmf] have been reporting are *useful*
when it comes to determining what operations can be better optimized,
but they are not *important* in and of themselves.  What is important
is that actual, real-world programs are not significantly slowed by
these kinds of optimizations.  Until [it] can be demonstrated that real
programs are adversely affected by PEP 393, there is not, in my opinion,
any regression that is worth worrying over.
</quote>

I think this issue should be closed.
msg185862 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-02 22:05
Comparing (Unicode) strings was optimized after the release of Python 3.3.

changeset:   79469:54154be6b27d
user:        Victor Stinner <victor.stinner@gmail.com>
date:        Thu Oct 04 22:59:45 2012 +0200
files:       Objects/unicodeobject.c
description:
Optimize unicode_compare(): use memcmp() when comparing two UCS1 strings

changeset:   79902:b68be1025c42
user:        Victor Stinner <victor.stinner@gmail.com>
date:        Tue Oct 23 02:48:49 2012 +0200
files:       Objects/unicodeobject.c
description:
Optimize PyUnicode_RichCompare() for Py_EQ and Py_NE: always use memcmp()
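
For illustration, that Py_EQ/Py_NE fast path amounts to something like the following minimal sketch, assuming the standard PEP 393 accessor macros from Python.h and canonical (ready) strings; unicode_eq_sketch is a hypothetical name, not the committed code:

#include <Python.h>
#include <string.h>

/* Equality needs no ordering, so equal lengths and equal kinds permit a
   single memcmp() over the raw buffers. Canonical PEP 393 strings use
   the narrowest possible kind, so equal strings always have equal
   kinds. */
static int
unicode_eq_sketch(PyObject *a, PyObject *b)
{
    Py_ssize_t len = PyUnicode_GET_LENGTH(a);

    if (len != PyUnicode_GET_LENGTH(b))
        return 0;
    if (PyUnicode_KIND(a) != PyUnicode_KIND(b))
        return 0;  /* different kinds imply different contents */
    /* the kind is the byte width of one code unit: 1, 2 or 4 */
    return memcmp(PyUnicode_DATA(a), PyUnicode_DATA(b),
                  (size_t)len * PyUnicode_KIND(a)) == 0;
}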

---

It looks like Python 3.4 is faster than 3.2 for this specific micro-benchmark on my computer. So I'm closing the issue.

If you see an interesting optimization, please write a patch and open an issue. But complaining that PEP 393 slowed down Unicode does not help at all. PEP 393 solved a lot of other issues!


3.2.3+ (3.2:d40afd489b6a, Apr  2 2013, 23:46:20) 
[GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']176
[0.38440799713134766, 0.38411498069763184, 0.38804006576538086]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']176
[0.3850290775299072, 0.38683581352233887, 0.3845059871673584]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']176
[0.38274407386779785, 0.3805210590362549, 0.38046717643737793]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']180
[0.3880500793457031, 0.38711094856262207, 0.3869481086730957]



3.3.0+ (3.3:c78dfc6ce37a, Apr  2 2013, 23:48:14) 
[GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']108
[0.4134676969842985, 0.4146421169862151, 0.41625474498141557]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']192
[0.42760137701407075, 0.42286567797418684, 0.42544596805237234]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']192
[0.4288683719933033, 0.4251258020522073, 0.4281281529692933]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']312
[0.40928812394849956, 0.4099267750279978, 0.4107871470041573]



3.4.0a0 (default:9328e2b8a397, Apr  2 2013, 23:46:24) 
[GCC 4.7.2 20121109 (Red Hat 4.7.2-8)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']108
[0.31218199292197824, 0.30999370804056525, 0.31113169400487095]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']192
[0.3712720649782568, 0.37407689797692, 0.3728883999865502]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']192
[0.36971510702278465, 0.3688076320104301, 0.36580446804873645]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']312
[0.3653324950719252, 0.3652214870089665, 0.36527683096937835]
msg185887 - Author: Neil Hodgson (Neil.Hodgson) Date: 2013-04-03 04:27
The common cases are likely to be 1:1, 2:2, and 1:2. There is already a specialisation for 1:1. wmemcmp() is widely available, but it is based on wchar_t and so operates on different widths on Windows and Unix. On Windows it would handle the 2:2 case.
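
A sketch of how the 2:2 case could use wmemcmp() where wchar_t is 16 bits. This is a fragment assumed to live inside unicode_compare(), with kind1/kind2/data1/data2 as there and len the common prefix length (the minimum of the two string lengths); SIZEOF_WCHAR_T comes from CPython's pyconfig.h:

#include <wchar.h>

#if SIZEOF_WCHAR_T == 2  /* in practice: Windows */
    if (kind1 == PyUnicode_2BYTE_KIND && kind2 == PyUnicode_2BYTE_KIND) {
        /* wchar_t is an unsigned 16-bit type here, so wmemcmp()
           orders by UCS2 code unit, which matches code point order */
        int res = wmemcmp((const wchar_t *)data1,
                          (const wchar_t *)data2, (size_t)len);
        if (res != 0)
            return res < 0 ? -1 : 1;
        /* common prefix is equal: fall through to compare lengths */
    }
#endif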
msg185933 - Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-04-03 16:41
Reopening for consideration of using wmemcmp().
msg185944 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-03 19:19
"wmemcmp is widely available but is based on wchar_t so is for different widths on Windows and Unix. On Windows it would handle the 2:2 case."

I don't know if wmemcmp() can be used if the wchar_t type is signed. Is there an OS with signed wchar_t? If so, we need to check in configure whether wchar_t is signed.

On Windows, wchar_t size is 16 bits, whereas it is 32 bits on Mac OS X and Linux (and most OSes).
msg185960 - Author: Neil Hodgson (Neil.Hodgson) Date: 2013-04-03 21:45
For 32-bit wchar_t, signedness shouldn't matter, as Unicode code points only need 21 bits, so no character will be seen as negative. On Windows, wchar_t is unsigned.

C11 has char16_t and char32_t, which are both unsigned, but it doesn't include comparison functions for them.
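
For illustration, the comparison loop one would have to write by hand over C11's char16_t; u16memcmp is a hypothetical helper, not a standard function:

#include <stddef.h>
#include <uchar.h>  /* C11: char16_t */

static int
u16memcmp(const char16_t *a, const char16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;  /* unsigned, so no sign surprises */
    }
    return 0;
}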
msg185974 - Author: Neil Hodgson (Neil.Hodgson) Date: 2013-04-03 22:54
For 32-bit Windows, the code generated for unicode_compare is quite slow.

    There are either 1 or 2 kind checks in each call to PyUnicode_READ and 2 calls to PyUnicode_READ inside the loop. A compiler may decide to move the kind checks out of the loop and specialize the loop but MSVC 2010 appears to not do so. The assembler (32-bit build) for each PyUnicode_READ looks like

    mov    ecx, DWORD PTR _kind1$[ebp]
    cmp    ecx, 1
    jne    SHORT $LN17@unicode_co@2
    lea    ecx, DWORD PTR [ebx+eax]
    movzx    edx, BYTE PTR [ecx+edx]
    jmp    SHORT $LN16@unicode_co@2
$LN17@unicode_co@2:
    cmp    ecx, 2
    jne    SHORT $LN15@unicode_co@2
    movzx    edx, WORD PTR [ebx+edi]
    jmp    SHORT $LN16@unicode_co@2
$LN15@unicode_co@2:
    mov    edx, DWORD PTR [ebx+esi]
$LN16@unicode_co@2:

   The kind1/kind2 variables aren't even going into registers, and at least one test+branch and a jump are executed for every character (two tests for the 2- and 4-byte kinds). len1 and len2 don't get to go into registers either.

   My system isn't set up for 64-bit MSVC 2010, but looking at the code from 64-bit MSVC 2012 shows that all the variables have been moved into registers, although the kind checking is still inside the loop. This accounts for the better results with 64-bit Python 3.3 on Windows, which are still not as good as on Unix or in Python 3.2.

; 10431:         c1 = PyUnicode_READ(kind1, data1, i);

	cmp	rsi, 1
	jne	SHORT $LN17@unicode_co
	lea	rax, QWORD PTR [r9+rcx]
	movzx	r8d, BYTE PTR [rax+rbx]
	jmp	SHORT $LN16@unicode_co
$LN17@unicode_co:
	cmp	rsi, 2
	jne	SHORT $LN15@unicode_co
	movzx	r8d, WORD PTR [r9+r11]
	jmp	SHORT $LN16@unicode_co
$LN15@unicode_co:
	mov	r8d, DWORD PTR [r9+r10]
$LN16@unicode_co:

   Attached the 32-bit assembler listing.
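
For reference, the loop being discussed has roughly this shape (simplified from unicode_compare() in Objects/unicodeobject.c); each PyUnicode_READ() expands to the kind dispatch shown in the assembler above:

    for (i = 0; i < len1 && i < len2; i++) {
        Py_UCS4 c1 = PyUnicode_READ(kind1, data1, i);  /* branches on kind1 */
        Py_UCS4 c2 = PyUnicode_READ(kind2, data2, i);  /* branches on kind2 */
        if (c1 != c2)
            return (c1 < c2) ? -1 : 1;
    }
    /* common prefix is equal: the shorter string orders first */
    return (len1 < len2) ? -1 : (len1 > len2) ? 1 : 0;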
msg186010 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-04 07:44
"For 32-bit Windows, the code generated for unicode_compare is quite
slow. There are either 1 or 2 kind checks in each call to
PyUnicode_READ (...)"

Yes, PyUnicode_READ() *is* slow. It should not be used in a loop. And
unicode_compare() uses PyUnicode_READ() in a loop.

An improvement would be to write a specialized version for each
combination of Unicode kinds:
(UCS1, UCS2), (UCS1, UCS4),
(UCS2, UCS1), (UCS2, UCS2), (UCS2, UCS4)
(UCS4, UCS1), (UCS4, UCS2), (UCS4, UCS4)
# (UCS1, UCS1) uses memcmp()

But I am not convinced that the gain would be visible, and I don't
know how to factor out the common code. We should probably use a huge macro.
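
Such a macro could look roughly like this (a sketch only, with data1/data2/len1/len2 assumed from the enclosing function and Py_MIN as in CPython's pymacro.h; the COMPARE name follows the convention used in a later message):

/* Instantiate one comparison loop per concrete pair of code unit
   types; the kind dispatch then happens once, before the loop. */
#define COMPARE(TYPE1, TYPE2)                                       \
    do {                                                            \
        const TYPE1 *p1 = (const TYPE1 *)data1;                     \
        const TYPE2 *p2 = (const TYPE2 *)data2;                     \
        Py_ssize_t i, len = Py_MIN(len1, len2);                     \
        for (i = 0; i < len; i++) {                                 \
            if (p1[i] != p2[i])                                     \
                return ((Py_UCS4)p1[i] < (Py_UCS4)p2[i]) ? -1 : 1;  \
        }                                                           \
    } while (0)

When the loop finishes without returning, control falls through to the usual length comparison.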

msg186054 - Author: Neil Hodgson (Neil.Hodgson) Date: 2013-04-04 22:21
Looking at the assembler output from gcc 4.7 on Linux shows that it specialises the loop 9 times - once for each pair of kinds. This is why there was far less slow-down on Linux.

Explicitly writing out the 9 loops is inelegant and would make accurate maintenance more difficult. There may be some way to use the preprocessor to do this cleanly.
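
Written out via such a macro (COMPARE as sketched earlier), the dispatch might look like this; the 1:1 pair keeps its memcmp() fast path:

switch (kind1) {
case PyUnicode_1BYTE_KIND:
    switch (kind2) {
    case PyUnicode_2BYTE_KIND: COMPARE(Py_UCS1, Py_UCS2); break;
    case PyUnicode_4BYTE_KIND: COMPARE(Py_UCS1, Py_UCS4); break;
    default: break;  /* 1:1 is handled separately with memcmp() */
    }
    break;
case PyUnicode_2BYTE_KIND:
    switch (kind2) {
    case PyUnicode_1BYTE_KIND: COMPARE(Py_UCS2, Py_UCS1); break;
    case PyUnicode_2BYTE_KIND: COMPARE(Py_UCS2, Py_UCS2); break;
    case PyUnicode_4BYTE_KIND: COMPARE(Py_UCS2, Py_UCS4); break;
    }
    break;
case PyUnicode_4BYTE_KIND:
    switch (kind2) {
    case PyUnicode_1BYTE_KIND: COMPARE(Py_UCS4, Py_UCS1); break;
    case PyUnicode_2BYTE_KIND: COMPARE(Py_UCS4, Py_UCS2); break;
    case PyUnicode_4BYTE_KIND: COMPARE(Py_UCS4, Py_UCS4); break;
    }
    break;
}
/* reaches here when the common prefixes are equal */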
msg186217 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-07 15:33
On big-endian platforms we can use memcmp for 2:2 and 4:4 comparisons. I am not sure it will be faster. ;)
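
The reason this can work: big-endian storage puts the most significant byte of each code unit first, so for same-kind strings bytewise order coincides with code point order. A sketch, assuming the WORDS_BIGENDIAN macro from the configure machinery and len as the common prefix length:

#ifdef WORDS_BIGENDIAN
    if (kind1 == kind2) {
        /* multi-byte code units compare correctly as raw bytes
           on big-endian machines */
        int res = memcmp(data1, data2, (size_t)len * kind1);
        if (res != 0)
            return res < 0 ? -1 : 1;
    }
#endif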
msg186252 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-07 22:24
Here is a patch specializing unicode_compare() for each combination of (kind1, kind2), to avoid the expensive PyUnicode_READ() macro (two branches per read). On Linux using GCC 4.7 with -O3, there is no difference, since GCC already specializes the loops. It may help other compilers.
msg186274 - Author: Neil Hodgson (Neil.Hodgson) Date: 2013-04-08 04:45
The patch fixes the performance regression on Windows. The 1:1 case is better than either the 3.2.4 or the 3.3.1 download from python.org. Other cases are close to 3.2.4, losing at most around 2%. Measurements from 32-bit builds:

## Download 3.2.4
3.2.4 (default, Apr  6 2013, 20:07:44) [MSC v.1500 32 bit (Intel)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']148
[0.9251519691803254, 0.9228673224604178, 0.9270485054253375]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']148
[0.9088621585998959, 0.916762355170341, 0.9102371386441703]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']148
[0.9071172334674893, 0.9079409638903551, 0.9188950414432817]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']152
[0.9154984634528134, 0.9211241439998155, 0.9235272150680487]

## Download 3.3.1
3.3.1 (v3.3.1:d9893d13c628, Apr  6 2013, 20:25:12) [MSC v.1600 32 bit (Intel)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']84
[1.107935584141198, 1.080932736716823, 1.079060304542709]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']156
[1.2201494661996297, 1.2355558101814896, 1.217881936863404]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']156
[1.1195841384034795, 1.1172607155695182, 1.1198056163882537]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']276
[1.2389038306958007, 1.2207520679720822, 1.2370782093260395]

## Local build of 3.3.0 before patch
3.3.0 (default, Apr  8 2013, 14:06:26) [MSC v.1600 32 bit (Intel)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']84
[1.0824058797164942, 1.0680695468818941, 1.0685949457606005]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']156
[1.2159871472901957, 1.2169558514728118, 1.209515728255596]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']156
[1.1111012521191492, 1.1091369450081352, 1.1049337539784823]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']276
[1.2080548119585544, 1.2094420187054578, 1.2138603997013906]

## Local build of 3.3.0 after patch
3.3.0 (default, Apr  8 2013, 14:23:45) [MSC v.1600 32 bit (Intel)]
a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/z']84
[0.8673423724763649, 0.8545937643117921, 0.8289229288053079]

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']156
[0.9235338524209049, 0.9305998385376584, 0.9229137839304098]

a=['C:/Users/Neil/Documents/b','C:/Users/Neil/Documents/η']156
[0.891971842253179, 0.8971224280694345, 0.9036679059885344]

a=['C:/Users/Neil/Documents/𠀀','C:/Users/Neil/Documents/𠀁']276
[0.9310441918446486, 0.9431070566588904, 0.9355432690779342]
msg186289 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-04-08 10:24
You can use a single switch instead of nested switches:

switch ((kind1 << 3) + kind2) {
case (PyUnicode_1BYTE_KIND << 3) + PyUnicode_1BYTE_KIND: {
    int cmp = memcmp(data1, data2, len);
    ...
}
case (PyUnicode_1BYTE_KIND << 3) + PyUnicode_2BYTE_KIND:
    COMPARE(Py_UCS1, Py_UCS2);
    break;
...
}

I don't know if there is any effect.
msg186293 - Author: Neil Hodgson (Neil.Hodgson) Date: 2013-04-08 12:35
A quick rewrite showed the single-level case slightly faster (1%) on average, but it's less readable/maintainable. Perhaps taking a systematic approach to naming would allow Py_UCS1 to be deduced from PyUnicode_1BYTE_KIND and so avoid repeating the information in the case selector and the macro invocation.
msg186297 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-04-08 12:55
> You can use a single switch instead of nested switches:
> 
> switch ((kind1 << 3) + kind2) {
> case (PyUnicode_1BYTE_KIND << 3) + PyUnicode_1BYTE_KIND: {
>     int cmp = memcmp(data1, data2, len);
>     ...
> }

Please let's not add this kind of optifuscation unless it has a large positive effect.
msg186342 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-08 19:53
New changeset cc74062c28a6 by Victor Stinner in branch 'default':
Issue #17615: Expand expensive PyUnicode_READ() macro in unicode_compare():
http://hg.python.org/cpython/rev/cc74062c28a6
msg186345 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-08 20:35
New changeset db4a1a3d1f90 by Victor Stinner in branch 'default':
Issue #17615: Add tests comparing Unicode strings of different kinds
http://hg.python.org/cpython/rev/db4a1a3d1f90
msg186348 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-08 21:06
New changeset d3185be3e8d7 by Victor Stinner in branch 'default':
Issue #17615: Comparing two Unicode strings now uses wmemcmp() when possible
http://hg.python.org/cpython/rev/d3185be3e8d7
msg186349 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-08 21:16
Neil.Hodgson wrote:
"The patch fixes the performance regression on Windows. The 1:1 case is better than either 3.2.4 or 3.3.1 downloads from python.org. Other cases are close to 3.2.4, losing at most around 2%."

Nice, but make sure that you are using the same compiler with the same options (e.g. make sure that you are compiling in Release mode).

Neil.Hodgson wrote:
"Perhaps taking a systematic approach to naming would allow Py_UCS1 to be deduced from PyUnicode_1BYTE_KIND and so avoid repeating the information in the case selector and macro invocation."

I don't know how to do that in C. Anyway, I prefer to have a more explicit call to a "simple" macro than magic implicit arguments. Optimizations sometimes make the code harder to read (a good example: the whole PEP 393)...

--

I wrote specialized functions to compare strings for each combination of Unicode kinds, and I added a fast path using wmemcmp() when possible. I don't see any other speedups.

On Linux, comparing astral strings in Python 3.4 is now 3 times faster than in Python 3.2 and 3.3. I achieved my goal, so I can close the issue :-D
msg186359 - Author: Neil Hodgson (Neil.Hodgson) Date: 2013-04-08 23:32
Including the wmemcmp patch did not improve the times on MSC v.1600 32-bit; if anything, the performance was a little slower for the test I used:

a=['C:/Users/Neil/Documents/λ','C:/Users/Neil/Documents/η']156
specialised:
[0.9125948707773204, 0.8990815272107868, 0.9055365478250721]
wmemcmp:
[0.9287715478844594, 0.926606017373151, 0.9155132192031097]

Looking at the assembler, there is a real call to wmemcmp which adds some time and wmemcmp does not seem to be optimized compared to a simple loop.

However, the use of memcmp for 1:1 is a big win. Replacing the memcmp with COMPARE(Py_UCS1, Py_UCS1) shows memcmp is 45% faster on 100 character strings. memcmp doesn't generate a real call: instead there is an inline unrolled (4 bytes per iteration) loop.
msg186454 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-04-09 21:53
New changeset b3168643677b by Victor Stinner in branch 'default':
Issue #17615: On Windows (VS2010), Performances of wmemcmp() to compare Unicode
http://hg.python.org/cpython/rev/b3168643677b
msg186455 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-09 21:57
"Including the wmemcmp patch did not improve the times on MSC v.1600 32 bit - if anything, the performance was a little slower for the test I used:"

I tested my patch on Windows before the commit and I saw similar performance with and without wmemcmp().

I checked again and you are right: performance is *a little bit* worse using wmemcmp().

"Looking at the assembler, there is a real call to wmemcmp which adds some time and wmemcmp does not seem to be optimized compared to a simple loop."

You are probably right. I reverted the patch for 16-bit wchar_t to use a plain loop instead. 16-bit wchar_t is only found on Windows, isn't it?
msg186462 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-04-09 22:20
I'd like to propose a code size reduction. If kind1 < kind2, swap(kind1, kind2) and swap(data1, data2). Set a variable swapped to 1 (not swapped) or -1 (swapped); then return either swapped or -swapped when a difference is found.

With that, the actual comparison could be sure that kind2 <= kind1, so if kind1 is UCS1, the inner switch can go away. If kind1 is UCS2, kind2 could only be UCS1 or UCS2.
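
A sketch of that normalization (variable names assumed from unicode_compare(); only the shape matters):

/* Normalize so that kind1 >= kind2, remembering the original order. */
int swapped = 1;
if (kind1 < kind2) {
    unsigned int tk = kind1; kind1 = kind2; kind2 = tk;
    const void *td = data1; data1 = data2; data2 = td;
    Py_ssize_t tl = len1; len1 = len2; len2 = tl;
    swapped = -1;
}
/* ... dispatch on (kind1, kind2): six cases remain instead of nine ... */
/* wherever a difference c1 != c2 is found: */
return (c1 < c2) ? -swapped : swapped;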
msg186463 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-04-09 22:22
"I'd like to propose a code size reduction. If kind1 < kind2, swap(kind1, kind2) and swap(data1, data2)."

Yeah, I hesitated to implement this, but I forgot about it later. Would you like to work on such a change?
msg186471 - Author: Neil Hodgson (Neil.Hodgson) Date: 2013-04-10 00:29
Windows is the only widely used OS that has a 16-bit wchar_t. I can't recall what OS/2 did but Python doesn't support OS/2 any more.
History
Date User Action Args
2022-04-11 14:57:43 admin set github: 61815
2013-04-10 00:29:39 Neil.Hodgson set messages: + msg186471
2013-04-09 22:22:36 vstinner set messages: + msg186463
2013-04-09 22:20:46 loewis set nosy: + loewis; messages: + msg186462
2013-04-09 21:57:09 vstinner set messages: + msg186455
2013-04-09 21:53:40 python-dev set messages: + msg186454
2013-04-08 23:32:07 Neil.Hodgson set messages: + msg186359
2013-04-08 21:16:52 vstinner set status: open -> closed; resolution: fixed
2013-04-08 21:16:45 vstinner set messages: + msg186349
2013-04-08 21:06:06 python-dev set messages: + msg186348
2013-04-08 20:35:02 python-dev set messages: + msg186345
2013-04-08 19:53:26 python-dev set nosy: + python-dev; messages: + msg186342
2013-04-08 12:55:18 pitrou set messages: + msg186297
2013-04-08 12:35:10 Neil.Hodgson set messages: + msg186293
2013-04-08 10:24:48 serhiy.storchaka set messages: + msg186289; components: + Interpreter Core; stage: needs patch -> patch review
2013-04-08 04:45:56 Neil.Hodgson set messages: + msg186274
2013-04-07 22:24:35 vstinner set files: + specialize_compare.patch; resolution: fixed -> (no value); messages: + msg186252; keywords: + patch
2013-04-07 15:33:38 serhiy.storchaka set messages: + msg186217
2013-04-04 22:21:47 Neil.Hodgson set messages: + msg186054
2013-04-04 07:44:46 vstinner set messages: + msg186010
2013-04-03 22:54:56 Neil.Hodgson set files: + unicode_compare.asm; messages: + msg185974
2013-04-03 21:45:56 Neil.Hodgson set messages: + msg185960
2013-04-03 19:19:31 vstinner set messages: + msg185944
2013-04-03 16:41:20 georg.brandl set status: closed -> open; nosy: + georg.brandl; messages: + msg185933
2013-04-03 04:27:19 Neil.Hodgson set messages: + msg185887
2013-04-02 22:05:41 vstinner set status: open -> closed; resolution: fixed; messages: + msg185862
2013-04-02 21:36:58 ethan.furman set nosy: + ethan.furman; messages: + msg185860
2013-04-02 16:42:25 terry.reedy set stage: needs patch; versions: + Python 3.4, - Python 3.3
2013-04-02 12:36:08 pitrou set nosy: + pitrou; messages: + msg185831
2013-04-02 09:24:07 ezio.melotti set nosy: + vstinner, serhiy.storchaka; type: performance
2013-04-02 09:14:23 Neil.Hodgson create