Message 186010
"For 32-bit Windows, the code generated for unicode_compare is quite
slow. There are either 1 or 2 kind checks in each call to
PyUnicode_READ (...)"
Yes, PyUnicode_READ() *is* slow. It should not be used in a loop. And
unicode_compare() uses PyUnicode_READ() in a loop.
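To make the cost concrete, here is a minimal standalone sketch of the pattern. It does not use Python.h: `SKETCH_READ`, the `SKETCH_*BYTE` constants and `generic_compare_sketch` are hypothetical stand-ins modeled on PyUnicode_READ() and the kind constants, not the actual CPython source. The point is only that the kind test is evaluated again for every character read unless the compiler hoists it out of the loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three Unicode kinds (bytes per character). */
enum { SKETCH_1BYTE = 1, SKETCH_2BYTE = 2, SKETCH_4BYTE = 4 };

/* Modeled on PyUnicode_READ(): picks the element width from `kind`
 * on every single read. */
#define SKETCH_READ(kind, data, index)                                   \
    ((uint32_t)((kind) == SKETCH_1BYTE                                   \
                    ? ((const uint8_t *)(data))[(index)]                 \
                    : (kind) == SKETCH_2BYTE                             \
                          ? ((const uint16_t *)(data))[(index)]          \
                          : ((const uint32_t *)(data))[(index)]))

/* Generic comparison: two kind-dependent reads per loop iteration. */
static int
generic_compare_sketch(int kind1, const void *data1, size_t len1,
                       int kind2, const void *data2, size_t len2)
{
    size_t len = len1 < len2 ? len1 : len2;
    for (size_t i = 0; i < len; i++) {
        uint32_t c1 = SKETCH_READ(kind1, data1, i);  /* 1 or 2 kind tests */
        uint32_t c2 = SKETCH_READ(kind2, data2, i);  /* 1 or 2 kind tests */
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
    }
    /* Common prefix is equal: the shorter string sorts first. */
    if (len1 == len2)
        return 0;
    return len1 < len2 ? -1 : 1;
}
```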
An improvement would be to write a specialized version for each
combination of Unicode kinds:
(UCS1, UCS2), (UCS1, UCS4),
(UCS2, UCS1), (UCS2, UCS2), (UCS2, UCS4)
(UCS4, UCS1), (UCS4, UCS2), (UCS4, UCS4)
# (UCS1, UCS1) uses memcmp()
But I am not convinced that the gain would be visible, and I don't
know how to factor out the common code. We should probably use a huge macro.
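A rough sketch of what the "huge macro" approach could look like, written as standalone C rather than against Python.h. All names here (`KIND_UCS1`/`KIND_UCS2`/`KIND_UCS4`, `COMPARE_LOOP`, `specialized_compare_sketch`) are hypothetical and chosen for illustration; this is not the patch that was committed. The kinds are tested once, outside the loop, and each pair gets its own tight loop over fixed-width types, with (UCS1, UCS1) going through memcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the three Unicode kinds. */
enum { KIND_UCS1 = 1, KIND_UCS2 = 2, KIND_UCS4 = 4 };

/* Expands to one loop for a fixed (TYPE1, TYPE2) pair. It returns from
 * the enclosing function as soon as a differing character is found. */
#define COMPARE_LOOP(TYPE1, TYPE2, data1, data2, len)                 \
    do {                                                              \
        const TYPE1 *p1 = (const TYPE1 *)(data1);                     \
        const TYPE2 *p2 = (const TYPE2 *)(data2);                     \
        for (size_t i = 0; i < (len); i++) {                          \
            uint32_t c1 = p1[i], c2 = p2[i];                          \
            if (c1 != c2)                                             \
                return c1 < c2 ? -1 : 1;                              \
        }                                                             \
    } while (0)

static int
specialized_compare_sketch(int kind1, const void *data1, size_t len1,
                           int kind2, const void *data2, size_t len2)
{
    size_t len = len1 < len2 ? len1 : len2;

    /* Kind checks happen here, once, instead of once per character. */
    switch (kind1) {
    case KIND_UCS1:
        switch (kind2) {
        case KIND_UCS1: {
            /* Same 1-byte width on both sides: memcmp() is enough. */
            int cmp = memcmp(data1, data2, len);
            if (cmp != 0)
                return cmp < 0 ? -1 : 1;
            break;
        }
        case KIND_UCS2: COMPARE_LOOP(uint8_t, uint16_t, data1, data2, len); break;
        case KIND_UCS4: COMPARE_LOOP(uint8_t, uint32_t, data1, data2, len); break;
        default: assert(0 && "invalid kind2");
        }
        break;
    case KIND_UCS2:
        switch (kind2) {
        case KIND_UCS1: COMPARE_LOOP(uint16_t, uint8_t, data1, data2, len); break;
        case KIND_UCS2: COMPARE_LOOP(uint16_t, uint16_t, data1, data2, len); break;
        case KIND_UCS4: COMPARE_LOOP(uint16_t, uint32_t, data1, data2, len); break;
        default: assert(0 && "invalid kind2");
        }
        break;
    case KIND_UCS4:
        switch (kind2) {
        case KIND_UCS1: COMPARE_LOOP(uint32_t, uint8_t, data1, data2, len); break;
        case KIND_UCS2: COMPARE_LOOP(uint32_t, uint16_t, data1, data2, len); break;
        case KIND_UCS4: COMPARE_LOOP(uint32_t, uint32_t, data1, data2, len); break;
        default: assert(0 && "invalid kind2");
        }
        break;
    default:
        assert(0 && "invalid kind1");
    }

    /* Common prefix is equal: the shorter string sorts first. */
    if (len1 == len2)
        return 0;
    return len1 < len2 ? -1 : 1;
}
```

Whether the nine specialized loops are worth the extra code size is exactly the open question above; the sketch only shows that a macro keeps the duplication manageable.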
2013/4/4 Neil Hodgson <report@bugs.python.org>:
>
> Neil Hodgson added the comment:
>
> For 32-bit Windows, the code generated for unicode_compare is quite slow.
>
> There are either 1 or 2 kind checks in each call to PyUnicode_READ and 2 calls to PyUnicode_READ inside the loop. A compiler may decide to move the kind checks out of the loop and specialize the loop but MSVC 2010 appears to not do so. The assembler (32-bit build) for each PyUnicode_READ looks like
>
> mov ecx, DWORD PTR _kind1$[ebp]
> cmp ecx, 1
> jne SHORT $LN17@unicode_co@2
> lea ecx, DWORD PTR [ebx+eax]
> movzx edx, BYTE PTR [ecx+edx]
> jmp SHORT $LN16@unicode_co@2
> $LN17@unicode_co@2:
> cmp ecx, 2
> jne SHORT $LN15@unicode_co@2
> movzx edx, WORD PTR [ebx+edi]
> jmp SHORT $LN16@unicode_co@2
> $LN15@unicode_co@2:
> mov edx, DWORD PTR [ebx+esi]
> $LN16@unicode_co@2:
>
> The kind1/kind2 variables aren't even going into registers and at least one test+branch and a jump are executed for every character. Two tests for 2 and 4 byte kinds. len1 and len2 don't get to go into registers either.
>
> My system isn't set up for 64-bit MSVC 2010 but looking at the code from 64-bit MSVC 2012 shows that all the variables have been moved into registers but the kind checking is still inside the loop. This accounts for better results with 64-bit Python 3.3 on Windows but isn't as good as Unix or Python 3.2.
>
> ; 10431: c1 = PyUnicode_READ(kind1, data1, i);
>
> cmp rsi, 1
> jne SHORT $LN17@unicode_co
> lea rax, QWORD PTR [r9+rcx]
> movzx r8d, BYTE PTR [rax+rbx]
> jmp SHORT $LN16@unicode_co
> $LN17@unicode_co:
> cmp rsi, 2
> jne SHORT $LN15@unicode_co
> movzx r8d, WORD PTR [r9+r11]
> jmp SHORT $LN16@unicode_co
> $LN15@unicode_co:
> mov r8d, DWORD PTR [r9+r10]
> $LN16@unicode_co:
>
> Attached the 32-bit assembler listing.
>
> ----------
> Added file: http://bugs.python.org/file29673/unicode_compare.asm
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue17615>
> _______________________________________

Date | User | Action | Args
2013-04-04 07:44:46 | vstinner | set | recipients: + vstinner, georg.brandl, pitrou, ezio.melotti, ethan.furman, serhiy.storchaka, Neil.Hodgson
2013-04-04 07:44:46 | vstinner | link | issue17615 messages
2013-04-04 07:44:46 | vstinner | create |