
Author vstinner
Recipients Neil.Hodgson, ethan.furman, ezio.melotti, georg.brandl, pitrou, serhiy.storchaka, vstinner
Date 2013-04-04.07:44:46
Message-id <CAMpsgwbTF6WC0j94=zViWro5s9ezLqdF8Rfis3-PWNUmTefmaw@mail.gmail.com>
In-reply-to <1365029696.53.0.503821886831.issue17615@psf.upfronthosting.co.za>
Content
"For 32-bit Windows, the code generated for unicode_compare is quite
slow. There are either 1 or 2 kind checks in each call to
PyUnicode_READ (...)"

Yes, PyUnicode_READ() *is* slow. It should not be used in a loop. And
unicode_compare() uses PyUnicode_READ() in a loop.
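
To make the cost concrete, here is a minimal sketch of the pattern being
described. It is not the CPython source: SKETCH_READ is a stand-in
modeled on PyUnicode_READ(), and the enum values, the uint8_t/uint16_t/
uint32_t types and generic_compare() are assumptions made for
illustration. The point is only that the kind is re-tested for both
strings on every iteration unless the compiler hoists the checks itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PyUnicode_1BYTE_KIND and friends. */
enum { SKETCH_1BYTE_KIND = 1, SKETCH_2BYTE_KIND = 2, SKETCH_4BYTE_KIND = 4 };

/* Modeled on PyUnicode_READ(): selects the element width at every read. */
#define SKETCH_READ(kind, data, i)                                   \
    ((uint32_t)((kind) == SKETCH_1BYTE_KIND                          \
        ? ((const uint8_t *)(data))[(i)]                             \
        : ((kind) == SKETCH_2BYTE_KIND                               \
            ? ((const uint16_t *)(data))[(i)]                        \
            : ((const uint32_t *)(data))[(i)])))

/* Generic loop: up to two kind tests per read, two reads per character. */
static int generic_compare(int kind1, const void *data1, size_t len1,
                           int kind2, const void *data2, size_t len2)
{
    size_t n = len1 < len2 ? len1 : len2;
    for (size_t i = 0; i < n; i++) {
        uint32_t c1 = SKETCH_READ(kind1, data1, i);
        uint32_t c2 = SKETCH_READ(kind2, data2, i);
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
    }
    /* Common prefix is equal: the shorter string sorts first. */
    return (len1 > len2) - (len1 < len2);
}
```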

An improvement would be to write a specialized version for each
combination of Unicode kinds:
(UCS1, UCS2), (UCS1, UCS4),
(UCS2, UCS1), (UCS2, UCS2), (UCS2, UCS4)
(UCS4, UCS1), (UCS4, UCS2), (UCS4, UCS4)
# (UCS1, UCS1) uses memcmp()

But I am not convinced that the gain would be visible, and I don't
know how to factor out the common code. We should probably use a huge
macro.
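
As a rough sketch of what the macro approach could look like (an
illustration under assumptions, not a proposed patch): DEFINE_COMPARE,
the cmp_* functions, sketch_compare() and the KIND_* constants are all
hypothetical names, and plain fixed-width integer types stand in for
Py_UCS1/Py_UCS2/Py_UCS4. One macro generates a loop per pair of element
types, so the kind is tested once in the dispatcher rather than at
every read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the three Unicode kinds. */
enum { KIND_UCS1 = 1, KIND_UCS2 = 2, KIND_UCS4 = 4 };

/* Generate one comparison loop for a fixed pair of element types. */
#define DEFINE_COMPARE(NAME, T1, T2)                                  \
    static int NAME(const void *d1, size_t len1,                      \
                    const void *d2, size_t len2)                      \
    {                                                                 \
        const T1 *p1 = (const T1 *)d1;                                \
        const T2 *p2 = (const T2 *)d2;                                \
        size_t n = len1 < len2 ? len1 : len2;                         \
        for (size_t i = 0; i < n; i++) {                              \
            uint32_t c1 = p1[i], c2 = p2[i];                          \
            if (c1 != c2)                                             \
                return c1 < c2 ? -1 : 1;                              \
        }                                                             \
        return (len1 > len2) - (len1 < len2);                         \
    }

DEFINE_COMPARE(cmp_1_2, uint8_t,  uint16_t)
DEFINE_COMPARE(cmp_1_4, uint8_t,  uint32_t)
DEFINE_COMPARE(cmp_2_1, uint16_t, uint8_t)
DEFINE_COMPARE(cmp_2_2, uint16_t, uint16_t)
DEFINE_COMPARE(cmp_2_4, uint16_t, uint32_t)
DEFINE_COMPARE(cmp_4_1, uint32_t, uint8_t)
DEFINE_COMPARE(cmp_4_2, uint32_t, uint16_t)
DEFINE_COMPARE(cmp_4_4, uint32_t, uint32_t)

/* (UCS1, UCS1): bytes compare directly, so memcmp() does the work. */
static int cmp_1_1(const void *d1, size_t len1,
                   const void *d2, size_t len2)
{
    size_t n = len1 < len2 ? len1 : len2;
    int r = memcmp(d1, d2, n);
    if (r != 0)
        return r < 0 ? -1 : 1;
    return (len1 > len2) - (len1 < len2);
}

/* Encode a pair of kinds as a single switch label. */
#define PAIR(k1, k2) ((k1) * 8 + (k2))

/* Dispatch once on the pair of kinds, outside any loop. */
static int sketch_compare(int kind1, const void *d1, size_t len1,
                          int kind2, const void *d2, size_t len2)
{
    switch (PAIR(kind1, kind2)) {
    case PAIR(KIND_UCS1, KIND_UCS1): return cmp_1_1(d1, len1, d2, len2);
    case PAIR(KIND_UCS1, KIND_UCS2): return cmp_1_2(d1, len1, d2, len2);
    case PAIR(KIND_UCS1, KIND_UCS4): return cmp_1_4(d1, len1, d2, len2);
    case PAIR(KIND_UCS2, KIND_UCS1): return cmp_2_1(d1, len1, d2, len2);
    case PAIR(KIND_UCS2, KIND_UCS2): return cmp_2_2(d1, len1, d2, len2);
    case PAIR(KIND_UCS2, KIND_UCS4): return cmp_2_4(d1, len1, d2, len2);
    case PAIR(KIND_UCS4, KIND_UCS1): return cmp_4_1(d1, len1, d2, len2);
    case PAIR(KIND_UCS4, KIND_UCS2): return cmp_4_2(d1, len1, d2, len2);
    case PAIR(KIND_UCS4, KIND_UCS4): return cmp_4_4(d1, len1, d2, len2);
    default:
        assert(0 && "unknown kind");
        return 0;
    }
}
```

Whether this beats the generic loop in practice would still need to be
measured; the sketch only shows that the duplication can be confined to
one macro and one switch.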

2013/4/4 Neil Hodgson <report@bugs.python.org>:
>
> Neil Hodgson added the comment:
>
> For 32-bit Windows, the code generated for unicode_compare is quite slow.
>
>     There are either 1 or 2 kind checks in each call to PyUnicode_READ and 2 calls to PyUnicode_READ inside the loop. A compiler may decide to move the kind checks out of the loop and specialize the loop but MSVC 2010 appears to not do so. The assembler (32-bit build) for each PyUnicode_READ looks like
>
>     mov    ecx, DWORD PTR _kind1$[ebp]
>     cmp    ecx, 1
>     jne    SHORT $LN17@unicode_co@2
>     lea    ecx, DWORD PTR [ebx+eax]
>     movzx    edx, BYTE PTR [ecx+edx]
>     jmp    SHORT $LN16@unicode_co@2
> $LN17@unicode_co@2:
>     cmp    ecx, 2
>     jne    SHORT $LN15@unicode_co@2
>     movzx    edx, WORD PTR [ebx+edi]
>     jmp    SHORT $LN16@unicode_co@2
> $LN15@unicode_co@2:
>     mov    edx, DWORD PTR [ebx+esi]
> $LN16@unicode_co@2:
>
>    The kind1/kind2 variables aren't even going into registers, and at least one test+branch and a jump are executed for every character; the 2- and 4-byte kinds need two tests. len1 and len2 don't get to go into registers either.
>
>    My system isn't set up for 64-bit MSVC 2010 but looking at the code from 64-bit MSVC 2012 shows that all the variables have been moved into registers but the kind checking is still inside the loop. This accounts for better results with 64-bit Python 3.3 on Windows but isn't as good as Unix or Python 3.2.
>
> ; 10431:         c1 = PyUnicode_READ(kind1, data1, i);
>
>         cmp     rsi, 1
>         jne     SHORT $LN17@unicode_co
>         lea     rax, QWORD PTR [r9+rcx]
>         movzx   r8d, BYTE PTR [rax+rbx]
>         jmp     SHORT $LN16@unicode_co
> $LN17@unicode_co:
>         cmp     rsi, 2
>         jne     SHORT $LN15@unicode_co
>         movzx   r8d, WORD PTR [r9+r11]
>         jmp     SHORT $LN16@unicode_co
> $LN15@unicode_co:
>         mov     r8d, DWORD PTR [r9+r10]
> $LN16@unicode_co:
>
>    Attached the 32-bit assembler listing.
>
> ----------
> Added file: http://bugs.python.org/file29673/unicode_compare.asm
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue17615>
> _______________________________________
History
Date User Action Args
2013-04-04 07:44:46  vstinner  set     recipients: + vstinner, georg.brandl, pitrou, ezio.melotti, ethan.furman, serhiy.storchaka, Neil.Hodgson
2013-04-04 07:44:46  vstinner  link    issue17615 messages
2013-04-04 07:44:46  vstinner  create