Title: Use memcpy() instead of for() loops in _PyUnicode_To*
Type: performance Stage: resolved
Components: Unicode Versions:
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: Nosy List: ezio.melotti, petdance, rhettinger, vstinner
Priority: normal Keywords:

Created on 2020-02-08 21:45 by petdance, last changed 2020-02-09 01:52 by petdance. This issue is now closed.

Messages (3)
msg361636 - Author: Andy Lester (petdance) * Date: 2020-02-08 21:45
Four functions in Objects/unicodectype.c copy values out of lookup tables with a for loop:

        int i;
        for (i = 0; i < n; i++)
            res[i] = _PyUnicode_ExtendedCase[index + i];

instead of a single memcpy():

        memcpy(res, &_PyUnicode_ExtendedCase[index], n * sizeof(Py_UCS4));

Apple clang 11.0.0 on my Mac optimizes away the for loop and generates code equivalent to the memcpy().

GCC 4.8.5 on my Linux box (the newest GCC I have) does not optimize away the loop.

The four functions are:
msg361638 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-02-08 21:54
In the past, we've also gotten some gains by replacing memcpy() with for-loops.  These kinds of optimization choices are best left to the compiler.
msg361640 - Author: Andy Lester (petdance) * Date: 2020-02-09 01:52
Thanks for replying. I figured that might be the case, which is why I made a ticket before bothering with a pull request.

I've also seen this kind of thing around:

                i = ctx->pattern[0];
                Py_ssize_t groupref = i+i;

instead of

                Py_ssize_t groupref = ctx->pattern[0]*2;

Is that also the kind of thing we would leave for the compiler to sort out?
Date                 User        Action  Args
2020-02-09 01:52:36  petdance    set     messages: + msg361640
2020-02-08 21:54:27  rhettinger  set     status: open -> closed
                                         nosy: + rhettinger
                                         messages: + msg361638
                                         resolution: rejected
                                         stage: resolved
2020-02-08 21:45:29  petdance    create