classification
Title: Py_MEMCPY: Use memcpy on Windows?
Type: performance Stage: resolved
Components: Windows Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, gitinit.py@gmail.com, paul.moore, python-dev, serhiy.storchaka, steve.dower, tim.golden, vstinner, zach.ware
Priority: normal Keywords:

Created on 2016-09-13 12:32 by christian.heimes, last changed 2018-04-17 23:39 by gitinit.py@gmail.com. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (15)
msg276262 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-13 12:32
Py_MEMCPY() has a special case for small blocks on Windows to work around an ancient performance issue in MSVC. Can we safely assume that recent MSVC properly optimize memcpy()? See #28055

/* Py_MEMCPY can be used instead of memcpy in cases where the copied blocks
 * are often very short.  While most platforms have highly optimized code for
 * large transfers, the setup costs for memcpy are often quite high.  MEMCPY
 * solves this by doing short copies "in line".
 */

#if defined(_MSC_VER)
#define Py_MEMCPY(target, source, length) do {                          \
        size_t i_, n_ = (length);                                       \
        char *t_ = (void*) (target);                                    \
        const char *s_ = (void*) (source);                              \
        if (n_ >= 16)                                                   \
            memcpy(t_, s_, n_);                                         \
        else                                                            \
            for (i_ = 0; i_ < n_; i_++)                                 \
                t_[i_] = s_[i_];                                        \
    } while (0)
#else
#define Py_MEMCPY memcpy
#endif
msg276271 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-13 13:45
Yes, memcpy (or preferably memcpy_s, which includes the size of the destination) are basically intrinsics that will inline short copies and call out to a range of implementations depending on alignment/overlap/etc.
msg276306 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-13 17:19
Perfect, thanks! Should we keep the Py_MEMCPY() macro for now and define it to memcpy()? Or should I get rid of it and just define it for b/w compatibility?
msg276309 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-13 17:25
I'm okay with removing it completely, if you're willing to change that much code. (Most of my additions already uses memcpy or memcpy_s directly, since I wasn't even aware of this macro :) )
msg276310 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-13 17:44
Behold the power of Unix! :)

$ find -name '*.[ch]' | xargs sed -i s/Py_MEMCPY/memcpy/g

Victor, are you fine with the change? I'm going to keep Py_MEMCPY around.
msg276313 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-13 18:11
The macro looks public, you cannot remove it.

Make it an alias to memcpy(), but explain that it's only kept for backward
compatibility with a reference to this issue.
msg276314 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-13 18:15
As I said, I'm going to keep Py_MEMCPY around but replace it everywhere else.

/* Py_MEMCPY is kept for backwards compatibility,
 * see https://bugs.python.org/issue28126 */
#define Py_MEMCPY memcpy
msg276317 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-13 18:22
New changeset bedce61ae0a0 by Christian Heimes in branch '3.6':
Issue #28126: Replace Py_MEMCPY with memcpy(). Visual Studio can properly optimize memcpy().
https://hg.python.org/cpython/rev/bedce61ae0a0

New changeset f5d32ed0f9c2 by Christian Heimes in branch 'default':
Issue #28126: Replace Py_MEMCPY with memcpy(). Visual Studio can properly optimize memcpy().
https://hg.python.org/cpython/rev/f5d32ed0f9c2
msg276413 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-14 10:37
Shouldn't the NEWS entity be in the C API section?
msg276426 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-14 12:50
Isn't the C API section reserved for C API changes? I haven't changed the C API but rather optimized the core.
msg276433 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-14 13:07
The Py_MEMCPY name doesn't start by an underscore, and it is not inside the "#ifndef Py_LIMITED_API" block. But it is not documented either. I don't know what is the status of this macro. If this is a part of public API, this change should be documented in the C API section. If it is inner macro, this change shouldn't be mentioned in Misc/NEWS.
msg276435 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-09-14 13:12
I don't think that it's very useful to discuss the status of the
macro. Documenting the change doesn't harm ;-)
msg276436 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2016-09-14 13:15
My change doesn't break or change 3rd party application. They might get a tiny bit faster on Windows, if they have used Py_MEMCPY() for up to 16 bytes. That's it.
msg276464 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2016-09-14 15:19
> They might get a tiny bit faster on Windows, if they have used Py_MEMCPY() for up to 16 bytes

Even that's unlikely as the loop in the macro would have been unrolled in practically every case.

This is about removing an unnecessary macro. Certainly no API change unless we delete it entirely (which I'm okay with), and IMHO if it has an issue number it should go in NEWS. Maybe build is the right section?
msg315427 - (view) Author: Andro Nuxx (gitinit.py@gmail.com) Date: 2018-04-17 23:39
Py_MEMCPY() has a special case for small blocks on Windows to work around an ancient performance issue in MSVC. Can we safely assume that recent MSVC properly optimize memcpy()? See #28055

/* Py_MEMCPY can be used instead of memcpy in cases where the copied blocks
 * are often very short.  While most platforms have highly optimized code for
 * large transfers, the setup costs for memcpy are often quite high.  MEMCPY
 * solves this by doing short copies "in line".
 */

#if defined(_MSC_VER)
#define Py_MEMCPY(target, source, length) do {                          \
        size_t i_, n_ = (length);                                       \
        char *t_ = (void*) (target);                                    \
        const char *s_ = (void*) (source);                              \
        if (n_ >= 16)                                                   \
            memcpy(t_, s_, n_);                                         \
        else                                                            \
            for (i_ = 0; i_ < n_; i_++)                                 \
                t_[i_] = s_[i_];                                        \
    } while (0)
#else
#define Py_MEMCPY memcpy
#endif
History
Date User Action Args
2018-04-17 23:39:13gitinit.py@gmail.comsetnosy: + gitinit.py@gmail.com
messages: + msg315427
2017-03-31 16:36:18dstufftsetpull_requests: + pull_request924
2016-09-14 15:19:37steve.dowersetmessages: + msg276464
2016-09-14 13:15:19christian.heimessetmessages: + msg276436
2016-09-14 13:12:50vstinnersetmessages: + msg276435
2016-09-14 13:07:57serhiy.storchakasetmessages: + msg276433
2016-09-14 12:50:16christian.heimessetmessages: + msg276426
2016-09-14 10:37:18serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg276413
2016-09-13 18:23:10christian.heimessetstatus: open -> closed
resolution: fixed
stage: needs patch -> resolved
2016-09-13 18:22:44python-devsetnosy: + python-dev
messages: + msg276317
2016-09-13 18:15:36christian.heimessetmessages: + msg276314
2016-09-13 18:11:51vstinnersetmessages: + msg276313
2016-09-13 17:44:16christian.heimessetmessages: + msg276310
2016-09-13 17:25:32steve.dowersetmessages: + msg276309
2016-09-13 17:19:38christian.heimessetmessages: + msg276306
2016-09-13 13:45:18steve.dowersetmessages: + msg276271
2016-09-13 12:32:23christian.heimescreate