Title: raw debug allocators do not return malloc alignment
Type: Stage:
Components: Interpreter Core Versions: Python 3.7, Python 3.6, Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: inada.naoki, jtaylor, mapreri, vstinner
Priority: normal Keywords:

Created on 2017-04-23 22:01 by jtaylor, last changed 2017-05-24 07:31 by jtaylor.

Messages (7)
msg292187 - Author: Julian Taylor (jtaylor) Date: 2017-04-23 22:01
The debug raw allocators do not return memory with the same alignment as malloc. See _PyMem_DebugRawAlloc:

The line
return p + 2*SST

adds 2 * sizeof(size_t) to the pointer returned by malloc.
On x32, for example, malloc returns 16-byte aligned memory but size_t is 4 bytes, so the offset is only 8 bytes.
As a result, memory returned by the debug allocators on such platforms is not aligned to what the system assumes.
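
A minimal sketch of the arithmetic described above, in C. The helper names and the sample address are illustrative assumptions, not CPython code; only the `2*SST` offset is taken from the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if addr is a multiple of alignment (alignment must be nonzero). */
static int is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0);
    return addr % alignment == 0;
}

/* Address handed to the caller when a header of 2 * sst bytes is placed
   in front of the block, mirroring `return p + 2*SST`.
   sst stands in for sizeof(size_t) on the target platform. */
static uintptr_t debug_user_address(uintptr_t malloc_addr, size_t sst)
{
    return malloc_addr + 2 * sst;
}
```

With a 16-byte aligned address such as 0x1010, `debug_user_address(0x1010, 4)` gives 0x1018, which is only 8-byte aligned (the x32 case), while `debug_user_address(0x1010, 8)` gives 0x1020, which keeps the 16-byte alignment (the usual 64-bit case).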
msg292256 - Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2017-04-25 08:59
How does it cause a problem?
I think you should use `malloc()` instead of `PyMem_Malloc()` or another Python memory allocator when you need strict `malloc()` alignment.
msg292257 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-04-25 09:05
> On x32, for example, malloc returns 16-byte aligned memory but size_t is 4 bytes.

x32 is a strange platform :-( Does numpy support it? I'm not sure that Python works on such a platform.

I suggest hardcoding 16 or 32 bytes in _PyMem_DebugRawAlloc instead of relying on sizeof(size_t). pymalloc aligns memory allocations to 8 bytes, if I recall correctly.

> How does it cause a problem?

numpy uses SIMD instructions which require strict memory alignment.

Note: There was also an issue #18835 to "Add aligned memory variants to the suite of PyMem functions/macros", but it was rejected.
msg292258 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-04-25 09:06
Is this issue related to this numpy issue: "ENH: add support for python3.6 memory tracing"?
msg294243 - Author: Julian Taylor (jtaylor) Date: 2017-05-23 10:03
No, in numpy it is just a case of using the wrong allocator in a certain spot, an issue that can be fixed in numpy.
But it is also a minor bug/documentation issue in Python itself.

Alignment isn't very important for SIMD any more, but there are architectures where alignment is still mandatory, so numpy is sprinkled with asserts checking alignment, which triggered on x32.
It is a very minor issue: to my knowledge, none of the platforms with an alignment requirement has the properties of x32, and x32 doesn't actually care about alignment either.
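
For readers unfamiliar with this kind of check, here is a rough sketch of an alignment assert in C. It is not numpy's actual code; `ptr_is_aligned` and `sum_doubles` are hypothetical names used only for illustration, and `_Alignof` assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p is a multiple of alignment, which must be a power of two. */
static int ptr_is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical kernel: sums n doubles and asserts the element
   alignment it relies on before touching the data. */
static double sum_doubles(const double *data, size_t n)
{
    assert(ptr_is_aligned(data, _Alignof(double)));
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

An assert like this fires as soon as a buffer comes from an allocator that guarantees less alignment than the code expects, even on hardware that would tolerate the misaligned access.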
msg294314 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-05-24 04:19
Maybe we can use Py_MAX(sizeof(size_t), 8) for SST? If I recall correctly, pymalloc uses 8 bytes for the alignment.

Somewhere I read that malloc uses sizeof(double), the largest C type.

Well, what do you suggest, Julian? Do you want to write a PR?
msg294333 - Author: Julian Taylor (jtaylor) Date: 2017-05-24 07:31
The largest type is usually long double. Its alignment ranges from 4 bytes (i386) to 16 bytes (sparc).
So Py_MAX(sizeof(size_t), 8) should indeed do it.
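
The effect of the proposed change can be sketched as follows. This is an illustration under stated assumptions, not a patch: `SKETCH_MAX` is a local stand-in for Py_MAX, and `proposed_header_size` is a hypothetical helper that takes the platform's sizeof(size_t) as a parameter so both cases can be tried on one machine.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for Py_MAX, defined here to keep the sketch self-contained. */
#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Bytes placed in front of the user block if SST were defined as
   max(sizeof(size_t), 8) and the offset stayed at 2 * SST. */
static size_t proposed_header_size(size_t sizeof_size_t)
{
    assert(sizeof_size_t > 0);
    size_t sst = SKETCH_MAX(sizeof_size_t, (size_t)8);
    return 2 * sst;
}
```

With a 4-byte size_t (x32) and with an 8-byte size_t the header comes out at 16 bytes in both cases, so a 16-byte aligned pointer from malloc stays 16-byte aligned after the offset.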
History
Date                 User         Action  Args
2017-05-24 07:31:37  jtaylor      set     messages: + msg294333
2017-05-24 04:19:57  vstinner     set     messages: + msg294314
2017-05-23 10:03:58  jtaylor      set     messages: + msg294243
2017-04-25 09:06:44  vstinner     set     messages: + msg292258
2017-04-25 09:05:45  vstinner     set     nosy: + vstinner; messages: + msg292257
2017-04-25 08:59:04  inada.naoki  set     nosy: + inada.naoki; messages: + msg292256
2017-04-24 10:07:39  mapreri      set     nosy: + mapreri
2017-04-23 22:01:08  jtaylor      create