classification
Title: clang expects memory aligned on 16 bytes, but pymalloc aligns to 8 bytes
Type: Stage: resolved
Components: Build Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: fweimer, gregory.p.smith, pablogsal, serge-sans-paille, vstinner
Priority: normal Keywords: patch

Created on 2019-04-12 17:36 by vstinner, last changed 2019-05-14 17:31 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 12809 merged vstinner, 2019-04-12 17:52
PR 12811 merged vstinner, 2019-04-12 22:20
PR 13320 merged vstinner, 2019-05-14 16:46
Messages (17)
msg340087 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-12 17:36
On x86-64, clang -O3 compiles the following function:

PyCArgObject *
PyCArgObject_new(void)
{
    PyCArgObject *p;
    p = PyObject_New(PyCArgObject, &PyCArg_Type);
    if (p == NULL)
        return NULL;
    p->pffi_type = NULL;
    p->tag = '\0';
    p->obj = NULL;
    memset(&p->value, 0, sizeof(p->value));
    return p;
}

like that:

   0x00007fffe9c6acb0 <+0>:	push   rax
   0x00007fffe9c6acb1 <+1>:	mov    rdi,QWORD PTR [rip+0xe308]        # 0x7fffe9c78fc0
   0x00007fffe9c6acb8 <+8>:	call   0x7fffe9c5e8a0 <_PyObject_New@plt>
   0x00007fffe9c6acbd <+13>:	test   rax,rax
   0x00007fffe9c6acc0 <+16>:	je     0x7fffe9c6acdf <PyCArgObject_new+47>
   0x00007fffe9c6acc2 <+18>:	mov    QWORD PTR [rax+0x20],0x0
   0x00007fffe9c6acca <+26>:	mov    BYTE PTR [rax+0x28],0x0
   0x00007fffe9c6acce <+30>:	xorps  xmm0,xmm0
   0x00007fffe9c6acd1 <+33>:	movaps XMMWORD PTR [rax+0x30],xmm0
   0x00007fffe9c6acd5 <+37>:	mov    QWORD PTR [rax+0x40],0x0
   0x00007fffe9c6acdd <+45>:	pop    rcx
   0x00007fffe9c6acde <+46>:	ret    
   0x00007fffe9c6acdf <+47>:	xor    eax,eax
   0x00007fffe9c6ace1 <+49>:	pop    rcx
   0x00007fffe9c6ace2 <+50>:	ret    

The problem is that movaps requires the memory address to be aligned on 16 bytes, whereas PyObject_New() uses pymalloc allocator (the requested size is 80 bytes, pymalloc supports allocations up to 512 bytes) and pymalloc only provides alignment on 8 bytes.

If PyObject_New() returns an address not aligned on 16 bytes, PyCArgObject_new() crash immediately with a segmentation fault (SIGSEGV).

CPython must be compiled using -fmax-type-align=8 to avoid such alignment crash. Using this compiler flag, clag emits expected machine code:

   0x00007fffe9caacb0 <+0>:	push   rax
   0x00007fffe9caacb1 <+1>:	mov    rdi,QWORD PTR [rip+0xe308]        # 0x7fffe9cb8fc0
   0x00007fffe9caacb8 <+8>:	call   0x7fffe9c9e8a0 <_PyObject_New@plt>
   0x00007fffe9caacbd <+13>:	test   rax,rax
   0x00007fffe9caacc0 <+16>:	je     0x7fffe9caacdf <PyCArgObject_new+47>
   0x00007fffe9caacc2 <+18>:	mov    QWORD PTR [rax+0x20],0x0
   0x00007fffe9caacca <+26>:	mov    BYTE PTR [rax+0x28],0x0
   0x00007fffe9caacce <+30>:	xorps  xmm0,xmm0
   0x00007fffe9caacd1 <+33>:	movups XMMWORD PTR [rax+0x30],xmm0
   0x00007fffe9caacd5 <+37>:	mov    QWORD PTR [rax+0x40],0x0
   0x00007fffe9caacdd <+45>:	pop    rcx
   0x00007fffe9caacde <+46>:	ret    
   0x00007fffe9caacdf <+47>:	xor    eax,eax
   0x00007fffe9caace1 <+49>:	pop    rcx
   0x00007fffe9caace2 <+50>:	ret    

"movaps" instruction becomes "movups" instruction: "a" stands for "aligned" in movaps, whereas "u" stands for "unaligned" in movups.
msg340100 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-12 19:27
New changeset 23a683adf803eef405d248cc9c2a7eb08a7300e2 by Victor Stinner in branch 'master':
bpo-36618: Add -fmax-type-align=8 flag for clang (GH-12809)
https://github.com/python/cpython/commit/23a683adf803eef405d248cc9c2a7eb08a7300e2
msg340106 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-12 20:07
I merged a "workaround" in the master branch. Python 2.7 and 3.7 are also affected, but I prefer to wait to see if the change goes through buildbots.

The real fix would be to modify pymalloc to use 16-byte alignement, but that's a more complex issue :-)
msg340108 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-12 20:10
Note to myself: "Sadly, the flag must be expected to CFLAGS and not just CFLAGS_NODIST, ..."

It should be "Sadly, the flag must be *set* to CFLAGS and not just CFLAGS_NODIST, ..." :-( I should fix the NEWS entry.
msg340116 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-12 22:01
Oh, it seems like the change broke the FreeBSD 10 buildbot :-(

https://buildbot.python.org/all/#/builders/167/builds/769

...
checking for makedev... no
checking for le64toh... no
checking for mode_t... no
checking for off_t... no
checking for pid_t... no
checking for size_t... no
checking for uid_t in sys/types.h... yes
checking for ssize_t... no
checking for __uint128_t... no
checking size of int... 0
checking size of long... 0
checking size of long long... 0
checking size of void *... 0
checking size of short... 0
checking size of float... 0
checking size of double... 0
checking size of fpos_t... 0
checking size of size_t... 0
checking size of pid_t... 0
checking size of uintptr_t... 0
checking for long double... yes
configure: error: in `/usr/home/buildbot/python/3.x.koobs-freebsd10.nondebug/build':
configure: error: cannot compute sizeof (long double)
See `config.log' for more details

--

On the previous build, the output looked fine:

https://buildbot.python.org/all/#/builders/167/builds/768

...
checking size of int... 4
checking size of long... 8
checking size of long long... 8
checking size of void *... 8
checking size of short... 2
checking size of float... 4
checking size of double... 8
checking size of fpos_t... 8
checking size of size_t... 8
checking size of pid_t... 4
checking size of uintptr_t... 8
checking for long double... yes
checking size of long double... 16

pythoninfo:

CC.version: FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
os.uname: posix.uname_result(sysname='FreeBSD', nodename='10-STABLE-amd64.elysium', release='10.4-STABLE', version='FreeBSD 10.4-STABLE #1 r337021: Wed Aug  1 15:12:48 AEST 2018     root@10-STABLE-amd64.elysium:/usr/obj/usr/src/sys/GENERIC', machine='amd64')
platform.platform: FreeBSD-10.4-STABLE-amd64-64bit-ELF
msg340126 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-12 22:37
See also:

* bpo-27987: obmalloc's 8-byte alignment causes undefined behavior
* bpo-18835: Add PyMem_AlignedAlloc()
* bpo-31912: PyMem_Malloc() should guarantee alignof(max_align_t)
msg340128 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-04-12 22:49
Even if you check for -fmax-type-align compiler support at configure time, there is a potential problem:

Nothing guarantees that extension modules are built by the same compiler that CPython is.  If CPython used an old clang without support for that flag and the extension module compiled by that CPython via pip and setup.py, etc. uses a more recent version of clang - it wouldn't specify that flag and the extension module code could be broken.

I suppose this issue of conditional compiler flags is nothing new.  It should not block us from going forward with a workaround like your PRs for now.
msg340129 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2019-04-12 22:50
I believe -fno-max-type-align is also an option.
msg340130 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-12 22:51
New changeset a304b136adda3575898d8b5debedcd48d5072272 by Victor Stinner in branch 'master':
bpo-36618: Don't add -fmax-type-align flag to old clang (GH-12811)
https://github.com/python/cpython/commit/a304b136adda3575898d8b5debedcd48d5072272
msg340131 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-12 22:55
> It should not block us from going forward with a workaround like your PRs for now.

I pushed a fix quickly to unblock my PR 12796, but also because I was very scared by what I saw :-D

I see my change as a "quick fix", but we really have to sit down to think about the "correct fix". Especially since we will have to do something for Python 2.7 and 3.7, and adding -fmax-type-align=8 to exported CFLAGS can cause *new* issues, as you explained.

That's why I mentioned bpo-27987 and other issues, to try to see what has already been said, and try to find to identify and fix the root issue, rather than working around the issue with compiler flags.
msg340261 - (view) Author: Florian Weimer (fweimer) Date: 2019-04-15 11:02
The issue is related to the definition of PyCArgObject:

typedef struct tagPyCArgObject PyCArgObject;

struct tagPyCArgObject {
    PyObject_HEAD
    ffi_type *pffi_type;
    char tag;
    union {
        char c;
        char b;
        short h;
        int i;
        long l;
        long long q;
        long double D;
        double d;
        float f;
        void *p;
    } value;
    PyObject *obj;
    Py_ssize_t size; /* for the 'V' tag */
};

This object must be allocated with suitable alignment (which is 16 on many platforms), and the default Python allocator apparently provides 8-byte alignment only on 64-bit platforms.  In short, using PyObject_New with PyCArgObject results in undefined behavior.

This issue potentially affects all compilers, not just Clang.
msg340279 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-15 14:45
C++ has a __alignof__ function/macro/operator (not sure what is its kind) to get the alignment of a type.

C11 has <stdalign.h> header which provides an alignof() function.

GCC has __alignof__().

I also found "_Alignof()" name.

... no sure which one is the most portable ...
msg340280 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-15 14:47
I also found this macro:

#define ALIGNOF(type) offsetof (struct { char c; type member; }, member)
msg340281 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-15 14:47
More info on alignof():
http://www.wambold.com/Martin/writings/alignof.html
msg340322 - (view) Author: (serge-sans-paille) * Date: 2019-04-16 09:19
@vstinner: once you have a portable version of alignof, you can deciding to *not* use the pool allocator if the required alignment is greater than 8B, or you could modify the pool allocator to take alignment information as an extra parameter?
msg342490 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-14 17:29
New changeset d97adfb409290a1e4ad549e4af58cacea86d3358 by Victor Stinner in branch 'master':
bpo-36618: Don't add -fmax-type-align=8 flag for clang (GH-13320)
https://github.com/python/cpython/commit/d97adfb409290a1e4ad549e4af58cacea86d3358
msg342491 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-05-14 17:31
Python 3.8 now respects the x86-64 ABI: https://bugs.python.org/issue27987

I reverted my workaround.
History
Date User Action Args
2019-05-14 17:31:20vstinnersetstatus: open -> closed
resolution: fixed
messages: + msg342491

stage: patch review -> resolved
2019-05-14 17:29:55vstinnersetmessages: + msg342490
2019-05-14 16:46:51vstinnersetpull_requests: + pull_request13231
2019-04-16 09:19:35serge-sans-paillesetnosy: + serge-sans-paille
messages: + msg340322
2019-04-15 14:47:56vstinnersetmessages: + msg340281
2019-04-15 14:47:00vstinnersetmessages: + msg340280
2019-04-15 14:45:19vstinnersetmessages: + msg340279
2019-04-15 11:02:44fweimersetmessages: + msg340261
2019-04-15 10:50:11fweimersetnosy: + fweimer
2019-04-12 22:55:11vstinnersetmessages: + msg340131
2019-04-12 22:51:09vstinnersetmessages: + msg340130
2019-04-12 22:50:25gregory.p.smithsetmessages: + msg340129
2019-04-12 22:49:20gregory.p.smithsetnosy: + gregory.p.smith
messages: + msg340128
2019-04-12 22:37:48vstinnersetmessages: + msg340126
2019-04-12 22:20:10vstinnersetpull_requests: + pull_request12736
2019-04-12 22:01:30vstinnersetmessages: + msg340116
2019-04-12 20:10:58vstinnersetmessages: + msg340108
2019-04-12 20:07:12vstinnersetmessages: + msg340106
2019-04-12 19:27:49vstinnersetmessages: + msg340100
2019-04-12 17:52:43vstinnersetkeywords: + patch
stage: patch review
pull_requests: + pull_request12735
2019-04-12 17:42:18vstinnersetnosy: + pablogsal
2019-04-12 17:36:22vstinnercreate