classification
Title: obmalloc's 8-byte alignment causes undefined behavior
Type: behavior Stage: patch review
Components: Interpreter Core Versions: Python 3.9, Python 3.8, Python 3.7, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, fweimer, gregory.p.smith, inada.naoki, nascheme, pitrou, skrah, tgrigg, twouters, vstinner
Priority: normal Keywords: patch

Created on 2016-09-07 00:17 by benjamin.peterson, last changed 2019-04-16 10:20 by vstinner.

Files
File name Uploaded Description Edit
alignment.patch benjamin.peterson, 2016-09-07 00:17 review
Pull Requests
URL Status Linked Edit
PR 12850 open inada.naoki, 2019-04-16 01:54
Messages (22)
msg274688 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-07 00:17
ubsan complains about unaligned access when structs include "long double". An example error:
    runtime error: member access within misaligned address 0x7f77dbba9798 for type 'struct CDataObject', which requires 16 byte alignment

This is because (on x86 anyway), long double is 16-bytes long and requires that alignment, but obmalloc only gives a 8-byte alignment. (glibc malloc() gives 16-byte alignment.)

I'm attaching a POC patch. I don't know what the impact of increasing the alignment is on obmalloc's performance or memory usage. It's also unfortunate that this patch increases the size of PyGC_Head to 32 bytes from 24 bytes. One can imagine a more middle-ground solution to this by allowing types to specify their required alignment.
msg304873 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-24 08:48
What do we do if at some point a C type requires a larger alignment (for example a vector type readable using AVX512 instructions)?
msg304874 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 09:00
> ubsan complains about unaligned access when structs include "long double". An example error:
> runtime error: member access within misaligned address 0x7f77dbba9798 for type 'struct CDataObject', which requires 16 byte alignment

Can we use memcpy() to prevent such issue?
msg304912 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-10-24 14:41
My suggestion would be to pass alignof(type) into the allocator via macro. Then the allocator could at least assert it's providing good enough alignment if not provide the correct alignment.

I believe 16-byte alignment is special because it's glibc's malloc's default. So "normal" code shouldn't really be expecting anything better than 16-byte alignment. Code with higher alignment requirements will have to use APIs like the one proposed in #18835.
msg304917 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-24 15:18
> My suggestion would be to pass alignof(type) into the allocator via macro.

Do you mean using some new PyMem_ function?  Or as as new tp_ field on the type declaration?
msg304918 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 15:33
What matters when a Python object is allocated? The start of the PyObject structure, or the start of the PyGC_Head structure? Would it be possible to align the PyObject start?

The simplest option is to store data which needs to be aligned in a second memory block allocated by PyMem_AlignedAlloc().
msg304919 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 15:35
Change by Antoine Pitrou: "versions:  -Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6"

The undefined behaviour exists and should be fixed in Python 2.7 and 3.6, no? Can we use memcpy()?
msg304922 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-24 16:02
> Can we use memcpy()?

Hmm, perhaps.  Do you want to try it out (and measure any performance degradation)?
msg304925 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2017-10-24 16:24
Since we have "#define PYMEM_FUNCS PYOBJ_FUNCS", I think extensions that
use PyMem_Malloc() also won't get the glibc max_align_t alignment.

But guess technically they should.
msg304959 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-10-25 05:36
Yes, we could memcpy things around to obtain the desired alignment. It would be nicer to have a builtin solution, though.
msg305847 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-11-08 14:14
alignment.patch: +    long double dummy;  /* force worst-case alignment */

Would it be possible to use max_align_t mentioned by Stefan, at least when this type is available?

What is the impact of the patch on objects size?
msg305939 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-11-09 06:25
On Wed, Nov 8, 2017, at 06:14, STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@gmail.com> added the comment:
> 
> alignment.patch: +    long double dummy;  /* force worst-case alignment
> */
> 
> Would it be possible to use max_align_t mentioned by Stefan, at least
> when this type is available?

Yes, that would be the correct thing to do. I was looking for the quick
hack.

> 
> What is the impact of the patch on objects size?

On 64-bit platforms, I believe it wastes a word for GC objects.
msg305985 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2017-11-09 21:35
FYI, this would seem to be an incentive to get my "bitmaps for small GC objects" idea implemented.  I.e.

https://mail.python.org/pipermail/python-dev/2017-September/149307.html

If implemented, the extra size of the PyGC_Head would only apply to "large" objects.  In my prototype, I'm thinking of using 512 bytes as the size limit for small GC objects.
msg311323 - (view) Author: Florian Weimer (fweimer) Date: 2018-01-31 10:57
This bug causes miscompilation of Python 2.7 by GCC 8 on x86-64 (with no sanitizers enabled, just compiler optimization).

I think this is a fairly conservative way for papering over the issue:

https://mail.python.org/pipermail/python-dev/2018-January/152011.html
msg340120 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-12 22:32
While this issue looked purely theorical to me 3 years ago, it is now very concrete: bpo-36618 "clang expects memory aligned on 16 bytes, but pymalloc aligns to 8 bytes".
msg340179 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-04-14 03:15
Now PyGC_Head is 16byte on 64bit platform.
Maybe, should we just change obmalloc in Python 3.8?

How about 32bit platforms?
What can we do for Python 3.7 and 2.7?
msg340254 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-15 10:17
PyGC_Head structure size depends on the Python version, sizes of 64-bit:

* 2.7: 32 bytes
* 3.6, 3.7: 24 bytes
* 3.8 (master): 16 bytes

bpo-36618 "clang expects memory aligned on 16 bytes, but pymalloc aligns to 8 bytes" should be even worse on 3.7: 24 is not aligned on 16. I don't understand why nobody saw this alignment issue previously. Maybe clang only became stricer about 16 bytes alignment recently?

2.7:

typedef union _gc_head {
    struct {
        union _gc_head *gc_next;
        union _gc_head *gc_prev;
        Py_ssize_t gc_refs;
    } gc;
    double dummy; /* Force at least 8-byte alignment. */
    char dummy_padding[sizeof(union _gc_head_old)];
} PyGC_Head;

3.7:

typedef union _gc_head {
    struct {
        union _gc_head *gc_next;
        union _gc_head *gc_prev;
        Py_ssize_t gc_refs;
    } gc;
    double dummy;  /* force worst-case alignment */
} PyGC_Head;

3.8:

typedef struct {
    // Pointer to next object in the list.
    // 0 means the object is not tracked
    uintptr_t _gc_next;

    // Pointer to previous object in the list.
    // Lowest two bits are used for flags documented later.
    uintptr_t _gc_prev;
} PyGC_Head;

In 3.8, the union used to ensure alignment on a C double is gone.
msg340264 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-04-15 11:21
I had not noticed bpo-33374 changed Python 2.7.
I don't know why it caused segv only for Python 2.7.
msg340265 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-15 11:24
The x86-64 ABI requires that memory allocated on the heap is aligned to 16 bytes.

On x86-64, glibc malloc(size) aligns on 16 bytes for size >= 16, otherwise align to 8 bytes. So the glibc doesn't respect exactly the ABI. I understand that a compiler will not use instructions which require 16B align on a memory block smaller than 16B, so align to 8B for size < 16B should be fine *in practice*.

Python objects are at least 16B because of PyObject header. Moreover, objects tracked by the GC gets additional 16B header from PyGC_Head.

But pymalloc is also used for PyMem_Malloc() since Python 3.6, and PyMem_Malloc() is used to allocate things which are not PyObject.
msg340266 - (view) Author: Florian Weimer (fweimer) Date: 2019-04-15 11:40
Minor correction: glibc malloc follows ABI on x86-64 and always returns a 16-byte-aligned pointer, independently of allocation size.

However, other mallocs (such as jemalloc and tcmalloc) may return pointers with less alignment for allocation sizes less than 16 bytes, violating ABI.  They still follow ABI for allocations of 16 bytes and more.

But as you said, the distinction should not matter for Python because of the object header.  Furthermore, without LTO, the compiler will not be able to detect that a pointer returned from Py_NewObject is a top-level allocation, and therefore has to be more conservative about alignment, using information from the type definitions only.
msg340330 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-04-16 10:14
> In 3.8, the union used to ensure alignment on a C double is gone.

Note that two uintptr_t is aligned 16bytes on 64bit platforms and 8bytes on 32bit platforms.

Python 3.7 is worse than 3.8.
It used "double dummy" to align by 8 bytes, not 16 bytes.
We should use "long double" to align by 16 bytes.
https://software.intel.com/sites/default/files/article/402129/mpx-linux64-abi.pdf

But it means +8 bytes for all tuples.  If we backport PR-12850 to 3.7, +8 bytes for 1/2 tuples, and +16 bytes for remaining tuples.

Any ideas about reduce impact for Python 3.7?
For example, can we add 8byte dummy to PyGC_Head, and tuple use the dummy for hash?  Maybe, it breaks ABI....  Not a chance...

I wonder if we can add -fmax-type-align=8 for extension types...
msg340331 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-04-16 10:20
> I wonder if we can add -fmax-type-align=8 for extension types...

No, we cannot: it's a temporary fix. The flag causes compilation error if it's added to old version of clang or to a C compiler different than clang.


> Any ideas about reduce impact for Python 3.7?

I don't think that it's a matter of performance here. What matters the most here is correctness.

See Florian Weimer's message:
https://bugs.python.org/issue36618#msg340261

"This issue potentially affects all compilers, not just Clang."
History
Date User Action Args
2019-04-16 10:20:56vstinnersetmessages: + msg340331
2019-04-16 10:14:54inada.naokisetmessages: + msg340330
2019-04-16 01:54:37inada.naokisetstage: patch review
pull_requests: + pull_request12775
2019-04-15 11:40:27fweimersetmessages: + msg340266
2019-04-15 11:24:14vstinnersetmessages: + msg340265
2019-04-15 11:21:31inada.naokisetmessages: + msg340264
2019-04-15 10:17:06vstinnersetmessages: + msg340254
2019-04-14 03:15:46inada.naokisetmessages: + msg340179
2019-04-13 01:08:48inada.naokisetnosy: + inada.naoki
2019-04-12 22:40:53gregory.p.smithsetversions: + Python 3.8, Python 3.9
2019-04-12 22:32:05vstinnersetmessages: + msg340120
2018-04-27 21:30:22tgriggsetnosy: + tgrigg
2018-01-31 10:57:24fweimersetversions: + Python 2.7
2018-01-31 10:57:18fweimersetnosy: + fweimer
messages: + msg311323
2018-01-31 00:12:39gregory.p.smithsetnosy: + twouters, gregory.p.smith
2017-11-09 21:35:54naschemesetnosy: + nascheme
messages: + msg305985
2017-11-09 06:25:35benjamin.petersonsetmessages: + msg305939
2017-11-08 14:14:26vstinnersetmessages: + msg305847
2017-10-25 05:36:39benjamin.petersonsetmessages: + msg304959
2017-10-24 16:24:36skrahsetnosy: + skrah
messages: + msg304925
2017-10-24 16:02:43pitrousetmessages: + msg304922
2017-10-24 15:35:23vstinnersetmessages: + msg304919
2017-10-24 15:33:57vstinnersetmessages: + msg304918
2017-10-24 15:19:04pitrousetversions: - Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6
2017-10-24 15:18:43pitrousetmessages: + msg304917
2017-10-24 14:41:45benjamin.petersonsetmessages: + msg304912
2017-10-24 09:00:09vstinnersetnosy: + vstinner
messages: + msg304874
2017-10-24 08:48:14pitrousetnosy: + pitrou
messages: + msg304873
2016-09-07 00:17:10benjamin.petersoncreate