classification
Title: obmalloc's 8-byte alignment causes undefined behavior
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, nascheme, pitrou, skrah, vstinner
Priority: normal Keywords: patch

Created on 2016-09-07 00:17 by benjamin.peterson, last changed 2017-11-09 21:35 by nascheme.

Files
File name Uploaded Description Edit
alignment.patch benjamin.peterson, 2016-09-07 00:17 review
Messages (13)
msg274688 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-07 00:17
ubsan complains about unaligned access when structs include "long double". An example error:
    runtime error: member access within misaligned address 0x7f77dbba9798 for type 'struct CDataObject', which requires 16 byte alignment

This is because (on x86 anyway), long double is 16-bytes long and requires that alignment, but obmalloc only gives a 8-byte alignment. (glibc malloc() gives 16-byte alignment.)

I'm attaching a POC patch. I don't know what the impact of increasing the alignment is on obmalloc's performance or memory usage. It's also unfortunate that this patch increases the size of PyGC_Head to 32 bytes from 24 bytes. One can imagine a more middle-ground solution to this by allowing types to specify their required alignment.
msg304873 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-24 08:48
What do we do if at some point a C type requires a larger alignment (for example a vector type readable using AVX512 instructions)?
msg304874 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 09:00
> ubsan complains about unaligned access when structs include "long double". An example error:
> runtime error: member access within misaligned address 0x7f77dbba9798 for type 'struct CDataObject', which requires 16 byte alignment

Can we use memcpy() to prevent such issue?
msg304912 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-10-24 14:41
My suggestion would be to pass alignof(type) into the allocator via macro. Then the allocator could at least assert it's providing good enough alignment if not provide the correct alignment.

I believe 16-byte alignment is special because it's glibc's malloc's default. So "normal" code shouldn't really be expecting anything better than 16-byte alignment. Code with higher alignment requirements will have to use APIs like the one proposed in #18835.
msg304917 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-24 15:18
> My suggestion would be to pass alignof(type) into the allocator via macro.

Do you mean using some new PyMem_ function?  Or as as new tp_ field on the type declaration?
msg304918 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 15:33
What matters when a Python object is allocated? The start of the PyObject structure, or the start of the PyGC_Head structure? Would it be possible to align the PyObject start?

The simplest option is to store data which needs to be aligned in a second memory block allocated by PyMem_AlignedAlloc().
msg304919 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-24 15:35
Change by Antoine Pitrou: "versions:  -Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6"

The undefined behaviour exists and should be fixed in Python 2.7 and 3.6, no? Can we use memcpy()?
msg304922 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2017-10-24 16:02
> Can we use memcpy()?

Hmm, perhaps.  Do you want to try it out (and measure any performance degradation)?
msg304925 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2017-10-24 16:24
Since we have "#define PYMEM_FUNCS PYOBJ_FUNCS", I think extensions that
use PyMem_Malloc() also won't get the glibc max_align_t alignment.

But guess technically they should.
msg304959 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-10-25 05:36
Yes, we could memcpy things around to obtain the desired alignment. It would be nicer to have a builtin solution, though.
msg305847 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-11-08 14:14
alignment.patch: +    long double dummy;  /* force worst-case alignment */

Would it be possible to use max_align_t mentioned by Stefan, at least when this type is available?

What is the impact of the patch on objects size?
msg305939 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2017-11-09 06:25
On Wed, Nov 8, 2017, at 06:14, STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@gmail.com> added the comment:
> 
> alignment.patch: +    long double dummy;  /* force worst-case alignment
> */
> 
> Would it be possible to use max_align_t mentioned by Stefan, at least
> when this type is available?

Yes, that would be the correct thing to do. I was looking for the quick
hack.

> 
> What is the impact of the patch on objects size?

On 64-bit platforms, I believe it wastes a word for GC objects.
msg305985 - (view) Author: Neil Schemenauer (nascheme) * (Python committer) Date: 2017-11-09 21:35
FYI, this would seem to be an incentive to get my "bitmaps for small GC objects" idea implemented.  I.e.

https://mail.python.org/pipermail/python-dev/2017-September/149307.html

If implemented, the extra size of the PyGC_Head would only apply to "large" objects.  In my prototype, I'm thinking of using 512 bytes as the size limit for small GC objects.
History
Date User Action Args
2017-11-09 21:35:54naschemesetnosy: + nascheme
messages: + msg305985
2017-11-09 06:25:35benjamin.petersonsetmessages: + msg305939
2017-11-08 14:14:26vstinnersetmessages: + msg305847
2017-10-25 05:36:39benjamin.petersonsetmessages: + msg304959
2017-10-24 16:24:36skrahsetnosy: + skrah
messages: + msg304925
2017-10-24 16:02:43pitrousetmessages: + msg304922
2017-10-24 15:35:23vstinnersetmessages: + msg304919
2017-10-24 15:33:57vstinnersetmessages: + msg304918
2017-10-24 15:19:04pitrousetversions: - Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6
2017-10-24 15:18:43pitrousetmessages: + msg304917
2017-10-24 14:41:45benjamin.petersonsetmessages: + msg304912
2017-10-24 09:00:09vstinnersetnosy: + vstinner
messages: + msg304874
2017-10-24 08:48:14pitrousetnosy: + pitrou
messages: + msg304873
2016-09-07 00:17:10benjamin.petersoncreate