About the API itself, I'm not sure that PyMem_AlignedAlloc(alignment, size) is flexible enough. If we want to get *data* aligned in a Python object, we would have to pass an offset to the data, since Python objects have headers of variable size (depending on the type).
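To illustrate why an offset matters, here is a minimal C sketch. The struct layouts are hypothetical: one loosely modelled on a fixed-size object header and one on a header with an extra length field. They are not real CPython structs. The sketch checks whether the payload at `start + offset` lands on an alignment boundary. It shows that aligning only the object start is not enough unless the header size happens to be a multiple of the alignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object layouts (NOT real CPython structs): a fixed header
 * and a larger header carrying an extra size field. */
typedef struct { ptrdiff_t refcnt; void *type; } fixed_header;
typedef struct { ptrdiff_t refcnt; void *type; ptrdiff_t size; } var_header;

/* The payload follows the header, so its offset depends on the header type. */
typedef struct { fixed_header head; double data[]; } fixed_obj;
typedef struct { var_header head; double data[]; } var_obj;

/* Hypothetical helper: true if the payload at start + offset sits on an
 * `alignment` boundary. `alignment` must be a nonzero power of two. */
static bool data_is_aligned(uintptr_t start, size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((start + offset) & (alignment - 1)) == 0;
}
```

For example, an object starting on a 32-byte boundary with a 16-byte header has its data only 16-byte aligned. An allocator has to shift the start back by the header size instead.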

Windows has such an API:

void * _aligned_offset_malloc(
   size_t size,
   size_t alignment,
   size_t offset
);

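It returns a pointer p such that p + offset is a multiple of alignment, and the block must be released with _aligned_free(). The function is built on top of malloc(), so it likely over-allocates and adds padding bytes for you, depending on size, alignment and offset.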
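The malloc-based padding idea can be sketched in portable C. The names `aligned_offset_malloc_sketch()` and `aligned_offset_free_sketch()` are hypothetical. This is not the Windows implementation and not a CPython API. The sketch assumes the usual approach: over-allocate by `alignment - 1` bytes plus one hidden pointer slot, pick the first address whose `address + offset` is aligned, and store the real `malloc()` result just before it so that the free function can recover it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: returns p such that (p + offset) is a multiple of
 * alignment. alignment must be a nonzero power of two. */
void *aligned_offset_malloc_sketch(size_t size, size_t alignment, size_t offset)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;
    /* Worst-case padding plus a hidden slot for the real malloc() pointer. */
    size_t extra = alignment - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;
    unsigned char *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;
    /* Smallest address >= raw + sizeof(void *) whose (address + offset)
     * is aligned; rounding up adds at most alignment - 1 bytes. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t target = (start + offset + alignment - 1)
                       & ~(uintptr_t)(alignment - 1);
    unsigned char *user = raw + ((target - offset) - (uintptr_t)raw);
    assert(((uintptr_t)user + offset) % alignment == 0);
    /* The slot may be unaligned for void *, so copy bytes instead. */
    memcpy(user - sizeof(void *), &raw, sizeof raw);
    return user;
}

/* Hypothetical counterpart: frees a pointer from the sketch above. */
void aligned_offset_free_sketch(void *p)
{
    if (p != NULL) {
        void *raw;
        memcpy(&raw, (unsigned char *)p - sizeof(void *), sizeof raw);
        free(raw);
    }
}
```

The cost is up to `alignment - 1 + sizeof(void *)` wasted bytes per block. This is the kind of padding a PyMem-level API would have to account for if it took an offset parameter.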

See bpo-27987: "obmalloc's 8-byte alignment causes undefined behavior".
