Benchmarks and Intel's recommendation show that aligned allocation is actually important for AVX performance, and NumPy depends on CPython providing the right allocation APIs (for integration with tracemalloc):

So I think for 3.5 we should start providing the APIs. Whether we use them in Python core is another discussion.
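
To make the alignment requirement concrete, here is a rough sketch in plain Python with ctypes. It is not the proposed C API, only an illustration of the usual over-allocate-and-offset trick that an aligned allocator has to do when the underlying allocator gives no alignment guarantee. The helper name `aligned_buffer` and the default of 32 bytes (the width of a 256-bit AVX register) are assumptions made for the example.

```python
import ctypes


def aligned_buffer(size, alignment=32):
    """Return a ctypes char array of `size` bytes whose address is a
    multiple of `alignment` (hypothetical helper, illustration only)."""
    # Alignment must be a positive power of two.
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Over-allocate so that an aligned start address is guaranteed to exist
    # somewhere inside the raw block.
    raw = ctypes.create_string_buffer(size + alignment - 1)
    # Distance from the raw start address to the next aligned address.
    offset = (-ctypes.addressof(raw)) % alignment
    # Create a view into the raw block starting at the aligned address.
    # The view keeps the raw block alive.
    return (ctypes.c_char * size).from_buffer(raw, offset)


buf = aligned_buffer(100)
print(ctypes.addressof(buf) % 32 == 0)
```

The wasted padding and the need to remember the original pointer for freeing are exactly why it is nicer to have the allocator support alignment directly.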

Nathaniel, what APIs would you need exactly? See Victor's proposal in 
