> Adding yet another API to allocate memory has a cost

Please don't FUD this one to death.  Aligned memory access is sometimes important, and we currently have no straightforward way to achieve it.  If you're truly worried about adding a single new function to the public C API, we can create just a single internal function:  void *PyMem_RawMallocAligned(size_t size, size_t alignment).
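
To make the request concrete, here is a minimal sketch of what such a function could do on top of plain malloc().  It is an illustration only, not a proposed patch: the names raw_malloc_aligned() and raw_free_aligned() are hypothetical, it assumes the alignment is a power of two, and it assumes a matching free function is acceptable because the pointer handed back is not the one malloc() returned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch, not an existing CPython function.
   Returns a block of at least `size` bytes whose address is a multiple
   of `alignment` (which must be a nonzero power of two), or NULL. */
void *
raw_malloc_aligned(size_t size, size_t alignment)
{
    size_t extra;
    void *raw;
    uintptr_t start, aligned;

    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;

    /* Worst-case padding plus room to remember the original pointer. */
    extra = alignment - 1 + sizeof(void *);
    if (size > SIZE_MAX - extra)
        return NULL;

    raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Skip the bookkeeping slot, then round up to the alignment. */
    start = (uintptr_t)raw + sizeof(void *);
    aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    assert(aligned % alignment == 0);

    /* Stash the malloc() pointer just below the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Hypothetical counterpart: releases a block obtained from
   raw_malloc_aligned().  Passing NULL is a no-op. */
void
raw_free_aligned(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

The cost is one pointer plus up to alignment - 1 bytes of padding per block, which is why this only makes sense for a few selected allocations.  A real implementation would presumably go through the existing raw allocator hooks rather than calling malloc() directly.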

> aligning every data structure on a cacheline boundary 
> doesn't sound like a very good idea

We don't have to align EVERY data structure.  But I do have immediate beneficial use cases for set tables and for data blocks in deque objects.  I need this function and would appreciate your help in fitting it in nicely with the current memory management functions and macros.
