Author njs
Recipients neologix, njs, pitrou, rhettinger, tim.peters, vstinner
Date 2014-12-05.22:30:35
It's not terribly difficult to write a crude-but-effective aligned allocator on top of raw malloc:

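def aligned_malloc(size, alignment):
    # The shift is stored in a single byte, so it has to fit in a uint8.
    assert alignment < 255
    # Overallocate so that an aligned address always exists inside the block.
    raw_pointer = (uint8*) malloc(size + alignment)
    # shift is in 1..alignment, which leaves at least one byte for the header.
    shift = alignment - (raw_pointer % alignment)
    assert 0 < shift <= alignment
    aligned_pointer = raw_pointer + shift
    # Stash the shift in the byte just before the aligned block.
    *(aligned_pointer - 1) = shift
    return aligned_pointer

def aligned_free(pointer):
    # Recover the shift to find the pointer that malloc originally returned.
    shift = *((uint8*) pointer - 1)
    free((uint8*) pointer - shift)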
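
For reference, the pseudocode above translates fairly directly into C. This is only a sketch: the NULL handling and the overflow check are additions that the pseudocode does not spell out, and the bound on alignment simply mirrors the assert above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return `size` bytes whose address is a multiple of `alignment`, or NULL on
   failure. The shift is stored in one byte, hence the bound on alignment. */
void *aligned_malloc(size_t size, size_t alignment)
{
    assert(alignment > 0 && alignment < 255);
    if (size > SIZE_MAX - alignment)
        return NULL;  /* size + alignment would overflow */
    uint8_t *raw_pointer = malloc(size + alignment);
    if (raw_pointer == NULL)
        return NULL;
    /* shift is in [1, alignment], so one header byte always fits in front. */
    size_t shift = alignment - ((uintptr_t)raw_pointer % alignment);
    assert(0 < shift && shift <= alignment);
    uint8_t *aligned_pointer = raw_pointer + shift;
    /* Stash the shift in the byte just before the aligned block. */
    aligned_pointer[-1] = (uint8_t)shift;
    return aligned_pointer;
}

/* Release a block obtained from aligned_malloc(); plain free() must not be
   used on it because the pointer is not the one malloc() returned. */
void aligned_free(void *pointer)
{
    if (pointer == NULL)
        return;
    uint8_t *aligned_pointer = pointer;
    free(aligned_pointer - aligned_pointer[-1]);
}
```

A caller would pair the two functions, e.g. `void *p = aligned_malloc(100, 64); ...; aligned_free(p);`.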

However, both this fallback and the official Win32 API disallow the use of plain free() (as Victor points out in msg196834), so we can't just add an aligned_malloc slot to the PyMemAllocator struct. This kind of aligned allocation is effectively its own memory domain.

If native aligned allocation support were added to PyMalloc then it could potentially do better (e.g. by noticing that it already has a block on its freelist with the requested alignment and just returning that instead of overallocating). This might be the ideal solution for Raymond's use case, but I have no idea how much work it would be to mess around with PyMalloc innards.
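
To illustrate the freelist idea only, here is a toy sketch in C. The free_block type and take_aligned_block function are hypothetical and say nothing about how PyMalloc actually lays out its pools; the point is just that an allocator which owns the free list can hand out an already-aligned block instead of overallocating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy singly linked free list of equally sized blocks (hypothetical layout,
   not PyMalloc's). */
typedef struct free_block {
    struct free_block *next;
} free_block;

/* Unlink and return the first free block whose address is a multiple of
   `alignment`, or NULL if none qualifies. On NULL the caller would fall back
   to overallocating. */
void *take_aligned_block(free_block **head, size_t alignment)
{
    assert(alignment > 0);
    for (free_block **link = head; *link != NULL; link = &(*link)->next) {
        if ((uintptr_t)*link % alignment == 0) {
            free_block *hit = *link;
            *link = hit->next;  /* splice the block out of the list */
            return hit;
        }
    }
    return NULL;
}
```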

Numpy doesn't currently use aligned allocation for anything, but we'd like to keep our options open. If we do end up using it in the future then there's a reasonable chance we might want to use it *without* the GIL held (e.g. for allocating temporary buffers inside C loops). OTOH we are also happy to implement the aligned allocation ourselves (either on top of the system APIs or directly) -- we just don't want to lose tracemalloc support when we do.

For numpy's purposes, I think the best approach would be to add a tracemalloc "escape valve", with an interface like:

PyMem_RecordAlloc(const char* domain, void* tag, size_t quantity, 
PyMem_RecordRealloc(const char* domain, void* old_tag, void* new_tag, size_t new_quantity)
PyMem_RecordFree(const char* domain, void* tag)

where the idea is that after someone allocates memory (or potentially other discrete resources) directly without going through PyMem_*, they can then call these functions to tell tracemalloc what they just did.
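
A rough sketch of how a caller might use such an interface, in C. None of these functions exist in CPython; the bodies below are toy stand-ins that keep a small table instead of talking to tracemalloc. The int return type, the (domain, tag, quantity) argument list for PyMem_RecordAlloc, the "demo.aligned" domain name, and the traced_total, traced_buffer_alloc and traced_buffer_free helpers are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for the proposed functions (hypothetical, not a CPython API).
   Domains are stored by pointer, so they must outlive the record; string
   literals are fine. */
#define MAX_RECORDS 64

typedef struct {
    const char *domain;
    void *tag;
    size_t quantity;
    int in_use;
} record_t;

static record_t records[MAX_RECORDS];

static record_t *find_record(const char *domain, void *tag)
{
    assert(domain != NULL);
    for (int i = 0; i < MAX_RECORDS; i++)
        if (records[i].in_use && records[i].tag == tag &&
            strcmp(records[i].domain, domain) == 0)
            return &records[i];
    return NULL;
}

int PyMem_RecordAlloc(const char *domain, void *tag, size_t quantity)
{
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (!records[i].in_use) {
            records[i] = (record_t){domain, tag, quantity, 1};
            return 0;
        }
    }
    return -1;  /* table full */
}

int PyMem_RecordRealloc(const char *domain, void *old_tag, void *new_tag,
                        size_t new_quantity)
{
    record_t *r = find_record(domain, old_tag);
    if (r == NULL)
        return -1;  /* unknown tag */
    r->tag = new_tag;
    r->quantity = new_quantity;
    return 0;
}

int PyMem_RecordFree(const char *domain, void *tag)
{
    record_t *r = find_record(domain, tag);
    if (r == NULL)
        return -1;  /* unknown tag */
    r->in_use = 0;
    return 0;
}

/* Sum of the live quantities recorded for one domain. */
size_t traced_total(const char *domain)
{
    size_t total = 0;
    for (int i = 0; i < MAX_RECORDS; i++)
        if (records[i].in_use && strcmp(records[i].domain, domain) == 0)
            total += records[i].quantity;
    return total;
}

/* Example caller: allocate outside PyMem_*, then report what was done.
   Plain malloc() stands in for whatever aligned allocator is really used. */
void *traced_buffer_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        PyMem_RecordAlloc("demo.aligned", p, size);
    return p;
}

void traced_buffer_free(void *p)
{
    if (p == NULL)
        return;
    PyMem_RecordFree("demo.aligned", p);
    free(p);
}
```

Since the tag is just an opaque pointer-sized value, the same calls could record something that is not a heap block at all, which is what makes the interface usable for GPU buffers or similar resources.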

This would be useful in a number of cases: in addition to tracking aligned allocations, it would make it possible to re-use the tracemalloc infrastructure to track GPU buffers allocated by CUDA/GPGPU-type code, mmap usage, hugetlbfs usage, etc. Potentially even open file descriptors if one wants to go there (seems pretty useful, actually).
Date                 User  Action  Args
2014-12-05 22:30:35  njs   set     recipients: + njs, tim.peters, rhettinger, pitrou, vstinner, neologix
2014-12-05 22:30:35  njs   set     messageid: <>
2014-12-05 22:30:35  njs   link    issue18835 messages
2014-12-05 22:30:35  njs   create