Message 261905 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	njs, pitrou, vstinner
Date	2016-03-17.11:00:00
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1458212401.23.0.101165857424.issue26530@psf.upfronthosting.co.za>
In-reply-to

Content
Nathaniel Smith: > So PyMem_Malloc would just call PyMem_RecordAlloc("heap", ptr, size) (or act equivalently to something that called that, etc.), but something like PyCuda might do PyMem_RecordAlloc("gpu", ptr, size) to track allocations in GPU memory. If I change tracemalloc, it's not to fullfit numpy requirements, it must remain very generic. If we add something, I see 3 choices: * add a C int to trace_t * add a C char* to trace_t * add a C void* to trace_t int uses 2x less memory than char* or void* on 64-bit systems. The problem of void* is to find a way to expose it in Python. An option is not ignore it in the Python API, and only provide a C API to retrieve traces with the extra info. void* allows to implement the rejected option of also storing the C filename an C line number: https://www.python.org/dev/peps/pep-0445/#pass-the-c-filename-and-line-number When I designed the PEP 445 (malloc API) an PEP 454 (tracemalloc), I recall that it was proposed to add "colors" (red, blue, etc.) to memory allocations. It sounds similar do you "heap" and "gpu" use case. It's just that you use an integer rather than a string. Anyway, extending tracemalloc is non-trivial, we have to investigate use cases to design the new API. I would prefer to move step by step, an begin with exposing existing API. What do you think? > All the tracing stuff in tracemalloc would be awesome to have for GPU allocations, and it would hardly require any changes to enable it. Ditto for other leakable resources like file descriptors or shmem segments. FYI I opened an issue to use tracemalloc when logging ResourceWarning: http://bugs.python.org/issue26567 > I think the extra footprint would be tiny? In my experience, tracemalloc footprint is large. Basically, it doubles the total memory footprint. So adding 4 or 8 bytes to a trace which currently takes 16 bytes is not negligible! Maybe we can be smart and use compact trace when extra info is not stored (current trace_t) and switch to "extended" trace (trace_t + int/char/void) when the extended API is used? It requires to convert all existing traces from the compact to the extende format. It doesn't look too complex to support two formats, expecially if the extended format is based on the compact format (trace_t structure used in extended_trace_t structure). > Logically, you'd index traces by (domain, pointer) instead of (pointer) It's not how tracemalloc is designed. _tracemalloc has a simple design. It's a simple hashtable: pointer => trace. The Python module tracemalloc.py is responsible to group traces: https://docs.python.org/dev/library/tracemalloc.html#tracemalloc.Snapshot.statistics The design is to have a simple and efficient _tracemalloc module, an off-load statistics later. It allows to capture traces on a small and slow device, and then analyze data on a fast compuer with more memory (ex: transfer data by network). The idea is also to limit the overhead of using _tracemalloc. Moreover, your proposed structure looks specific to your use case. I'm not sure that you always want to group by the domain. If domain is a C traceback (filename, line number), you probably don't want to group by traceback, but group by C filename for example.

Nathaniel Smith:
> So PyMem_Malloc would just call PyMem_RecordAlloc("heap", ptr, size) (or act equivalently to something that called that, etc.), but something like PyCuda might do PyMem_RecordAlloc("gpu", ptr, size) to track allocations in GPU memory.

If I change tracemalloc, it's not to fullfit numpy requirements, it must remain very generic. *If* we add something, I see 3 choices:

* add a C int to trace_t
* add a C char* to trace_t
* add a C void* to trace_t

int uses 2x less memory than char* or void* on 64-bit systems.

The problem of void* is to find a way to expose it in Python. An option is not ignore it in the Python API, and only provide a C API to retrieve traces with the extra info.

void* allows to implement the rejected option of also storing the C filename an C line number:
https://www.python.org/dev/peps/pep-0445/#pass-the-c-filename-and-line-number

When I designed the PEP 445 (malloc API) an PEP 454 (tracemalloc), I recall that it was proposed to add "colors" (red, blue, etc.) to memory allocations. It sounds similar do you "heap" and "gpu" use case. It's just that you use an integer rather than a string.

Anyway, extending tracemalloc is non-trivial, we have to investigate use cases to design the new API. I would prefer to move step by step, an begin with exposing existing API. What do you think?


> All the tracing stuff in tracemalloc would be awesome to have for GPU allocations, and it would hardly require any changes to enable it. Ditto for other leakable resources like file descriptors or shmem segments.

FYI I opened an issue to use tracemalloc when logging ResourceWarning:
http://bugs.python.org/issue26567


> I think the extra footprint would be tiny?

In my experience, tracemalloc footprint is large. Basically, it doubles the total memory footprint. So adding 4 or 8 bytes to a trace which currently takes 16 bytes is not negligible!

Maybe we can be smart and use compact trace when extra info is not stored (current trace_t) and switch to "extended" trace (trace_t + int/char*/void*) when the extended API is used? It requires to convert all existing traces from the compact to the extende format. It doesn't look too complex to support two formats, expecially if the extended format is based on the compact format (trace_t structure used in extended_trace_t structure).


> Logically, you'd index traces by (domain, pointer) instead of (pointer)

It's not how tracemalloc is designed. _tracemalloc has a simple design. It's a simple hashtable: pointer => trace. The Python module tracemalloc.py is responsible to group traces:
https://docs.python.org/dev/library/tracemalloc.html#tracemalloc.Snapshot.statistics

The design is to have a simple and efficient _tracemalloc module, an off-load statistics later. It allows to capture traces on a small and slow device, and then analyze data on a fast compuer with more memory (ex: transfer data by network). The idea is also to limit the overhead of using _tracemalloc.

Moreover, your proposed structure looks specific to your use case. I'm not sure that you always want to group by the domain. If domain is a C traceback (filename, line number), you probably don't want to group by traceback, but group by C filename for example.

History
Date	User	Action	Args
2016-03-17 11:00:01	vstinner	set	recipients: + vstinner, pitrou, njs
2016-03-17 11:00:01	vstinner	set	messageid: <1458212401.23.0.101165857424.issue26530@psf.upfronthosting.co.za>
2016-03-17 11:00:01	vstinner	link	issue26530 messages
2016-03-17 11:00:00	vstinner	create