Message 262172 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	njs, pitrou, python-dev, vstinner
Date	2016-03-22.10:18:29
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<CAMpsgwb43UHfzA8c6_Zm2yyVTRHa3A4yhwp9XrdWGWWyuky3pQ@mail.gmail.com>
In-reply-to	<1458605798.38.0.490231703163.issue26588@psf.upfronthosting.co.za>

Content
Hi, > - How do you anticipate the integers naming domains will be allocated? Is it like port numbers, and you'll maintain a registry somewhere ("domains 0-100 are reserved for the interpreter, pycuda has reserved 100-110, ...")? I simply have no idea at all :-) I guess that the best is to collaborate. It's not like we plan to have 1000 different users of the API. Today, I only know two users: you and Antoine :-D unsigned int is large: at least 32 bits. I understood that we will only need something like 10 or maybe 20 domains. > - You have it configurable at run-time whether the domain gets included in the trace key, and if this is set to false (the default) then all the different domains just get collapsed together. How is some downstream user like pycuda expected to get sensible behavior out of this? Is the idea that pycuda should be guarding all trace/untrace calls with if (use_domain) { ... }, or...? In my first implementation, support for domains was always enabled. But I don't like that since it increases the memory footprint even if you don't use domains, which is the case for all users today. I added a flag, but I'm not happy with the API. For example, it's not possible to set the flag using PYTHONTRACEMALLOC=nframe env var nor -X tracemalloc[=nframe] command line option, whereas I'm a big fan of -X tracemalloc=5 command line option. Instead of having to enable the flag early and don't allow to change the flag later, I should try to implement a dynamic flag: start without domain, and convert the whole table to support domain as soon as you start to trace at least one trace with domain != 0. I pushed the first part of my patch (support key bigger than sizeof(void)) to be able to focus on the second part. If fact, converting a table is no more complex than the regular "rehash" operation when the table becomes too big. This operation is part of the hashtable design. > (I guess it could be sensible to support disabling domain stuff -- though if I were writing it I probably wouldn't bother just because it increases the number of configurations that need to be tested etc. -- but if you're doing this then IMO it should discard all trace/untrace calls that refer to non-default domains, i.e. domain=False shouldn't mean "discard domain information", it should mean "trace only the heap domain".) The design of tracemalloc is more to collect everything, but filter later. That's why I added an optional domain parameter to tracemalloc.Filter an a new tracemalloc.DomainFilter filter. > - Is all this messing about with possibly-unaligned pointers etc. really easier than having one-hashtable-per-domain? You could easily get away with a linearly scanned array, even... Having one hash table per domain requires to allocate a new hashtable when we start to use a domain, then decide when it's time to release it. We also need a "registry" for all trace hashtables: a simple short array, or maybe even another hashtable? But I think that we disagree on how traces will used. It looks like you only care of statistics per domain. IMHO it's worth to get the total* memory (ex: CPU+GPU memory) per filename or per line number. But also give the ability to filter domains. For example, exclude CPU to get statistics of all GPU domains, or get CPU+shmem statistics. I know that technically, the storage doesn't matter much: we can still get traces of all domains at once. I like the idea of having all traces in the same structure, even I have to agree that the new C macros to get a key are more ugly than the previous simple direct "void* key" pointer. Domains are quite strange. They are part of the key in the hashtable, but _tracemalloc._get_traces() now returns them as part of the "value" (with my patch) :-)

Hi,

> - How do you anticipate the integers naming domains will be allocated? Is it like port numbers, and you'll maintain a registry somewhere ("domains 0-100 are reserved for the interpreter, pycuda has reserved 100-110, ...")?

I simply have no idea at all :-) I guess that the best is to
collaborate. It's not like we plan to have 1000 different users of the
API. Today, I only know two users: you and Antoine :-D

unsigned int is large: at least 32 bits. I understood that we will
only need something like 10 or maybe 20 domains.

> - You have it configurable at run-time whether the domain gets included in the trace key, and if this is set to false (the default) then all the different domains just get collapsed together. How is some downstream user like pycuda expected to get sensible behavior out of this? Is the idea that pycuda should be guarding all trace/untrace calls with if (use_domain) { ... }, or...?

In my first implementation, support for domains was always enabled.
But I don't like that since it increases the memory footprint even if
you don't use domains, which is the case for *all* users today.

I added a flag, but I'm not happy with the API. For example, it's not
possible to set the flag using PYTHONTRACEMALLOC=nframe env var nor -X
tracemalloc[=nframe] command line option, whereas I'm a big fan of -X
tracemalloc=5 command line option.

Instead of having to enable the flag early and don't allow to change
the flag later, I should try to implement a dynamic flag: start
without domain, and convert the whole table to support domain as soon
as you start to trace at least one trace with domain != 0. I pushed
the first part of my patch (support key bigger than sizeof(void*)) to
be able to focus on the second part.

If fact, converting a table is no more complex than the regular
"rehash" operation when the table becomes too big. This operation is
part of the hashtable design.

> (I guess it could be sensible to support disabling domain stuff -- though if I were writing it I probably wouldn't bother just because it increases the number of configurations that need to be tested etc. -- but if you're doing this then IMO it should discard all trace/untrace calls that refer to non-default domains, i.e. domain=False shouldn't mean "discard domain information", it should mean "trace only the heap domain".)

The design of tracemalloc is more to collect everything, but filter
later. That's why I added an optional domain parameter to
tracemalloc.Filter an a new tracemalloc.DomainFilter filter.

> - Is all this messing about with possibly-unaligned pointers etc. really easier than having one-hashtable-per-domain? You could easily get away with a linearly scanned array, even...

Having one hash table per domain requires to allocate a new hashtable
when we start to use a domain, then decide when it's time to release
it. We also need a "registry" for all trace hashtables: a simple short
array, or maybe even another hashtable?

But I think that we disagree on how traces will used. It looks like
you only care of statistics *per domain*. IMHO it's worth to get the
*total* memory (ex: CPU+GPU memory) per filename or per line number.
But also give the ability to filter domains. For example, exclude CPU
to get statistics of all GPU domains, or get CPU+shmem statistics.

I know that technically, the storage doesn't matter much: we can still
get traces of all domains at once. I like the idea of having all
traces in the same structure, even I have to agree that the new C
macros to get a key are more ugly than the previous simple direct
"void* key" pointer.

Domains are quite strange. They are part of the key in the hashtable,
but _tracemalloc._get_traces() now returns them as part of the "value"
(with my patch) :-)

History
Date	User	Action	Args
2016-03-22 10:18:30	vstinner	set	recipients: + vstinner, pitrou, njs, python-dev
2016-03-22 10:18:30	vstinner	link	issue26588 messages
2016-03-22 10:18:29	vstinner	create