New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_tracemalloc: add support for multiple address spaces (domains) #70775
Comments
The tracemalloc module uses a hashtable: pointer (void*) => trace. Some embedded devices use multiple address spaces. Each GPU also has its own address space. CUDA, OpenCL and OpenGL should also be seen as different address spaces. In the issue bpo-26530, it was proposed (required?) to support multiple addresses spaces to be able to use tracemalloc in numpy. Attached patch enhances tracemalloc to use (pointer: void*, domain: unsigned int) as key in the hashtable. A (pointer, domain) key is stored in a hashtable entry. In the patch, the domain is hardcoded to 0, but the issue bpo-26530 will add new C functions to track/untrack memory blocks, and this new functions will have a domain parameter. The patch changes how a key is passed to the hashtable API: pass a *pointer* to a key rather than directly the key value. Because of that, the patch is quite long. The patch also removes the unused function _Py_hashtable_hash_int(). _Py_HASHTABLE_ENTRY_DATA() macro now requires the hashtable to get the key size, since the offset of data now depends on the key size. |
Le 18/03/2016 22:39, STINNER Victor a écrit :
Only proposed :-) Numpy itself only works on the CPU, however Nathaniel |
Patch version 2:
Currently, Snapshot._group_by() ignores domain information. I don't know how the domain should be exposed, *if* it should be exposed. By the way, maybe we don't need to export the domain at the Python level at all? (Don't expose it in _tracemalloc._get_traces()). |
Do you mean that Numba requires supporting multiple address spaces to be able to use tracemalloc? Or that supporting multiple address spaces would also to *also* track GPU memory allocations? A friend told me that CUDA, OpenGL and OpenCL can (should?) be seen as 3 different address spaces. These libraries don't expose directly addresses in the GPU address space, but may use handles. |
Patch 3:
|
Hum, on OpenStack it's common for me to work on a patch serie. Since our development workflow doesn't support that, I used a single patch. It's hard to review it, all changes are merged into one big patch :-/ |
Crap. I pushed tracemalloc-3.patch by mistake. I just reverted my mistake. At least, it helped to find first bugs using buildbots: http://buildbot.python.org/all/builders/PPC64%20Fedora%203.x/builds/676/steps/test/logs/stdio [230/400/3] test_tracemalloc Current thread 0x00003fffb0a95a10 (most recent call first): |
Updated patch taking Antoine's comments in account. Instead of filling pointer_t padding with zeros, I changed the size of the key (key_size) in the hashtable to exclude padding. |
What about alignment issues on strict platforms? (SPARC?) |
Which kind of alignment issue do you expect? I added "__attribute__((packed))" to frame_t. I expected SIGBUS because of that (it's even documented in a comment ;-)), but I practice nobdy complained, not even our buildbots. Hum, it looks like we only have one SPARC buildbot and it's offline... _Py_hashtable_get() uses memcpy() to copy the value, so this function should not have any alignment issue. The new code uses pointer dereference, so alignment issues are more likely. How can I test it on Intel CPU? Should I also use memcpy() to retrieve the key value? |
What about compilers that don't support "__attribute__((packed))"? Instead you could use a compare func that compares struct fields... |
Patch version 5:
Antoine:
packed is a small and *optional* optimization on the memory footprint of _tracemalloc structure (reduce tracemalloc.get_tracemalloc_memory()). Using packed can introduce memory alignment issue, but since tracemalloc uses memcpy(), it *should* be fine in practice. Again, since I don't have access to SPARC CPU, I don't know how to test.
Done. |
Hum, the patch is really too hard to review. So I extract to the "key_size" part: hashtable_key_size.patch. I added new macros to get the key, wrapper to memcpy() with an assert to check the key size in debug mode. |
New changeset aca4e9af1ca6 by Victor Stinner in branch 'default': |
New changeset 98a1c8cb882f by Victor Stinner in branch 'default': |
Hum, an assertion fails in set_reentrant(), helper function to set a TLS (Thread Local Storage), but I'm unable to reproduce the bug on my Fedora 23 (AMD64). [187/400] test_tracemalloc Current thread 0x00007fecefd68700 (most recent call first): |
About memory alignment, _Py_HASHTABLE_ENTRY_DATA_AS_VOID_P() macro uses pointer dereference. It may be replaced with _Py_HASHTABLE_ENTRY_READ_DATA() to use memcpy() and avoid memory alignment issue. |
High-level questions:
(I guess it could be sensible to support disabling domain stuff -- though if I were writing it I probably wouldn't bother just because it increases the number of configurations that need to be tested etc. -- but if you're doing this then IMO it should discard all trace/untrace calls that refer to non-default domains, i.e. domain=False shouldn't mean "discard domain information", it should mean "trace only the heap domain".)
|
Hi,
I simply have no idea at all :-) I guess that the best is to unsigned int is large: at least 32 bits. I understood that we will
In my first implementation, support for domains was always enabled. I added a flag, but I'm not happy with the API. For example, it's not Instead of having to enable the flag early and don't allow to change If fact, converting a table is no more complex than the regular
The design of tracemalloc is more to collect everything, but filter
Having one hash table per domain requires to allocate a new hashtable But I think that we disagree on how traces will used. It looks like I know that technically, the storage doesn't matter much: we can still Domains are quite strange. They are part of the key in the hashtable, |
New changeset 7c894911eb59 by Victor Stinner in branch 'default': New changeset cd9a40a5ea90 by Victor Stinner in branch 'default': |
New changeset b86cdebe0e97 by Victor Stinner in branch 'default': |
I hate working on huge patches, it's a pain to rebase them. I chose to push a first implementation to unblock the issue bpo-26530 "tracemalloc: add C API to manually track/untrack memory allocations". I'm interested by your feedback on the C API. The C implementation can still be changed later. For example, I chose to push a first implementation which *always* store domain_t in traces keys. Later, I will write a patch to switch from the compact key (Py_uintptr_t) to key using domain (pointer_t) dynamically. |
Crap. I expected the set_reentrant() bug to be fixed, but I saw it one more time on the latest build (revision 0da4532a893389bd1ee1ddfe8227750d968023ba): http://buildbot.python.org/all/builders/s390x%20RHEL%203.x/builds/817/steps/test/logs/stdio 0:02:54 [103/400] test_tracemalloc Current thread 0x000003fffcd7b6f0 (most recent call first): |
New changeset 4ec81b497a84 by Victor Stinner in branch 'default': |
New changeset 340ed3ff2656 by Victor Stinner in branch 'default': |
New changeset bbdb24611294 by Victor Stinner in branch 'default': |
New changeset 636fa01842f5 by Victor Stinner in branch 'default': |
Oh, "s390x RHEL 3.x" and "AMD64 Debian root 3.x" buildbots use "-j 1" command line option. In practice, tests are not run in subprocesses, -j1 option is ignored (issue bpo-25285). |
Failure at revision 636fa01842f597bedff7054616c65a4784c92b4a on AMD64 Windows10 3.x (regrtest runs without the -j option, so all tests are run in the same thread). http://buildbot.python.org/all/builders/AMD64%20Windows10%203.x/builds/763/steps/test/logs/stdio Current thread 0x00000a40 (most recent call first): I understand that tracemalloc_start() was called with the first time and that get_reentrant() returns 0 at tracemalloc_start() entry, whereas tracemalloc_init() is supposed to set the reentrant flag to 1. |
New changeset af388201c997 by Victor Stinner in branch 'default': |
Failure on "x86 Ubuntu Shared 3.x" at revision af388201c9976aebc4a8a433c95bfc4a1abe014b. I added an assertion in tracemalloc_init() to ensure that the reeentrant flag is set at the end of the function. The reentrant flag was no more set at tracemalloc_start() entry for an unknown reason. http://buildbot.python.org/all/builders/x86%20Ubuntu%20Shared%203.x/builds/12905/steps/test/logs/stdio Current thread 0x55adfdc0 (most recent call first): |
New changeset f2d64f91d992 by Victor Stinner in branch 'default': New changeset d0b2f70731fb by Victor Stinner in branch 'default': |
New changeset 708beeb65026 by Victor Stinner in branch 'default': |
New changeset c6f30e2731af by Victor Stinner in branch 'default': New changeset af1c1149784a by Victor Stinner in branch '3.5': New changeset 2b4731e22df8 by Victor Stinner in branch '3.5': |
Ooook, I think that I found and fixed the random failures in the change af1c1149784a. |
TODO for this issue: smart storage in _tracemalloc for trace keys. Use compact void* key until the first domain != 0 is used. |
(Ooops, I closed the bug by mistake.) |
New changeset dd887518b569 by Victor Stinner in branch 'default': |
New changeset 0e0a464faffb by Victor Stinner in branch 'default': New changeset 14a552645c30 by Victor Stinner in branch 'default': |
New changeset 922b632808ac by Victor Stinner in branch 'default': |
New changeset 06552275fa30 by Victor Stinner in branch 'default': |
Oooookay, it looks like all issues have been fixed. The optimization for compact keys has been implemented. I didn't see any crash on buildbots recently, test_tracemalloc pass on all platforms. Nathaniel asked how domains will be "allocated" or "registered", but I suggest to discuss that in the issue bpo-26530. I close this issue that was much more difficult to implement than I expected... |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: