Issue 33607: [subinterpreters] Explicitly track object ownership (and allocator).

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/77788

classification

Title:	[subinterpreters] Explicitly track object ownership (and allocator).
Type:		Stage:
Components:	Subinterpreters	Versions:	Python 3.8

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	emilyemorehouse, eric.snow, ncoghlan, pitrou, vstinner
Priority:	normal	Keywords:

Created on 2018-05-22 19:19 by eric.snow, last changed 2022-04-11 14:59 by admin.

Messages (6)
msg317330	Author: Eric Snow (eric.snow) *	Date: 2018-05-22 19:19
When an object is created it happens relative to the current thread (ergo interpreter) and the current allocator (part of global state). We do not track either of these details for the object. It may make sense to start doing so (reasons next).

Regarding tracking the interpreter, that originating interpreter can be thought of as the owner. Any lifecycle operations should happen relative to that interpreter. Furthermore, the object should be used in C-API calls only in that interpreter (i.e. when the current thread's PyThreadState belongs to that interpreter). This hasn't been an issue so far, both because all interpreters in the process currently share the GIL and because subinterpreters haven't been heavily used historically. However, the possibility of no longer sharing the GIL suggests that tracking the owning interpreter (and perhaps even other "sharing" interpreters) would be important. Furthermore, in the last few years subinterpreters have seen increasing usage (see Openstack Ceph), and knowing the originating interpreter for an object can be useful there. Regardless, even in the single-interpreter case knowing the owning interpreter is important during runtime finalization (which is currently slightly broken), which impacts CPython embedders.

Regarding the allocator, there used to be just a single global one that the runtime used from start to finish. Now the C-API offers a way to switch the allocator, so there's no guarantee that the right allocator is used in PyMem_Free(). This has already had a negative impact on efforts to clean up CPython's runtime initialization. It also results in problems during finalization. Additionally, we are looking into moving the allocator from the global runtime state to the per-interpreter (or even per-thread or per-context) state. In that world it would be essential to know which allocator was used when creating the object.

There are other possible applications based on knowing an object's allocator, but I'll stop there.

To sort all this out we would need to track per-object:

* originating allocator (pointer or id)
* owning interpreter (pointer or id)
* (possibly) "sharing" interpreters (linked list?)

Either we'd add 2 pointer-size fields to PyObject or we would keep a separate hash table (or two) pointing from each object to the info (similar to how we've considered doing for refcounts). To alleviate impact on the common case (not embedded, single interpreter, same allocator), we could default to not tracking interpreter/allocator and take a lookup failure to mean "main interpreter, default allocator".
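
The "separate hash table with a default on lookup failure" idea can be modelled in a few lines of Python. This is only a sketch of the bookkeeping, not anything CPython does: `Origin`, `OriginTable`, and `DEFAULT_ORIGIN` are hypothetical names, interpreters and allocators are represented by plain strings, and a real implementation would live in C and key on the object pointer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Origin:
    """Where an object came from (hypothetical record, not a CPython type)."""
    interpreter: str
    allocator: str


# Assumed meaning of "not in the table": main interpreter, default allocator.
DEFAULT_ORIGIN = Origin(interpreter="main", allocator="default")


class OriginTable:
    """Side table from object identity to Origin.

    Objects created in the common case are never stored, so the table
    stays empty unless subinterpreters or custom allocators are in use.
    """

    def __init__(self):
        self._by_id = {}

    def track(self, obj, interpreter, allocator):
        origin = Origin(interpreter, allocator)
        if origin != DEFAULT_ORIGIN:
            # Only uncommon origins cost any memory.
            self._by_id[id(obj)] = origin

    def lookup(self, obj):
        # A lookup failure is interpreted as the default origin.
        return self._by_id.get(id(obj), DEFAULT_ORIGIN)

    def forget(self, obj):
        # Must be called when the object is deallocated, because id()
        # values (addresses) can be reused by later objects.
        self._by_id.pop(id(obj), None)

    def __len__(self):
        return len(self._by_id)
```

The trade-off in this sketch is that the common case pays nothing per object, while every tracked object needs an explicit `forget()` at deallocation to keep the table consistent.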
msg317339	Author: STINNER Victor (vstinner) *	Date: 2018-05-22 20:18
"Either we'd add 2 pointer-size fields to PyObject or we would keep a separate hash table (or two) pointing from each object to the info (...)"

I expect a huge impact on the memory footprint. I dislike the idea. Currently, the smallest Python object is:

>>> sys.getsizeof(object())
16

It's just two pointers. Adding two additional pointers would simply double the size of the object.

"Now the C-API offers a way to switch the allocator, so there's no guarantee that the right allocator is used in PyMem_Free()."

I would expect that either all interpreters use the same memory allocator, or that each interpreter uses its own allocator. If you use one allocator per interpreter, calling PyMem_Free() from the wrong interpreter would just crash, just as you get a crash when you call free() on an object allocated by PyMem_Malloc().

You can extend PYTHONMALLOC=debug to detect bugs. This built-in debugger is already able to catch misuses of allocators. Adding extra pointers to this debugger is acceptable since it doesn't modify the footprint of the default mode.
msg317403	Author: Nick Coghlan (ncoghlan) *	Date: 2018-05-23 13:13
Rather than tracking this per object, you could potentially track it per arena at the memory allocator level instead. Then if you really need the info (e.g. when running the debug allocator), you can check it in a reliable way, but in the normal case, you assume the associations are being managed correctly and avoid any significant bookkeeping overhead.
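
As a rough illustration of per-arena tracking, the sketch below maps an address to the owner of the arena that contains it, so the cost is one record per arena rather than per object. Everything here is assumed for illustration: `ArenaRegistry` is a hypothetical name, `ARENA_SIZE` is an example value, addresses are plain integers, and owners are strings.

```python
import bisect

# Illustrative arena size; the real value depends on the CPython version.
ARENA_SIZE = 256 * 1024


class ArenaRegistry:
    """Maps fixed-size address ranges (arenas) to an owner label."""

    def __init__(self):
        self._starts = []   # sorted arena base addresses
        self._owners = {}   # base address -> owner label

    def register(self, base, owner):
        bisect.insort(self._starts, base)
        self._owners[base] = owner

    def owner_of(self, address):
        """Return the owner of the arena containing address, or None."""
        # Find the last arena whose base is <= address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        base = self._starts[i]
        if address < base + ARENA_SIZE:
            return self._owners[base]
        # The address falls in a gap between arenas (for example memory
        # that did not come from this allocator at all).
        return None
```

A debug allocator could call something like `owner_of()` when freeing a block and compare the result with the current interpreter; memory outside any registered arena simply yields `None` and cannot be checked this way.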
msg318037	Author: Eric Snow (eric.snow) *	Date: 2018-05-29 14:22
Note that I wouldn't call this issue absolutely specific to subinterpreters. The "ownership" part is, but tracking the allocator has practical application under a single interpreter. I suppose I could split this issue apart. I lumped the two together because I expected the solution would be the same for both. However, that's not necessarily the case. Would it help to open a separate issue for tracking the allocator?
msg318091	Author: Antoine Pitrou (pitrou) *	Date: 2018-05-29 20:55
I agree with Victor, we shouldn't add PyObject fields that only have use in certain (minority) situations. The idea of tracking per arena will be non-trivial to implement, as only small objects (smaller than 512 bytes) use our own allocator; larger objects go to the system allocator. Can I ask why you're considering this? I thought you didn't want to transfer ownership between interpreters.
msg368913	Author: STINNER Victor (vstinner) *	Date: 2020-05-15 01:51
I see two options:

* Add a field to PyObject, but only in a special debug mode. Maybe not even in Py_DEBUG (since I managed to make Py_DEBUG ABI-compatible with the release mode!)
* Add a hash table mapping an object to its interpreter. The hash table would only be used in debug mode. It may even be turned on at runtime depending on a command line option or something else.

See also bpo-40514: [subinterpreters] Add --experimental-isolated-subinterpreters build option.

Antoine: "Can I ask why you're considering this? I thought you didn't want to transfer ownership between interpreters."

I guess that the purpose is to detect when an object is shared between two interpreters. Currently, tons of objects are still shared between interpreters. Starting with static types: see bpo-40601 "[C API] Hide static types from the limited C API".
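
The second option, a debug-only hash table that can be switched on at runtime, might look roughly like the following. This is a sketch under stated assumptions: `SharingDetector` and its methods are hypothetical, interpreters are identified by strings, and no such switch or API exists in CPython as far as this thread says.

```python
class SharingDetector:
    """Debug-only map from object identity to the first interpreter seen."""

    def __init__(self, enabled=False):
        # Stands in for a runtime switch such as a command line option.
        self.enabled = enabled
        self._owner = {}

    def record_use(self, obj, interpreter):
        """Return True if the use is consistent with a single owner.

        When disabled, nothing is recorded and every use is accepted,
        so the default mode pays no memory cost.
        """
        if not self.enabled:
            return True
        # The first interpreter to touch the object becomes its owner.
        # Note: keys are id() values, so entries would need to be dropped
        # at deallocation to avoid stale matches on reused addresses.
        owner = self._owner.setdefault(id(obj), interpreter)
        return owner == interpreter

    def tracked(self):
        return len(self._owner)
```

With the detector enabled, a second interpreter touching the same object makes `record_use()` return `False`, which is the "object is shared" condition described above.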

History
Date	User	Action	Args
2022-04-11 14:59:00	admin	set	github: 77788
2020-05-15 01:51:49	vstinner	set	messages: + msg368913
2020-05-15 00:42:26	vstinner	set	components: + Subinterpreters
2018-06-22 22:48:02	eric.snow	set	nosy: + emilyemorehouse
2018-05-29 20:55:34	pitrou	set	nosy: + pitrou messages: + msg318091
2018-05-29 14:22:13	eric.snow	set	messages: + msg318037
2018-05-23 13:13:52	ncoghlan	set	messages: + msg317403
2018-05-22 20:57:47	vstinner	set	title: Explicitly track object ownership (and allocator). -> [subinterpreters] Explicitly track object ownership (and allocator).
2018-05-22 20:18:53	vstinner	set	messages: + msg317339
2018-05-22 19:19:38	eric.snow	create