Message 317330 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	eric.snow
Recipients	eric.snow, ncoghlan, vstinner
Date	2018-05-22.19:19:38
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1527016778.74.0.682650639539.issue33607@psf.upfronthosting.co.za>
In-reply-to

Content

When an object is created it happens relative to the current
thread (ergo interpreter) and the current allocator (part of
global state).  We do not track either of these details for
the object.  It may make sense to start doing so (reasons next).

Regarding tracking the interpreter, the originating interpreter
can be thought of as the owner.  Any lifecycle operations should
happen relative to that interpreter.  Furthermore, the object
should be used in C-API calls only in that interpreter (i.e.
when the current thread's PyThreadState belongs to that
interpreter).  This hasn't been an issue so far, because all
interpreters in the process currently share the GIL and because
subinterpreters haven't historically been heavily used.
However, the possibility of no longer sharing the GIL suggests
that tracking the owning interpreter (and perhaps even other
"sharing" interpreters) would be important.  Also, in the last
few years subinterpreters have seen increasing usage
(see OpenStack Ceph), and knowing the originating interpreter
for an object can be useful there.  Regardless, even in the
single-interpreter case, knowing the owning interpreter is
important during runtime finalization (which is currently
slightly broken), and that in turn impacts CPython embedders.
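To make the ownership idea concrete, here is a minimal toy sketch, assuming an object simply records the id of the interpreter that was current when it was created. Everything here (`toy_object`, `interp_id_t`, `current_interp`, `toy_init`, `toy_usable_here`) is hypothetical and is not a CPython type or API; `current_interp` merely stands in for "the interpreter the current thread state belongs to".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: these are NOT CPython types or APIs. */
typedef int64_t interp_id_t;

typedef struct {
    interp_id_t owner;   /* interpreter that created the object */
    long refcnt;
} toy_object;

/* Simulates "the interpreter the current thread state belongs to". */
static interp_id_t current_interp = 0;

/* Initialize an object and record its originating (owning) interpreter. */
static void toy_init(toy_object *op) {
    op->owner = current_interp;
    op->refcnt = 1;
}

/* True if op may be used in C-API-style calls from the current interpreter. */
static bool toy_usable_here(const toy_object *op) {
    return op->owner == current_interp;
}
```

With a shared GIL a check like `toy_usable_here()` is mostly a debugging aid; without a shared GIL, something along these lines would be needed to catch cross-interpreter use.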

Regarding the allocator, there used to be just a single global
one that the runtime used from start to finish.  Now the C-API
offers a way to switch the allocator, so there's no guarantee
that the right allocator is used in PyMem_Free().  This has
already had a negative impact on efforts to clean up CPython's
runtime initialization.  It also results in problems during
finalization.  Additionally, we are looking into moving the
allocator from the global runtime state to the per-interpreter
(or even per-thread or per-context) state value.  In that world
it would be essential to know which allocator was used when
creating the object.  There are other possible applications
based on knowing an object's allocator, but I'll stop there.
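The allocator problem can be illustrated with a toy sketch, assuming each object stores a pointer to the allocator that created it (roughly the "extra field on the object" option). The names `toy_allocator`, `current_alloc`, `toy_new` and `toy_free` are hypothetical and are not CPython's `PyMemAllocatorEx` or `PyMem_*` functions. The point is that freeing goes through the recorded allocator rather than whichever one happens to be current at free time.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical allocator table; not CPython's PyMemAllocatorEx. */
typedef struct {
    void *(*malloc_fn)(size_t);
    void (*free_fn)(void *);
    int frees;   /* counts frees routed through this allocator */
} toy_allocator;

/* The allocator that is "current" right now; it can be switched at runtime. */
static toy_allocator *current_alloc;

typedef struct {
    toy_allocator *alloc;  /* allocator that created this object */
    int value;
} toy_object;

static toy_object *toy_new(int value) {
    toy_object *op = current_alloc->malloc_fn(sizeof *op);
    if (op == NULL)
        return NULL;
    op->alloc = current_alloc;   /* remember the originating allocator */
    op->value = value;
    return op;
}

static void toy_free(toy_object *op) {
    toy_allocator *a = op->alloc;   /* use the recorded allocator, not current_alloc */
    a->frees++;
    a->free_fn(op);
}
```

If the allocator is switched between `toy_new()` and `toy_free()`, the memory still goes back to the allocator that produced it, which is exactly the guarantee that is missing for `PyMem_Free()` today.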

To sort all this out we would need to track per-object:

* originating allocator (pointer or id)
* owning interpreter (pointer or id)
* (possibly) "sharing" interpreters (linked list?)

Either we'd add 2 pointer-size fields to PyObject or we would
keep a separate hash table (or two) mapping each object
to its info (similar to what we've considered doing for
refcounts).  To minimize the impact on the common case (not
embedded, single interpreter, same allocator), we could default
to not tracking interpreter/allocator and treat a lookup failure
as meaning "main interpreter, default allocator".
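The side-table option with "lookup failure means the default" could look roughly like the following toy sketch: a fixed-size, open-addressing hash table keyed by object address, where only objects with non-default info get an entry. All names (`obj_info`, `info_set`, `info_get`, `MAIN_INTERP`, `DEFAULT_ALLOC`) are hypothetical, and the sketch deliberately omits resizing, deletion and thread safety.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-object info. */
typedef struct {
    int interp_id;
    int alloc_id;
} obj_info;

#define MAIN_INTERP   0
#define DEFAULT_ALLOC 0
#define TABLE_SIZE    64   /* power of two; fixed size, no resizing in this sketch */

typedef struct {
    const void *key;   /* object address; NULL marks an empty slot */
    obj_info info;
} slot;

static slot table[TABLE_SIZE];

static size_t hash_ptr(const void *p) {
    uintptr_t x = (uintptr_t)p;
    x ^= x >> 4;   /* low bits of aligned pointers are often zero */
    return (size_t)(x * 2654435761u) & (TABLE_SIZE - 1);
}

/* Record non-default info for an object. Returns 0 on success, -1 if full. */
static int info_set(const void *obj, obj_info info) {
    size_t i = hash_ptr(obj);
    for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (table[i].key == NULL || table[i].key == obj) {
            table[i].key = obj;
            table[i].info = info;
            return 0;
        }
    }
    return -1;
}

/* Look up an object's info; a miss means "main interpreter, default allocator". */
static obj_info info_get(const void *obj) {
    size_t i = hash_ptr(obj);
    for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1)) {
        if (table[i].key == obj)
            return table[i].info;
        if (table[i].key == NULL)
            break;   /* no deletions, so an empty slot ends the probe sequence */
    }
    obj_info dflt = {MAIN_INTERP, DEFAULT_ALLOC};
    return dflt;
}
```

In the common case nothing is ever inserted, so the cost is one failed probe per lookup and no per-object memory, in exchange for the extra indirection whenever the info is actually needed.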

History
Date	User	Action	Args
2018-05-22 19:19:38	eric.snow	set	recipients: + eric.snow, ncoghlan, vstinner
2018-05-22 19:19:38	eric.snow	set	messageid: <1527016778.74.0.682650639539.issue33607@psf.upfronthosting.co.za>
2018-05-22 19:19:38	eric.snow	link	issue33607 messages
2018-05-22 19:19:38	eric.snow	create