msg284229 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2016-12-29 06:53 |
Currently there isn't any way to uniquely identify an interpreter. This patch adds a new "id" field to the PyInterpreterState struct. The ID for every new interpreter is set to the value of an increasing global counter. That means that the ID is unique within the process.
IIRC, the availability of a unique ID would help tools that make use of subinterpreters, like mod_wsgi. It is also necessary for any effort to expose interpreters in Python-level code (which is the subject of other ongoing work).
The patch also adds:
unsigned long PyInterpreterState_GetID(PyInterpreterState *interp)
Note that, without a Python-level interpreters module, testing this change is limited to extending the existing test code in test_capi.
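For readers following along, the counter scheme described above can be sketched roughly as follows. This is not the patch itself: `demo_interp`, `demo_interp_new` and `demo_interp_get_id` are hypothetical stand-ins for PyInterpreterState and its functions, the counter is assumed to start at 1, and no locking is shown.

```c
#include <stdlib.h>

/* Hypothetical stand-in for PyInterpreterState; only the new field is modelled. */
typedef struct demo_interp {
    unsigned long id;
} demo_interp;

/* Process-wide counter. It only ever increases, so a value is never reused. */
static unsigned long demo_next_id = 1;

/* Allocate a new state and stamp it with the next counter value.
   Returns NULL if allocation fails. */
demo_interp *
demo_interp_new(void)
{
    demo_interp *interp = malloc(sizeof(*interp));
    if (interp == NULL) {
        return NULL;
    }
    interp->id = demo_next_id++;
    return interp;
}

/* Same shape as the proposed PyInterpreterState_GetID(). */
unsigned long
demo_interp_get_id(const demo_interp *interp)
{
    return interp->id;
}
```

Freeing a state and creating another one yields a fresh ID, even if the allocator hands back the same address.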
|
msg284230 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-12-29 07:01 |
Why not use just the pointer to PyInterpreterState itself?
|
msg284232 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2016-12-29 07:13 |
Pointers can get re-used, so they aren't temporally unique.
|
msg284233 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-12-29 07:25 |
What is the use case of keeping the uniqueness after deleting an interpreter?
|
msg284273 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2016-12-29 15:37 |
Tracking purposes mainly, so someone outside the interpreter state can tell when it's no longer there. Making interpreter states weak-referencable would have a similar effect, and could very well use this id if we didn't need the callback.
|
msg284276 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-12-29 16:19 |
If we add an API for getting a unique ID of the interpreter state, do we also need an API for getting the interpreter state by ID?
|
msg284278 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2016-12-29 16:51 |
Three reasons come to mind:
1. threads are identified by small integers
2. long, random-looking IDs are not human-friendly, and subinterpreter IDs will be used like thread IDs are
3. related to what Steve said, temporally unique IDs allow us to be confident about whether or not an interpreter has been destroyed (and how many interpreters there have been)
Since PyInterpreterState is not a PyObject, using weakrefs to address the third point won't work, right?
|
msg284280 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-12-29 16:57 |
There is an issue with integer identifiers of threads. See issue25658 and https://mail.python.org/pipermail/python-ideas/2016-December/043983.html.
|
msg284283 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2016-12-29 17:47 |
That's an issue with TLS initialisation, not thread IDs. It's easily solved by defining an "uninitialized" value (e.g. 0) and an "invalid" value (e.g. -1).
Interpreter states are in a linked list, so you can traverse the list to find one by ID.
WRT weakrefs, we can't use them directly, but I suspect the higher-level API will need it. Possibly adding a callback on finalisation would fill both needs, but I like having a reliable ID - otherwise we'll probably end up with multiple different IDs managed indirectly via callbacks. (Perhaps a single callback for when any interpreter is finalized that passes the ID through? That should work well, since the ID is designed to outlive the interpreter itself, so it can be an asynchronous notification.)
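The list traversal mentioned above could look something like the sketch below. The node type and `list_find_by_id` are hypothetical; a real version would walk the actual interpreter list rather than this stand-in struct.

```c
#include <stddef.h>

/* Hypothetical stand-in for an interpreter state kept in a singly linked list. */
typedef struct list_interp {
    struct list_interp *next;
    unsigned long id;
} list_interp;

/* Walk the list from head and return the state with the given ID,
   or NULL if no live state carries it (for example, it was destroyed). */
list_interp *
list_find_by_id(list_interp *head, unsigned long id)
{
    for (list_interp *it = head; it != NULL; it = it->next) {
        if (it->id == id) {
            return it;
        }
    }
    return NULL;
}
```

Because IDs are never reused, a NULL result for a previously valid ID means that interpreter is gone, which is the tracking property discussed earlier in the thread.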
|
msg284290 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2016-12-29 19:01 |
> Interpreter states are in a linked list, so you
> can traverse the list to find one by ID.
Exactly. At first I had added a PyInterpreterState_FindByID() or something
like that. However, as you noted, I realized it wasn't necessary. :)
> WRT weakrefs, we can't use them directly, but I suspect the higher-level
> API will need it...
Everything you said about weakrefs sounds good. We can discuss more when
we get to that high-level API.
|
msg284328 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2016-12-30 13:33 |
+1 from me for the general idea.
One subtlety with the draft implementation is that an Initialize/Finalize cycle doesn't reset the counter, which:
1. Increases the chance of counter overflow (while admittedly still leaving it incredibly low)
2. Means you still can't readily check whether the current interpreter is the main interpreter (i.e. the one created automatically in Py_Initialize)
What do you think about resetting the counter back to 1 in Py_Initialize?
|
msg284329 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-12-30 13:42 |
> What do you think about resetting the counter back to 1 in Py_Initialize?
Wouldn't this break the main property of IDs, the uniqueness?
|
msg284332 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2016-12-30 14:20 |
It depends on the scope of uniqueness we're after. `threading._counter()` (which is the small-integer-ID debugging counter for threading.Thread names) is a module global in the threading module, so an Initialize/Finalize cycle will reset it.
If we wanted to track "Which Initialize/Finalize cycle is this?" *as well*, it would make more sense to me to have that as a separate "runtime" counter, such that the full coordinates of the current point of execution were:
- runtime counter (How many times has Py_Initialize been called?)
- interpreter counter (Which interpreter is currently active?)
- thread name (Which thread is currently active?)
I'll also note that in the threading module, the main thread is implicitly thread 0 (but named as MainThread) - Thread-1 is the first thread created via threading.Thread. So it may make sense to use a signed numeric ID, with 0 being the main interpreter, 1 being the first subinterpreter, and negative IDs being errors.
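A rough sketch of the two-counter idea, under the assumption that the interpreter counter restarts at 0 on every initialization while a separate runtime counter keeps increasing. All names here (`cycle_initialize`, `cycle_new_interp`, `cycle_current`) are hypothetical and nothing like them is in the patch.

```c
/* How many times the runtime has been initialized in this process. */
static long cycle_count = 0;

/* Next interpreter ID within the current Initialize/Finalize cycle. */
static long cycle_next_interp = 0;

/* Stand-in for Py_Initialize: start a new cycle and reset the
   interpreter counter so the main interpreter is always 0. */
void
cycle_initialize(void)
{
    cycle_count += 1;
    cycle_next_interp = 0;
}

/* Hand out the next interpreter ID for the current cycle. */
long
cycle_new_interp(void)
{
    return cycle_next_interp++;
}

/* Which Initialize/Finalize cycle is this? */
long
cycle_current(void)
{
    return cycle_count;
}
```

With this split, the pair (runtime counter, interpreter ID) stays unique for the life of the process even though the interpreter ID alone repeats across cycles.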
|
msg284335 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2016-12-30 16:30 |
> Wouldn't this break the main property of IDs, the uniqueness?
If we bump it up to a 64-bit ID then it'll be no worse than how we track all dict mutations.
|
msg284350 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2016-12-30 22:43 |
> What do you think about resetting the counter back to 1 in Py_Initialize?
Sounds good to me. When I was working on the patch I had the idea in the back of my mind that not resetting the counter would better support interpreter separation efforts in the future. However, after giving it some thought I don't think that's the case. So resetting it in Py_Initialize() is fine with me.
> I'll also note that in the threading module, the main thread is
> implicitly thread 0 (but named as MainThread) - Thread-1 is the first
> thread created via threading.Thread. So it may make sense to use a
> signed numeric ID, with 0 being the main interpreter, 1 being the first
> subinterpreter, and negative IDs being errors.
I had considered that and went with an unsigned long. 0 is used for errors, and starting at 1, which effectively means the main interpreter is always 1. If we later run into overflow issues then we can sort that out at that point (e.g. by moving to a 64-bit int or even a Python int).
I'll add comments to the patch regarding these points.
|
msg284353 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2016-12-30 23:25 |
Here's the updated patch.
|
msg284359 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2016-12-31 03:58 |
The concern I have with using an unsigned value as the interpreter ID is that it's applying the "NULL means an error" idiom or the "false means an error" idiom to a non-pointer and non-boolean return type, whereas the common conventions for integer return values are:
* 0 = success in CLI return codes
* non-negative = success in int-based C APIs
If we were to use int_fast32_t for IDs instead, then any negative value can indicate an error, and the main interpreter could be given ID 0 to better align with the threading.Thread naming scheme.
Whether we hit a runtime error at 2 billion subinterpreters or 4 billion subinterpreters in a single process isn't likely to make much difference to anyone, but choosing an idiosyncratic error indicator will impact everyone that attempts to interact with the API.
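To make the convention concrete, here is a minimal sketch of a signed ID allocator where any negative return value signals an error. `signed_alloc_id` is a hypothetical helper, and passing the counter by pointer is only done to keep the example easy to exercise.

```c
#include <stdint.h>

/* Return the next ID from *next, or -1 once the ID space is exhausted.
   Valid IDs are 0 (the main interpreter) and up; negatives mean error. */
int_fast32_t
signed_alloc_id(int_fast32_t *next)
{
    if (*next < 0) {
        return -1;  /* counter already exhausted */
    }
    int_fast32_t id = *next;
    /* Avoid signed overflow: mark the counter exhausted instead of wrapping. */
    *next = (id == INT_FAST32_MAX) ? -1 : id + 1;
    return id;
}
```

A caller then checks `id < 0` in the same way as for most int-returning C APIs, rather than special-casing 0.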
|
msg284366 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2016-12-31 05:27 |
I fully expect subinterpreters to have a serious role in long running applications like web servers or other agents (e.g. cluster nodes), so I'd rather just bite the bullet and take 64-bits now so that we can completely neglect reuse issues. Otherwise we'll find ourselves adding infrastructure to hide the fact that you may see the same id twice.
Another four bytes is a cheap way to avoid an entire abstraction layer.
|
msg284374 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2016-12-31 07:31 |
Yeah, I'm also fine with using int_fast64_t for the subinterpreter count.
The only thing I'm really advocating for strongly on that front is that I think it makes sense to sacrifice the sign bit in the ID field as an error indicator that provides a more idiomatic C API.
|
msg284378 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2016-12-31 08:11 |
int_fast64_t it is then. :) I vacillated between the options and went with the bigger space. However, you're right that following convention is worth it.
|
msg284380 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2016-12-31 09:45 |
I've updated the patch to address Nick's review. Thanks!
|
msg284395 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-12-31 15:09 |
I would prefer to not use "fast" C types because they are not well supported. For example, ctypes has ctypes.c_int64 but no ctypes.c_int_fast64.
Previous work adding a unique identifier: PEP 509
https://www.python.org/dev/peps/pep-0509/#integer-overflow
|
msg284406 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2016-12-31 18:34 |
Thanks for pointing that out, Victor. Given the precedent I switched to using int64_t. The patch actually uses PY_INT64_T, but I didn't see a reason to use int64_t directly. FWIW, there *are* a few places that use int_fast64_t, but they are rather specialized and I didn't want this patch to be a place where I had to deal with setting a more general precedent. :)
|
msg293899 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-05-17 22:37 |
What is the status of this issue, Eric? Do you still need the interpreter ID?
|
msg294116 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2017-05-21 23:39 |
Yes, I still need it. :)
|
msg294212 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2017-05-23 02:46 |
New changeset e377416c10eb0bf055b0728cdcdc4488fdfd3b5f by Eric Snow in branch 'master':
bpo-29102: Add a unique ID to PyInterpreterState. (#1639)
https://github.com/python/cpython/commit/e377416c10eb0bf055b0728cdcdc4488fdfd3b5f
|
msg294232 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2017-05-23 06:08 |
This change added a compiler warning.
./Programs/_testembed.c: In function ‘print_subinterp’:
./Programs/_testembed.c:31:22: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘int64_t {aka long long int}’ [-Wformat=]
printf("interp %lu <0x%" PRIXPTR ">, thread state <0x%" PRIXPTR ">: ",
^
|
msg294312 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2017-05-24 04:17 |
Thanks for pointing this out, Serhiy. I'll take a look in the morning.
|
msg294315 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-05-24 04:21 |
Does someone know the PRxxx constant for int64_t?
|
msg294370 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2017-05-24 18:14 |
Apparently it is PRId64.
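For reference, a small sketch of printing an int64_t portably with the PRId64 macro from <inttypes.h>. The helper name `fmt_interp_id` is made up for the example and is not what _testembed.c uses.

```c
#include <inttypes.h>
#include <stdio.h>

/* Format "interp <id>" into buf. PRId64 expands to the correct printf
   conversion for int64_t on the current platform, which avoids the
   -Wformat warning that "%lu" triggers.
   Returns what snprintf returns (the length that was needed). */
int
fmt_interp_id(char *buf, size_t size, int64_t id)
{
    return snprintf(buf, size, "interp %" PRId64, id);
}
```

The macro is a string literal, so it is concatenated with the neighbouring pieces of the format string at compile time.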
|
msg294371 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2017-05-24 18:16 |
(see issue30447)
|
msg294417 - (view) |
Author: Eric Snow (eric.snow) *  |
Date: 2017-05-25 00:22 |
I've fixed the compiler warning via d1c3c13fedaf62b71445ccd048e395aa4a7d510f.
|
|
Date | User | Action | Args |
2022-04-11 14:58:41 | admin | set | github: 73288 |
2017-05-25 00:22:01 | eric.snow | set | status: open -> closed; messages: + msg294417 |
2017-05-24 18:16:58 | eric.snow | set | messages: + msg294371 |
2017-05-24 18:14:23 | eric.snow | set | messages: + msg294370 |
2017-05-24 04:21:17 | vstinner | set | messages: + msg294315 |
2017-05-24 04:17:05 | eric.snow | set | messages: + msg294312 |
2017-05-23 06:08:58 | serhiy.storchaka | set | status: closed -> open; messages: + msg294232 |
2017-05-23 04:08:24 | eric.snow | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
2017-05-23 02:46:43 | eric.snow | set | messages: + msg294212 |
2017-05-21 23:39:19 | eric.snow | set | messages: + msg294116 |
2017-05-17 22:37:58 | vstinner | set | messages: + msg293899 |
2017-05-17 21:23:14 | eric.snow | set | pull_requests: + pull_request1734 |
2016-12-31 18:34:56 | eric.snow | set | files: + interpreter-id-4.diff; messages: + msg284406 |
2016-12-31 15:09:58 | vstinner | set | nosy: + vstinner; messages: + msg284395 |
2016-12-31 09:45:23 | eric.snow | set | files: + interpreter-id-3.diff; messages: + msg284380 |
2016-12-31 08:11:55 | eric.snow | set | messages: + msg284378 |
2016-12-31 07:31:29 | ncoghlan | set | messages: + msg284374 |
2016-12-31 05:27:25 | steve.dower | set | messages: + msg284366 |
2016-12-31 03:58:51 | ncoghlan | set | messages: + msg284359 |
2016-12-30 23:25:23 | eric.snow | set | files: + interpreter-id-2.diff; messages: + msg284353 |
2016-12-30 22:43:16 | eric.snow | set | messages: + msg284350 |
2016-12-30 16:30:15 | steve.dower | set | messages: + msg284335 |
2016-12-30 14:20:52 | ncoghlan | set | messages: + msg284332 |
2016-12-30 13:42:38 | serhiy.storchaka | set | messages: + msg284329 |
2016-12-30 13:33:46 | ncoghlan | set | messages: + msg284328 |
2016-12-29 19:01:37 | eric.snow | set | messages: + msg284290 |
2016-12-29 17:47:54 | steve.dower | set | messages: + msg284283 |
2016-12-29 16:57:50 | serhiy.storchaka | set | messages: + msg284280 |
2016-12-29 16:51:52 | eric.snow | set | messages: + msg284278 |
2016-12-29 16:19:47 | serhiy.storchaka | set | messages: + msg284276 |
2016-12-29 15:37:18 | steve.dower | set | messages: + msg284273 |
2016-12-29 07:25:43 | serhiy.storchaka | set | messages: + msg284233 |
2016-12-29 07:13:43 | eric.snow | set | messages: + msg284232 |
2016-12-29 07:01:49 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg284230 |
2016-12-29 06:53:42 | eric.snow | create | |