This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Add a new private API clear private variables, which are initialized once, at Python shutdown
Type: resource usage Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: eric.snow, louielu, ncoghlan, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2017-03-22 16:43 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 770 closed vstinner, 2017-03-22 16:46
PR 780 closed vstinner, 2017-03-23 10:52
Messages (30)
msg289999 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-22 16:43
When I read Serhiy Storshaka's idea in issue #29878, it recalled me an old idea of writing a generalization of the _Py_IDENTIFIER() API.

_Py_IDENTIFIER is an API to initialize a static string variable once. The _Py_Identifier structure has a "next" field to create a single-linked chained list. It allows to clear all variables at exit in _PyUnicode_ClearStaticStrings().

I propose a similar API but for any PyObject* object, to be able to clear all static variables at exit. It should help to release all memory in Py_Finalize() and have a safer Python finalization.

See attached pull request for the API itself.

"Static variables" in C are variables with a limited scope: a single C file or a single function.

It seems like the API can remove some lines of code. Example of patch:

@@ -1452,14 +1450,14 @@ compiler_mod(struct compiler *c, mod_ty mod)
 {
     PyCodeObject *co;
     int addNone = 1;
-    static PyObject *module;
-    if (!module) {
-        module = PyUnicode_InternFromString("<module>");
-        if (!module)
-            return NULL;
+    _Py_STATICVAR(module);
+
+    if (_PY_STATICVAR_INIT(&module, PyUnicode_InternFromString("<module>"))) {
+        return 0;
     }
+
     /* Use 0 for firstlineno initially, will fixup in assemble(). */
-    if (!compiler_enter_scope(c, module, COMPILER_SCOPE_MODULE, mod, 0))
+    if (!compiler_enter_scope(c, module.obj, COMPILER_SCOPE_MODULE, mod, 0))
         return NULL;
     switch (mod->kind) {
     case Module_kind:

--

Drawbacks of the API:

* It adds one pointer per static variables, so increase the memory footprint of 8 bytes per variable
* It requires to write "var.obj" instead of just "var" to access the Python object


The API doesn't try to remove duplicated objects. I consider that it's not an issue since functions like PyLong_FromLong(<small int>) and PyUnicode_InternFromString("c string") already do it for us. Some functions create mutable variables like PyImport_Import() which creates an empty list.

--

Note: Eric Snow proposed a solution "solving multi-core Python":

* https://mail.python.org/pipermail/python-ideas/2015-June/034177.html
* http://ericsnowcurrently.blogspot.fr/2016/09/solving-mutli-core-python.html

I'm not sure if this API would help or not to implement such idea, but Eric's project is experimental and wasn't taken in account when designing the API.
msg290002 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-22 17:16
The problem with existing static variables are that they are not properly cleared. When the Python interpreter is finalized and reinitialized they can contain invalid references. This patch fixes this issue.

> * It requires to write "var.obj" instead of just "var" to access the Python object

You can use a dynamic array of PyObject** instead of a linked list for collecting references to "static variables" or use gc_next/gc_refs for managing a linked list.
msg290018 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-22 23:57
The purpose of the issue is to fix memory leaks like issue #21387 (this one is not the best example since it seems like the leak doesn't come from a static variable.)
msg290026 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-23 06:35
The patch contains an example of using _Py_STATICVAR(). But it doesn't use _PY_STATICVAR_INIT(). _PY_STATICVAR_INIT() can be used if extract the code of getting _array_reconstructor into separate function.

I like the idea in general, but I want to see more examples. Ideally -- the patch should use a new API for *all* static PyObject* variables. This will help to design good API.
msg290034 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-03-23 10:00
I like this idea in principle, and suspect it may be helpful in the implementation of the new thread-specific-storage API proposed in PEP 539 (also see http://bugs.python.org/issue25658 ).

How would you feel about calling it _Py_ONCE_VAR? The use case here is quite similar to the concepts behind the pthread_once API (just with the lifecycle tied to Py_Initialize/Py_Finalize rather than the C level process), and it's the "initialise-on-first-use" behaviour that's significant here, moreso than the fact that the typical storage target is a static variable.

As far as the linked list goes, the nice aspect of Victor's approach is that it doesn't need to do any additional runtime memory allocations - all the storage involved is still statically allocated in the modules that initialise the values, there are just some extra  pointer assignments to link everything together (with the GIL protecting against race conditions).

However, the user visible ".obj" indirection could still be avoided at the cost of an additional pointer per entry:

    typedef struct _Py_OnceVar {
        struct _Py_OnceVar *next;
        PyObject **obj_ref;
    } _Py_OnceVar;

    #define _Py_ONCE_VAR(var_decl, var) \
      var_decl var = NULL;
      static _Py_OnceVar var ## _once_meta = {.next = NULL, .obj_ref = (PyObject **) &var}

Intended declaration:

    _Py_ONCE_VAR(static PyObject *, array_reconstructor);

_Py_ONCE_VAR_INIT would similarly be adjusted to assign the supplied value to "var", while also setting "var_once_meta.next" to hook the value into the linked list.
msg290036 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-23 10:20
Serhiy: "The patch contains an example of using _Py_STATICVAR(). But it doesn't use _PY_STATICVAR_INIT()."

I'm not sure that I understand your comment. All modified code use _Py_STATICVAR() to declare the declare. All modified code use _PY_STATICVAR_INIT() to initialize the variable, except of array_array___reduce_ex__() which uses explicitly _PyStaticVar_Set() beause its initialization code is more complex.

_PyStaticVar_Set() is the low-level function, I would prefer to avoid it since it fails with an assertion error if it's called twice. _PY_STATICVAR_INIT() helper should be perfer since it makes the code shorter, but it's similar to simple expressions like PyUnicode_FromString("\n").

I tried to include examples of usage of both APIs to give you an idea of the API.

I chose to *not* patch the whole Python code base, because I would like to first get a review of a the API. It seems like alternatives have been proposed ;-)


> I like the idea in general, but I want to see more examples.

Which kind of other examples do you need?
msg290037 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-23 10:22
#define _Py_ONCE_VAR(var_decl, var) \
      var_decl var = NULL;
      static _Py_OnceVar var ## _once_meta = {.next = NULL, .obj_ref = (PyObject **) &var}

Yeah, I had a similar idea, but I fear that it will increase the usage of the C stack memory.

See the issue #28858: my old _PyObject_CallArg1() macro allocated an implicit array on the stack and increased the stack usage, whereas the purpose of the macro was to *reduce* the stack usage (just the opposite!). So I removed the macro.
msg290038 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-23 10:27
Serhiy: "use gc_next/gc_refs for managing a linked list"

I'm not sure that I understand your idea. These fields are already used internally by the garbage collector. If I modify one of these fields, it would corrupt the GC, no?

Serhiy: "You can use a dynamic array of PyObject** instead of a linked list for collecting references to "static variables""

Ah yes, I like the idea of using a single array to track all variables. An array reduces memory fragmentation and is simple to maintain, since the API only needs two operations: list.append(obj) and list.clear(). The API already requires to check for errors, so another memory allocation failure wouldn't be suprising.
msg290039 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-23 10:49
My apologies Victor. It seems that the browser on my netbook shown only the part of changes and I seen only the change in array_array___reduce_ex__().

I like Nick's idea for hiding the indirection, but passing var_decl doesn't look having much sense to me. It should always be "static PyObject *".

I don't have a preference for the name, but the purpose of a new API is not just to ensure that a piece of initialization code is executed at most once, but that the object will be destroyed and the reference will be cleared after finalizing the interpreter. Seems this is different from the purpose of pthread_once().
msg290040 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-23 10:54
I wrote a new different API: https://github.com/python/cpython/pull/780

New C API for variables only initialized once to be able to clear
them at exit:

    New macro _Py_ONCEVAR(var) to declare a variable
    New macro _PY_ONCEVAR_INIT(var, expr) to initialize a variable
    once
    New function _PyOnceVar_Set() to explicitly set a variable once
    to initialize it
    New _PyOnceVar_Fini() function clearing all variables (initialized
    once) at exit

Variables keep their PyObject* type, but the API hides an internal C array tracking all Python objects to Py_DECREF them at exit in _PyOnceVar_Fini().

I used Nick's naming scheme since it seems like pthread has an existing API which is similar, but different (my API doesn't require an initialization callback).

I really prefer the second API using PyObject*.
msg290041 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-23 11:00
> "use gc_next/gc_refs for managing a linked list"

Forgot about this idea. Not all objects have the GC header.

But if they have it they could be removed from the GC list and added in the separate list since in any case the garbage collector shouldn't destroy objects referenced by static variables.
msg290043 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-23 11:33
The new API rather combines pthread_once() and pthread_cleanup_push().
msg290045 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-03-23 12:48
Passing var_decl was based on not knowing the answer to two questions:

* do we ever declare non-statics this way? (OTOH, if we do, this could be an opportunity to hide them behind read-only accessor functions)
* do we ever declare more specific types than PyObject * this way? (OTOH, if that's the uncommon case, requiring a cast to the more specific type probably wouldn't hurt)

As far as stack usage goes, all _Py_OnceVar instances should be in the data segment when using the linked list approach, so they shouldn't impact stack usage, and doing so retains the benefit of avoiding dynamic memory allocation just to track the static variable declarations. Since the macro is dereferencing the address of a static variable in the initialiser of another static variable, that shouldn't require any runtime stack space either. OTOH, I haven't actually ever tried compiling a macro like that, so it's entirely possible compilers won't like it.
msg290046 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-23 13:07
> * do we ever declare non-statics this way? (OTOH, if we do, this could be an opportunity to hide them behind read-only accessor functions)
> * do we ever declare more specific types than PyObject * this way? (OTOH, if that's the uncommon case, requiring a cast to the more specific type probably wouldn't hurt)

Well, with my 2nd API, I'm not sure that the macro to declare a variable is very simple:

+/* Declare a static PyObject* variable which is only initialized once.
+   _PyOnceVar_Fini() will clear the variable at Python finalization. */
+#define _Py_ONCEVAR(var) \
+    static PyObject* var = NULL

Technically, the variable doesn't have to be static. But do we want to use this API for global variables initialized once? It would increase the memory usage, whereas currently we have specialized code like _PyUnicode_Fini() which clears its unicode_empty singleton. I don't want to touch this code. At least, not yet.

If you want to support other types than PyObject*, _PY_ONCEVAR_INIT() macro can cast the first argument to PyObject*.

I would prefer to start with something simpler, and discuss case by case for other variables.
msg290047 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-23 13:07
About the pull request: to be clear, I know that some modified local variables should use _Py_IDENTIFIER() rather than _Py_ONCEVAR(), but I chose to use _Py_ONCEVAR() just to give examples of the API.
msg290048 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-23 13:56
I think we can get rid of _Py_ONCEVAR(). Just keep current declarations. _PY_ONCEVAR_INIT() can work even with non-static global variables. It could work even with non-PyObject pointers.
msg290049 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-23 14:51
3rd round of the API:

    static PyObject *assertion_error = NULL;
    ...
    if (_PY_ONCEVAR_INIT(assertion_error,
                         PyUnicode_InternFromString("AssertionError"))) {
        return 0;
    }
    ...

(Not the best example, _Py_IDENTIFIER() would be more appropriate here ;-))

--

I just added a second commit to remove the next field from _Py_Identifier and reuse _PyOnceVar_Set() in _PyUnicode_FromId().
msg290052 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-23 16:30
About _PY_ONCEVAR_INIT() vs _Py_IDENTIFIER.

Using _Py_IDENTIFIER works well if you have an API accepting directly a _Py_IDENTIFIER*. If you call functions requesting a PyObject*, you need to call _PyUnicode_FromId() and test for failure. If you start by calling _PyUnicode_FromId() when the object is not initialized yet, above you have to use var.object, whereas previously Serhiy and Nick weren't confortable with this specific case.

I prefer to use _PY_ONCEVAR_INIT() to keep a regular PyObject* variable and makes the code simpler.
msg290056 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-03-23 17:49
I modified more modules to use the new API. I'm not sure that using the API in modules is safe.

It's safe to use the API in functions which begins with trying to initialize the variable. In such case, if the variable is cleared, calling the function later initialize again the variable and it's fine.

When for module variables only initialized when the module is initialized, there is a risk that a module function tries to access a variable which has been cleared previously during Python shutdown.

See for example changes in the _datetime module.
msg290058 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-23 18:36
Some code (_PyUnicode_FromId, bool_repr, create_filter) could be simpler if make _PY_ONCEVAR_INIT returning the value (NULL in case of error).

The signature of _PY_ONCEVAR_INIT() is the same as of _Py_SETREF(). If var == NULL both `_PY_ONCEVAR_INIT(var, expr)` and `_Py_SETREF(var, expr)` are equivalent to just `var = expr`. Maybe rename _PY_ONCEVAR_INIT to _Py_SET_ONCE?
msg290078 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-03-24 05:26
The main potential benefit I see to keeping the special variable declaration is that it may help avoid a certain category of error: calling _Py_ONCE_VAR_INIT on a stack local pointer reference (which would leave the global array with a reference to nonsense). While we don't care if the once_vars are static or not, we do care that they're not allocated on the call stack, as otherwise they won't be around for Py_Finalize() to clean up.

On the other hand, _Py_SET_ONCE is nice and easy to explain "it's similar to _Py_SETREF, but: 1) doesn't do anything if the reference is already set; and 2) registers the reference to be cleaned up in Py_Finalize"

Also interesting is the fact that you can still use _Py_SETREF to change a reference that was initialized with _Py_SET_ONCE without breaking anything. From that point of view, a better name might be _Py_SET_FINALIZED, emphasising the fact that it registers the pointer for finalization over the fact that it's a no-op when run on an already set pointer.
msg290079 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-03-24 05:33
For module level variables, I'd expect this API to need to be used in tandem with switching the module over to PEP 489 multi-phase initialization.

By calling the new API in their _Py_mod_exec slot implementation, extension modules should be able to handle multiple initialize/finalize cycles correctly: https://www.python.org/dev/peps/pep-0489/#subinterpreters-and-interpreter-reloading
msg290869 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-30 18:26
Ping. My suggestion is to rename _PY_ONCEVAR_INIT to _Py_SET_ONCE and make it returning the value. And apply this to more static and global variables.
msg290891 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-03-31 04:04
I'm +1 for _Py_SET_FINALIZED with the semantics Serhiy suggests, and +0 for _Py_SET_ONCE.

My rationale for that preference is that _Py_SET_ONCE is a misnomer in the presence of

* multiple Py_Initialize/Py_Finalize cycles; and/or
* _Py_SETREF calls that replace the value to be finalized

 while _Py_SET_FINALIZED remains accurate in both cases.
msg290893 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-31 05:31
I agree with your rationale Nick, but _Py_SET_FINALIZED is a little too long and highlight only the second function of the macro. I don't have good name.
msg290897 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-03-31 06:19
_PY_SET_FINALIZED is only one letter longer than _PY_ONCEVAR_INIT and the same length as the originally proposed _PY_STATICVAR_INIT.

Since it shouldn't be anywhere near as common as _Py_SETREF, I'm also okay in general with trading a bit of brevity for accuracy.

However, I don't want to trade away *too* much brevity, so I think we're going to have to choose between reminding readers of "this is finalized in Py_Finalize" or "this is a no-op if the target is already set", and leave the other aspect implicit.

In addition to the correctness argument above, my rationale for now favouring the former is that the logical leap in "this is finalized in Py_Finalize, so the assigned expression will only be executed once per Py_Initialize call" seems more reasonable to me than the one in "the assigned expression is only executed once per Py_Initialize call, so it will be finalized in Py_Finalize".
msg291430 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-10 17:07
Can you move forward this issue Victor? I'm very interesting in it. It is worth to advertise this API on Python-Dev.

_PyArg_Parser also can utilize this API.
msg291542 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-04-12 10:10
I dislike _Py_SET_FINALIZED and _Py_SET_ONCE name.

I prefer to mention an "initialization" because it's a common pattern in programming and it is known that it only requires to be done once.

_PY_ONCEVAR_INIT() macro is unusual: it only evaluate the second parameter parameter, a C expression, if the variable is not initialized yet.

Py_SETREF() is different because it always evaluates its second argument, and in most cases, the second argument is an existing variable and not an expression.
msg291545 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-04-12 10:42
What if use the term "singleton"? It includes "initialize once" and "finalize at shutdown" meanings. _Py_INIT_SINGLETON() or _Py_SET_SINGLETON()?
msg301377 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-09-05 21:17
My PR 780 doesn't support multiple Python interpreters in the same process. To support that, we need a more advanced solution. For example, add an hash table in the interpreter structure using the variable memory address as the key.

Maybe we can move step by step and start with the PR 780, and only later support multiple interpreters. PR 780 enhances the most common case, Python running a single interpreter.
History
Date User Action Args
2022-04-11 14:58:44adminsetgithub: 74067
2018-09-19 23:11:03vstinnersetstatus: open -> closed
resolution: out of date
stage: resolved
2017-09-05 21:17:14vstinnersetmessages: + msg301377
2017-04-12 10:42:12serhiy.storchakasetmessages: + msg291545
2017-04-12 10:10:17vstinnersetmessages: + msg291542
2017-04-11 08:38:56louielusetnosy: + louielu
2017-04-10 17:07:50serhiy.storchakasetmessages: + msg291430
2017-03-31 06:19:44ncoghlansetmessages: + msg290897
2017-03-31 05:31:20serhiy.storchakasetmessages: + msg290893
2017-03-31 04:04:04ncoghlansetmessages: + msg290891
2017-03-30 18:26:28serhiy.storchakasetmessages: + msg290869
2017-03-24 05:33:25ncoghlansetmessages: + msg290079
2017-03-24 05:26:32ncoghlansetmessages: + msg290078
2017-03-23 18:36:58serhiy.storchakasetmessages: + msg290058
2017-03-23 17:49:34vstinnersetmessages: + msg290056
2017-03-23 17:46:46vstinnersettitle: Add a new private API for "static C variables" (_PyStaticVar) to clear them at exit -> Add a new private API clear private variables, which are initialized once, at Python shutdown
2017-03-23 16:30:26vstinnersetmessages: + msg290052
2017-03-23 14:51:11vstinnersetmessages: + msg290049
2017-03-23 13:56:45serhiy.storchakasetmessages: + msg290048
2017-03-23 13:07:57vstinnersetmessages: + msg290047
2017-03-23 13:07:11vstinnersetmessages: + msg290046
2017-03-23 12:48:00ncoghlansetmessages: + msg290045
2017-03-23 11:33:57serhiy.storchakasetmessages: + msg290043
2017-03-23 11:00:40serhiy.storchakasetmessages: + msg290041
2017-03-23 10:54:42vstinnersetmessages: + msg290040
2017-03-23 10:52:32vstinnersetpull_requests: + pull_request684
2017-03-23 10:49:41serhiy.storchakasetmessages: + msg290039
2017-03-23 10:27:16vstinnersetmessages: + msg290038
2017-03-23 10:22:01vstinnersetmessages: + msg290037
2017-03-23 10:20:02vstinnersetmessages: + msg290036
2017-03-23 10:00:57ncoghlansetmessages: + msg290034
2017-03-23 06:35:04serhiy.storchakasetmessages: + msg290026
2017-03-22 23:58:01vstinnersetnosy: + ncoghlan
2017-03-22 23:57:34vstinnersetmessages: + msg290018
2017-03-22 17:16:49serhiy.storchakasetmessages: + msg290002
2017-03-22 16:47:00vstinnersetnosy: + eric.snow, serhiy.storchaka
2017-03-22 16:46:41vstinnersetpull_requests: + pull_request677
2017-03-22 16:43:29vstinnercreate