New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings #80323
Comments
I added a _PyCoreConfig structure to Python 3.7 which contains almost all parameters used to configure Python. Problems: _PyCoreConfig uses bytes and Unicode strings (char* and wchar_t*) whereas it is also used to setup the memory allocator and (filesystem, locale and stdio) encodings. I propose to add a new _PyPreConfig which is the "strict minimum" configuration to setup encodings and the memory allocator. In practice, it also contains parameters which directly or indirectly impacts the allocator and encodings. For example, isolated impacts use_environment which impacts the allocator (PYTHONMALLOC environment variable). Another example: dev_mode=1 sets the allocator to "debug". The command line arguments are now parsed twice. _PyPreConfig only parses a few parameters like -E, -I and -X. A temporary _PyPreCmdline is used to store command line arguments like -X options. I moved structures closer to where they are used. "Global" _PyMain structure has been removed. _PyCmdline now lives way shorter than previously and is moved from main.c to coreconfig.c. The idea is to better control when and how memory is allocated. In term of API, we get something like:
_PyInitError err = _Py_InitializeFromConfig(&config);
if (_Py_INIT_FAILED(err)) {
_Py_ExitInitError(err);
}
...
Py_Finalize();
return 0; "config.preconfig.stdio_errors" syntax isn't great, but it's simpler to implement than duplicating all _PyPreConfig fields into _PyCoreConfig. |
PR 12087 is a WIP change which implements everything as a single commit. I'm not 100% sure yet that it's best approach for Python initialization, but I'm sure that it solves real interdependencies issues between _PyCoreConfig parameters. IHMO have a "pre-initialization" step to setup the memory allocator and the encodings is a better design. |
TODO: check if _Py_ClearFileSystemEncoding() uses the right memory allocator. _Py_SetFileSystemEncoding() doesn't change temporarily the memory allocator. |
Agreed - I think the biggest thing we learned from the pre-implementation in Python 3.7 is that the "Let's move as much config as we can to Python C API data types" fell down in a couple of areas:
I'll update PEP-432 so it at least mentions some of the lessons learned, and points to the current internal configuration API definitions in the CPython source tree. |
Description of this long serie of changes. I modified Py_Main() and _Py_InitializeCore() to clearly separate "pre-configuration" from "configuration" steps. The pre-configuration now decodes temporarily the command line arguments and uses its own command line parser to get -E, -I and -X options (-X is needed for -X utf8). The pre-configuration is designed to be as small as possible, it configures:
The _PyPreConfig structure has 8 fields:
I had to include fields which have an impact on other fields. Examples:
_PyCoreConfig_Read() is now only called after the memory allocator and the locale (LC_CTYPE locale and UTF-8 mode) are properly configured. I removed the last side effects of _PyCoreConfig_Read(): it no longer modify the locale. Same for the new _PyPreConfig_Read(): zero size effect. The new _PyPreConfig_Write() and _PyCoreConfig_Write() are now responsible to write the new configurations. There are functions to read the configuration from command line arguments:
These functions expect a _PyArgv structure which accepts bytes (wchar*) or Unicode (wchar_t*). I moved coreconfig.h from Include/ to Include/cpython/ to be more explicit that it's excluded from the stable API and that it's CPython specific. I moved all config functions to a new Include/internal/pycore_coreconfig.h. Functions are internal to allow us to modify us anytime until a proper clean public API is designed on top of it. If _PyPreConfig.allocator is set, _PyPreConfig_Write() re-allocate the configuration with the new memory allocator. This tiny thing avoids the new to force a specific memory allocator in many functions. I was able to remove the following code: PyMemAllocatorEx old_alloc;
_PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
...
PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc); Calling Py_Main() after Py_Initialize() is still supported. In this case, it no longer checks the memory allocators name because _PyMem_GetAllocatorsName() returns "pymalloc_debug" (or "malloc_debug" if pymalloc is disabled) after _PyMem_SetupAllocators("debug") is called: names are diffrent. |
I created bpo-36202: "Calling Py_DecodeLocale() before _PyPreConfig_Write() can produce mojibake". |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: