Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings #80323

vstinner · 2019-02-28T01:42:19Z

BPO	36142
Nosy	@ncoghlan, @vstinner
PRs	[WIP] bpo-36142: Rework Python init to add _PyPreConfig #12087 bpo-36142: Exclude coreconfig.h from Py_LIMITED_API #12111 bpo-36142: Rework error reporting in pymain_main() #12113 bpo-36142: Move command line parsing to coreconfig.c #12123 bpo-36142: Remove _PyMain structure #12120 bpo-36142: Add preconfig.c #12128 bpo-36142: Add _PyPreConfig structure #12172 bpo-36142: Add _PyPreConfig_ReadFromArgv() #12173 bpo-36142: Add _PyPreConfig.utf8_mode #12174 bpo-36142: Add _PyPreConfig.allocator #12181 bpo-36142: Add _PyMem_GetDebugAllocatorsName() #12185 bpo-36142: _PyPreConfig_Write() sets the allocator #12186 bpo-36142: Add _PyPreConfig_SetAllocator() #12187 bpo-36142: _PyPreConfig_Read() sets LC_CTYPE #12188 bpo-36142: PYTHONMALLOC overrides PYTHONDEV #12191

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2019-03-06.00:55:55.754>
created_at = <Date 2019-02-28.01:42:19.448>
labels = ['interpreter-core', '3.8']
title = 'Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings'
updated_at = <Date 2019-03-06.11:51:55.470>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2019-03-06.11:51:55.470>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2019-03-06.00:55:55.754>
closer = 'vstinner'
components = ['Interpreter Core']
creation = <Date 2019-02-28.01:42:19.448>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 36142
keywords = ['patch']
message_count = 22.0
messages = ['336791', '336792', '336887', '336918', '336921', '336924', '336931', '336946', '336989', '337024', '337161', '337164', '337182', '337225', '337241', '337244', '337246', '337249', '337250', '337253', '337254', '337295']
nosy_count = 2.0
nosy_names = ['ncoghlan', 'vstinner']
pr_nums = ['12087', '12111', '12113', '12123', '12120', '12128', '12172', '12173', '12174', '12181', '12185', '12186', '12187', '12188', '12191']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue36142'
versions = ['Python 3.8']

vstinner · 2019-02-28T01:42:19Z

I added a _PyCoreConfig structure to Python 3.7 which contains almost all parameters used to configure Python. Problems: _PyCoreConfig uses bytes and Unicode strings (char* and wchar_t*) whereas it is also used to setup the memory allocator and (filesystem, locale and stdio) encodings.

I propose to add a new _PyPreConfig which is the "strict minimum" configuration to setup encodings and the memory allocator. In practice, it also contains parameters which directly or indirectly impacts the allocator and encodings. For example, isolated impacts use_environment which impacts the allocator (PYTHONMALLOC environment variable). Another example: dev_mode=1 sets the allocator to "debug".

The command line arguments are now parsed twice. _PyPreConfig only parses a few parameters like -E, -I and -X. A temporary _PyPreCmdline is used to store command line arguments like -X options.

I moved structures closer to where they are used. "Global" _PyMain structure has been removed. _PyCmdline now lives way shorter than previously and is moved from main.c to coreconfig.c. The idea is to better control when and how memory is allocated.

In term of API, we get something like:

_PyCoreConfig config = _PyCoreConfig_INIT;
config.preconfig.stdio_encoding = "iso8859-1";
config.preconfig.stdio_errors = "replace";
config.user_site_directory = 0;
...

    _PyInitError err = _Py_InitializeFromConfig(&config);
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }
    ...
    Py_Finalize();
    return 0;

"config.preconfig.stdio_errors" syntax isn't great, but it's simpler to implement than duplicating all _PyPreConfig fields into _PyCoreConfig.

vstinner · 2019-02-28T01:50:07Z

PR 12087 is a WIP change which implements everything as a single commit.

I'm not 100% sure yet that it's best approach for Python initialization, but I'm sure that it solves real interdependencies issues between _PyCoreConfig parameters. IHMO have a "pre-initialization" step to setup the memory allocator and the encodings is a better design.

vstinner · 2019-03-01T02:44:16Z

New changeset f684d83 by Victor Stinner in branch 'master':
bpo-36142: Exclude coreconfig.h from Py_LIMITED_API (GH-12111)
f684d83

vstinner · 2019-03-01T14:27:05Z

TODO: check if _Py_ClearFileSystemEncoding() uses the right memory allocator. _Py_SetFileSystemEncoding() doesn't change temporarily the memory allocator.

vstinner · 2019-03-01T14:53:11Z

New changeset dfe8847 by Victor Stinner in branch 'master':
bpo-36142: Rework error reporting in pymain_main() (GH-12113)
dfe8847

vstinner · 2019-03-01T15:25:22Z

New changeset 95e2cbf by Victor Stinner in branch 'master':
bpo-36142: Move command line parsing to coreconfig.c (GH-12123)
95e2cbf

vstinner · 2019-03-01T16:53:00Z

New changeset 91b9ecf by Victor Stinner in branch 'master':
bpo-36142: Add preconfig.c (GH-12128)
91b9ecf

vstinner · 2019-03-01T18:32:15Z

New changeset 62be763 by Victor Stinner in branch 'master':
bpo-36142: Remove _PyMain structure (GH-12120)
62be763

ncoghlan · 2019-03-02T04:41:43Z

Agreed - I think the biggest thing we learned from the pre-implementation in Python 3.7 is that the "Let's move as much config as we can to Python C API data types" fell down in a couple of areas:

The embedding application is likely to speak char* and/or wchar_* natively, not PyObject*, and this applies even for CPython's own current Py_Main implementation.
There's some core system libc interaction scaffolding that we need in place first, giving 3 phases, not two:

initialise anything needed to read configuration settings from the environment and command line (i.e. memory allocators, interface encodings)
initialise the things needed to execute builtin and frozen Python modules (core data types, random hash seed, etc)
initialise the things needed to execute external Python modules (sys.path, etc)

I'll update PEP-432 so it at least mentions some of the lessons learned, and points to the current internal configuration API definitions in the CPython source tree.

ncoghlan · 2019-03-03T02:17:46Z

PEP-432 tweaked: https://github.com/python/peps/pull/904/files

vstinner · 2019-03-05T01:01:30Z

New changeset cad1f74 by Victor Stinner in branch 'master':
bpo-36142: Add _PyPreConfig structure (GH-12172)
cad1f74

vstinner · 2019-03-05T01:44:14Z

New changeset 6dcb542 by Victor Stinner in branch 'master':
bpo-36142: Add _PyPreConfig_ReadFromArgv() (GH-12173)
6dcb542

vstinner · 2019-03-05T11:32:14Z

New changeset 5a02e0d by Victor Stinner in branch 'master':
bpo-36142: Add _PyPreConfig.utf8_mode (GH-12174)
5a02e0d

vstinner · 2019-03-05T16:37:55Z

New changeset b35be4b by Victor Stinner in branch 'master':
bpo-36142: Add _PyPreConfig.allocator (GH-12181)
b35be4b

vstinner · 2019-03-05T22:31:58Z

New changeset a9df651 by Victor Stinner in branch 'master':
bpo-36142: Add _PyMem_GetDebugAllocatorsName() (GH-12185)
a9df651

vstinner · 2019-03-05T23:37:00Z

New changeset 7d2ef3e by Victor Stinner in branch 'master':
bpo-36142: _PyPreConfig_Write() sets the allocator (GH-12186)
7d2ef3e

vstinner · 2019-03-06T00:13:49Z

New changeset c656e25 by Victor Stinner in branch 'master':
bpo-36142: Add _PyPreConfig_SetAllocator() (GH-12187)
c656e25

vstinner · 2019-03-06T00:44:36Z

New changeset 4fffd38 by Victor Stinner in branch 'master':
bpo-36142: _PyPreConfig_Read() sets LC_CTYPE (GH-12188)
4fffd38

vstinner · 2019-03-06T00:45:15Z

Description of this long serie of changes.

I modified Py_Main() and _Py_InitializeCore() to clearly separate "pre-configuration" from "configuration" steps. The pre-configuration now decodes temporarily the command line arguments and uses its own command line parser to get -E, -I and -X options (-X is needed for -X utf8). The pre-configuration is designed to be as small as possible, it configures:

memory allocators
LC_CTYPE locale and set the UTF-8 mode

The _PyPreConfig structure has 8 fields:

allocator
coerce_c_locale
coerce_c_locale_warn
dev_mode
isolated
(Windows only) legacy_windows_fs_encoding
use_environment
utf8_mode

I had to include fields which have an impact on other fields. Examples:

dev_mode=1 sets allocator to "default";
isolated=1 sets use_environment to 0;
legacy_windows_fs_encoding=& sets utf8_mode to 0.

_PyCoreConfig_Read() is now only called after the memory allocator and the locale (LC_CTYPE locale and UTF-8 mode) are properly configured.

I removed the last side effects of _PyCoreConfig_Read(): it no longer modify the locale. Same for the new _PyPreConfig_Read(): zero size effect.

The new _PyPreConfig_Write() and _PyCoreConfig_Write() are now responsible to write the new configurations.

There are functions to read the configuration from command line arguments:

_PyPreConfig_ReadFromArgv()
_PyCoreConfig_ReadFromArgv()

These functions expect a _PyArgv structure which accepts bytes (wchar*) or Unicode (wchar_t*).

I moved coreconfig.h from Include/ to Include/cpython/ to be more explicit that it's excluded from the stable API and that it's CPython specific.

I moved all config functions to a new Include/internal/pycore_coreconfig.h. Functions are internal to allow us to modify us anytime until a proper clean public API is designed on top of it.

If _PyPreConfig.allocator is set, _PyPreConfig_Write() re-allocate the configuration with the new memory allocator. This tiny thing avoids the new to force a specific memory allocator in many functions. I was able to remove the following code:

    PyMemAllocatorEx old_alloc;
    _PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
    ...
    PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc);

Calling Py_Main() after Py_Initialize() is still supported. In this case, it no longer checks the memory allocators name because _PyMem_GetAllocatorsName() returns "pymalloc_debug" (or "malloc_debug" if pymalloc is disabled) after _PyMem_SetupAllocators("debug") is called: names are diffrent.

vstinner · 2019-03-06T00:54:08Z

I created bpo-36202: "Calling Py_DecodeLocale() before _PyPreConfig_Write() can produce mojibake".

vstinner · 2019-03-06T00:55:56Z

Ok, the _PyPreConfig structure is now available and used internally. The API can likely be enhanced, but I prefer to open new follow-up issues like bpo-36202, since this issue has already a long list of changes :-) I close this issue.

Nick: thanks for updating the PEP-432 ;-

vstinner · 2019-03-06T11:51:55Z

New changeset 25d13f3 by Victor Stinner in branch 'master':
bpo-36142: PYTHONMALLOC overrides PYTHONDEV (GH-12191)
25d13f3

vstinner added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Feb 28, 2019

vstinner closed this as completed Mar 6, 2019

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings #80323

Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings #80323

vstinner commented Feb 28, 2019

vstinner commented Feb 28, 2019

vstinner commented Feb 28, 2019

vstinner commented Mar 1, 2019

vstinner commented Mar 1, 2019

vstinner commented Mar 1, 2019

vstinner commented Mar 1, 2019

vstinner commented Mar 1, 2019

vstinner commented Mar 1, 2019

ncoghlan commented Mar 2, 2019

ncoghlan commented Mar 3, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 6, 2019

vstinner commented Mar 6, 2019

vstinner commented Mar 6, 2019

vstinner commented Mar 6, 2019

vstinner commented Mar 6, 2019

vstinner commented Mar 6, 2019

Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings #80323

Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings #80323

Comments

vstinner commented Feb 28, 2019

vstinner commented Feb 28, 2019

vstinner commented Feb 28, 2019

vstinner commented Mar 1, 2019

vstinner commented Mar 1, 2019

vstinner commented Mar 1, 2019

vstinner commented Mar 1, 2019

vstinner commented Mar 1, 2019

vstinner commented Mar 1, 2019

ncoghlan commented Mar 2, 2019

ncoghlan commented Mar 3, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 5, 2019

vstinner commented Mar 6, 2019

vstinner commented Mar 6, 2019

vstinner commented Mar 6, 2019

vstinner commented Mar 6, 2019

vstinner commented Mar 6, 2019

vstinner commented Mar 6, 2019