classification
Title: Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings
Type: Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ncoghlan, vstinner
Priority: normal Keywords: patch

Created on 2019-02-28 01:42 by vstinner, last changed 2019-03-06 11:51 by vstinner. This issue is now closed.

Pull Requests
URL Status Linked
PR 12087 closed vstinner, 2019-02-28 01:47
PR 12111 merged vstinner, 2019-03-01 02:25
PR 12113 merged vstinner, 2019-03-01 03:20
PR 12123 merged vstinner, 2019-03-01 14:28
PR 12120 merged vstinner, 2019-03-01 14:53
PR 12128 merged vstinner, 2019-03-01 16:30
PR 12172 merged vstinner, 2019-03-04 23:51
PR 12173 merged vstinner, 2019-03-05 01:27
PR 12174 merged vstinner, 2019-03-05 02:06
PR 12181 merged vstinner, 2019-03-05 16:12
PR 12185 merged vstinner, 2019-03-05 22:16
PR 12186 merged vstinner, 2019-03-05 22:38
PR 12187 merged vstinner, 2019-03-05 23:55
PR 12188 merged vstinner, 2019-03-06 00:26
PR 12191 merged vstinner, 2019-03-06 11:30
Messages (22)
msg336791 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-02-28 01:42
I added a _PyCoreConfig structure to Python 3.7 which contains almost all parameters used to configure Python. Problem: _PyCoreConfig uses bytes and Unicode strings (char* and wchar_t*), whereas it is also used to set up the memory allocator and the (filesystem, locale and stdio) encodings.

I propose to add a new _PyPreConfig which is the "strict minimum" configuration to set up encodings and the memory allocator. In practice, it also contains parameters which directly or indirectly impact the allocator and encodings. For example, isolated impacts use_environment, which impacts the allocator (PYTHONMALLOC environment variable). Another example: dev_mode=1 sets the allocator to "debug".

The command line arguments are now parsed twice. _PyPreConfig only parses a few parameters like -E, -I and -X. A temporary _PyPreCmdline is used to store command line arguments like -X options.
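To illustrate the two-pass idea, here is a minimal C sketch of a pre-parser that only looks for -E, -I and -X and ignores every other option. The precmdline type and precmdline_parse() are hypothetical names, not the real _PyPreCmdline API, and the option grammar is simplified (assumption: -c and -m end option parsing, -W and -X take an attached or separate value, long options are skipped and assumed to take no value).

```c
#include <stddef.h>
#include <string.h>

#define MAX_XOPTIONS 16

/* Hypothetical container for the few options the pre-configuration needs. */
typedef struct {
    int use_environment;                 /* cleared by -E and -I */
    int isolated;                        /* set by -I */
    size_t nxoptions;
    const char *xoptions[MAX_XOPTIONS];  /* values of -X, e.g. "utf8" */
} precmdline;

/* First pass over argv: collect -E, -I and -X, skip everything else. */
void precmdline_parse(precmdline *cmd, int argc, char **argv)
{
    cmd->use_environment = 1;
    cmd->isolated = 0;
    cmd->nxoptions = 0;

    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];
        if (arg[0] != '-' || arg[1] == '\0' || strcmp(arg, "--") == 0) {
            break;      /* script name, "-" (stdin) or "--": options end */
        }
        if (arg[1] == '-') {
            continue;   /* long option: simplification, assumed valueless */
        }
        /* Short options may be combined, e.g. "-IE". */
        for (const char *p = arg + 1; *p != '\0'; p++) {
            char opt = *p;
            if (opt == 'E') {
                cmd->use_environment = 0;
            }
            else if (opt == 'I') {
                cmd->isolated = 1;
                cmd->use_environment = 0;   /* isolated implies -E */
            }
            else if (opt == 'c' || opt == 'm') {
                return;  /* the rest of argv belongs to the command/module */
            }
            else if (opt == 'X' || opt == 'W') {
                /* Value is attached ("-Xutf8") or the next argument. */
                const char *value = p[1] != '\0' ? p + 1
                                  : (i + 1 < argc ? argv[++i] : NULL);
                if (opt == 'X' && value != NULL && cmd->nxoptions < MAX_XOPTIONS) {
                    cmd->xoptions[cmd->nxoptions++] = value;
                }
                break;   /* the value consumed the rest of this argument */
            }
            /* any other short option is left for the second, full parse */
        }
    }
}
```

The second, complete parse still runs later; this pass only has to be good enough to know the allocator- and encoding-related options before anything else is decoded or allocated.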

I moved structures closer to where they are used. The "global" _PyMain structure has been removed. _PyCmdline now has a much shorter lifetime than before and has moved from main.c to coreconfig.c. The idea is to better control when and how memory is allocated.

In terms of API, we get something like:

    _PyCoreConfig config = _PyCoreConfig_INIT;
    config.preconfig.stdio_encoding = "iso8859-1";
    config.preconfig.stdio_errors = "replace";
    config.user_site_directory = 0;
    ...

    _PyInitError err = _Py_InitializeFromConfig(&config);
    if (_Py_INIT_FAILED(err)) {
        _Py_ExitInitError(err);
    }
    ...
    Py_Finalize();
    return 0;

"config.preconfig.stdio_errors" syntax isn't great, but it's simpler to implement than duplicating all _PyPreConfig fields into _PyCoreConfig.
msg336792 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-02-28 01:50
PR 12087 is a WIP change which implements everything as a single commit.

I'm not 100% sure yet that it's the best approach for Python initialization, but I'm sure that it solves real interdependency issues between _PyCoreConfig parameters. IMHO, having a "pre-initialization" step to set up the memory allocator and the encodings is a better design.
msg336887 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-01 02:44
New changeset f684d83d86e1990784816d4b243d724e6ab8304f by Victor Stinner in branch 'master':
bpo-36142: Exclude coreconfig.h from Py_LIMITED_API (GH-12111)
https://github.com/python/cpython/commit/f684d83d86e1990784816d4b243d724e6ab8304f
msg336918 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-01 14:27
TODO: check if _Py_ClearFileSystemEncoding() uses the right memory allocator. _Py_SetFileSystemEncoding() doesn't temporarily change the memory allocator.
msg336921 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-01 14:53
New changeset dfe884759d1f4441c889695f8985bc9feb9f37eb by Victor Stinner in branch 'master':
bpo-36142: Rework error reporting in pymain_main() (GH-12113)
https://github.com/python/cpython/commit/dfe884759d1f4441c889695f8985bc9feb9f37eb
msg336924 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-01 15:25
New changeset 95e2cbf32f8156c239b27dae558ba058d0f2d496 by Victor Stinner in branch 'master':
bpo-36142: Move command line parsing to coreconfig.c (GH-12123)
https://github.com/python/cpython/commit/95e2cbf32f8156c239b27dae558ba058d0f2d496
msg336931 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-01 16:52
New changeset 91b9ecf82c3287b45f39158c5134a87414ff26bc by Victor Stinner in branch 'master':
bpo-36142: Add preconfig.c (GH-12128)
https://github.com/python/cpython/commit/91b9ecf82c3287b45f39158c5134a87414ff26bc
msg336946 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-01 18:32
New changeset 62be763348d16ba90f96667aa0240503261393f0 by Victor Stinner in branch 'master':
bpo-36142: Remove _PyMain structure (GH-12120)
https://github.com/python/cpython/commit/62be763348d16ba90f96667aa0240503261393f0
msg336989 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-03-02 04:41
Agreed - I think the biggest thing we learned from the pre-implementation in Python 3.7 is that the "Let's move as much config as we can to Python C API data types" fell down in a couple of areas:

1. The embedding application is likely to speak char* and/or wchar_t* natively, not PyObject*, and this applies even to CPython's own current `Py_Main` implementation.

2. There's some core system libc interaction scaffolding that we need in place first, giving three phases, not two:

- initialise anything needed to read configuration settings from the environment and command line (i.e. memory allocators, interface encodings)
- initialise the things needed to execute builtin and frozen Python modules (core data types, random hash seed, etc)
- initialise the things needed to execute external Python modules (sys.path, etc)
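The ordering constraint between these three phases can be expressed as a small state machine. This is a hedged sketch: the enum values and runtime_advance() are hypothetical, chosen only to mirror the three bullets above, and are not part of the CPython API.

```c
#include <stdio.h>

/* Hypothetical phase names for a three-step initialization. */
typedef enum {
    PHASE_NONE = 0,
    PHASE_PRE,    /* memory allocators, interface encodings */
    PHASE_CORE,   /* core types, hash seed: builtin/frozen modules can run */
    PHASE_MAIN    /* sys.path etc.: external modules can be imported */
} init_phase;

typedef struct {
    init_phase phase;
} runtime_state;

/* Enter the next phase; refuse to skip or repeat one.
   Returns 0 on success, -1 on an out-of-order request. */
int runtime_advance(runtime_state *rt, init_phase next)
{
    if (next != rt->phase + 1) {
        fprintf(stderr, "cannot enter phase %d from phase %d\n",
                (int)next, (int)rt->phase);
        return -1;
    }
    rt->phase = next;
    return 0;
}
```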

I'll update PEP 432 so it at least mentions some of the lessons learned, and points to the current internal configuration API definitions in the CPython source tree.
msg337024 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2019-03-03 02:17
PEP 432 tweaked: https://github.com/python/peps/pull/904/files
msg337161 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-05 01:01
New changeset cad1f747da47849ab5d8b0b881f7a0b94564d290 by Victor Stinner in branch 'master':
bpo-36142: Add _PyPreConfig structure (GH-12172)
https://github.com/python/cpython/commit/cad1f747da47849ab5d8b0b881f7a0b94564d290
msg337164 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-05 01:44
New changeset 6dcb54228e7520abd058897440c26e323f62afcd by Victor Stinner in branch 'master':
bpo-36142: Add _PyPreConfig_ReadFromArgv() (GH-12173)
https://github.com/python/cpython/commit/6dcb54228e7520abd058897440c26e323f62afcd
msg337182 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-05 11:32
New changeset 5a02e0d1c8a526fc4e80a2fb8b4a9d5bc64c7d82 by Victor Stinner in branch 'master':
bpo-36142: Add _PyPreConfig.utf8_mode (GH-12174)
https://github.com/python/cpython/commit/5a02e0d1c8a526fc4e80a2fb8b4a9d5bc64c7d82
msg337225 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-05 16:37
New changeset b35be4b3334fbc471a39abbeb68110867b72e3e5 by Victor Stinner in branch 'master':
bpo-36142: Add _PyPreConfig.allocator (GH-12181)
https://github.com/python/cpython/commit/b35be4b3334fbc471a39abbeb68110867b72e3e5
msg337241 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-05 22:31
New changeset a9df651eb4c18a07ec309df190419613e95cba7b by Victor Stinner in branch 'master':
bpo-36142: Add _PyMem_GetDebugAllocatorsName() (GH-12185)
https://github.com/python/cpython/commit/a9df651eb4c18a07ec309df190419613e95cba7b
msg337244 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-05 23:37
New changeset 7d2ef3ef5042356aaeaf832ad4204b7dad2e1b8c by Victor Stinner in branch 'master':
bpo-36142: _PyPreConfig_Write() sets the allocator (GH-12186)
https://github.com/python/cpython/commit/7d2ef3ef5042356aaeaf832ad4204b7dad2e1b8c
msg337246 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 00:13
New changeset c656e25667c9acc0d13e5bb16d3df2938d0f614b by Victor Stinner in branch 'master':
bpo-36142: Add _PyPreConfig_SetAllocator() (GH-12187)
https://github.com/python/cpython/commit/c656e25667c9acc0d13e5bb16d3df2938d0f614b
msg337249 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 00:44
New changeset 4fffd380a4070aff39b7fd443d90e60746c1b623 by Victor Stinner in branch 'master':
bpo-36142: _PyPreConfig_Read() sets LC_CTYPE (GH-12188)
https://github.com/python/cpython/commit/4fffd380a4070aff39b7fd443d90e60746c1b623
msg337250 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 00:45
Description of this long series of changes.

I modified Py_Main() and _Py_InitializeCore() to clearly separate the "pre-configuration" and "configuration" steps. The pre-configuration now temporarily decodes the command line arguments and uses its own command line parser to get the -E, -I and -X options (-X is needed for -X utf8). The pre-configuration is designed to be as small as possible; it configures:

* memory allocators
* the LC_CTYPE locale and the UTF-8 mode

The _PyPreConfig structure has 8 fields:

* allocator
* coerce_c_locale
* coerce_c_locale_warn
* dev_mode
* isolated
* (Windows only) legacy_windows_fs_encoding
* use_environment
* utf8_mode

I had to include fields which have an impact on other fields. Examples:

* dev_mode=1 sets allocator to "debug";
* isolated=1 sets use_environment to 0;
* legacy_windows_fs_encoding=1 sets utf8_mode to 0.
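A minimal sketch of how these interdependencies could be resolved in one place. The preconfig struct, its field types and preconfig_resolve() are assumptions for illustration, not the actual _PyPreConfig declaration. The precedence of an explicit PYTHONMALLOC over dev mode follows the title of GH-12191 ("PYTHONMALLOC overrides PYTHONDEV").

```c
#include <stddef.h>

/* Hypothetical mirror of the 8 fields listed above (types assumed). */
typedef struct {
    const char *allocator;           /* NULL = not set; e.g. "debug", "malloc" */
    int coerce_c_locale;
    int coerce_c_locale_warn;
    int dev_mode;
    int isolated;
    int legacy_windows_fs_encoding;  /* only meaningful on Windows */
    int use_environment;
    int utf8_mode;
} preconfig;

/* Apply the dependency rules in a fixed order. env_pythonmalloc is the
   value of PYTHONMALLOC (or NULL), read by the caller, e.g. via getenv(). */
void preconfig_resolve(preconfig *c, const char *env_pythonmalloc)
{
    if (c->isolated) {
        c->use_environment = 0;      /* isolated=1 implies -E */
    }
    /* PYTHONMALLOC is only honored when the environment may be used,
       and an explicit PYTHONMALLOC wins over dev mode. */
    if (c->allocator == NULL && c->use_environment
        && env_pythonmalloc != NULL && env_pythonmalloc[0] != '\0') {
        c->allocator = env_pythonmalloc;
    }
    if (c->allocator == NULL && c->dev_mode) {
        c->allocator = "debug";      /* dev mode installs debug hooks */
    }
    if (c->legacy_windows_fs_encoding) {
        c->utf8_mode = 0;            /* legacy FS encoding disables UTF-8 mode */
    }
}
```

The order matters: if PYTHONMALLOC were read before isolated had cleared use_environment, the sketch would wrongly honor it even under -I.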

_PyCoreConfig_Read() is now only called after the memory allocator and the locale (LC_CTYPE locale and UTF-8 mode) are properly configured.

I removed the last side effects of _PyCoreConfig_Read(): it no longer modifies the locale. Same for the new _PyPreConfig_Read(): zero side effects.

The new _PyPreConfig_Write() and _PyCoreConfig_Write() functions are now responsible for applying the new configurations.
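The read/write split can be sketched in plain C with setlocale(): the read step may temporarily change LC_CTYPE but restores it before returning, while only the write step leaves process-wide state modified. locale_config, runtime_utf8_mode and both functions are hypothetical; the rule "enable UTF-8 mode when LC_CTYPE is C or POSIX" is a simplification of PEP 540 that ignores -X utf8 and PYTHONUTF8.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of the read step. */
typedef struct {
    int utf8_mode;
} locale_config;

/* Hypothetical process-wide state; only the write step modifies it. */
static int runtime_utf8_mode = 0;

/* Read step: compute the configuration, leaving no side effect.
   An unusable user locale is treated like "C" (assumption). */
int locale_config_read(locale_config *cfg)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    if (cur == NULL) {
        return -1;
    }
    /* setlocale() may overwrite the returned buffer, so copy it first. */
    char *saved = malloc(strlen(cur) + 1);
    if (saved == NULL) {
        return -1;
    }
    strcpy(saved, cur);

    const char *user = setlocale(LC_CTYPE, "");   /* temporary change */
    cfg->utf8_mode = (user == NULL
                      || strcmp(user, "C") == 0
                      || strcmp(user, "POSIX") == 0);

    setlocale(LC_CTYPE, saved);                   /* restore: no side effect */
    free(saved);
    return 0;
}

/* Write step: the only place that changes process-wide state. */
int locale_config_write(const locale_config *cfg)
{
    runtime_utf8_mode = cfg->utf8_mode;
    return setlocale(LC_CTYPE, "") != NULL ? 0 : -1;
}
```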

There are functions to read the configuration from command line arguments:

* _PyPreConfig_ReadFromArgv()
* _PyCoreConfig_ReadFromArgv()

These functions expect a _PyArgv structure which accepts either bytes (char*) or Unicode (wchar_t*) arguments.
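A hedged sketch of an argv holder that accepts either representation and hands out wchar_t* copies. example_argv and example_argv_get() are hypothetical names and fields; decoding here uses the C library's mbstowcs() with the current LC_CTYPE locale, which fails on undecodable bytes, whereas CPython decodes with Py_DecodeLocale() and the surrogateescape error handler.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical argv holder: exactly one of the two arrays is used. */
typedef struct {
    int argc;
    int use_bytes_argv;      /* 1: bytes_argv is valid, 0: wchar_argv is valid */
    char **bytes_argv;
    wchar_t **wchar_argv;
} example_argv;

/* Return a newly allocated wchar_t* copy of argument i (caller frees),
   or NULL on memory error or undecodable bytes. */
wchar_t *example_argv_get(const example_argv *args, int i)
{
    if (args->use_bytes_argv) {
        const char *s = args->bytes_argv[i];
        size_t n = mbstowcs(NULL, s, 0);       /* length without terminator */
        if (n == (size_t)-1) {
            return NULL;                       /* invalid multibyte sequence */
        }
        wchar_t *w = malloc((n + 1) * sizeof(wchar_t));
        if (w == NULL) {
            return NULL;
        }
        mbstowcs(w, s, n + 1);                 /* also writes the terminator */
        return w;
    }
    const wchar_t *s = args->wchar_argv[i];
    size_t n = wcslen(s);
    wchar_t *w = malloc((n + 1) * sizeof(wchar_t));
    if (w == NULL) {
        return NULL;
    }
    wmemcpy(w, s, n + 1);
    return w;
}
```

Because bytes are decoded with the current locale, the result depends on when LC_CTYPE is set, which is exactly why the pre-configuration has to run before the full command line is decoded.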

I moved coreconfig.h from Include/ to Include/cpython/ to be more explicit that it's excluded from the stable API and that it's CPython specific.

I moved all config functions to a new Include/internal/pycore_coreconfig.h. The functions are internal so that we can modify them at any time until a proper, clean public API is designed on top of them.

If _PyPreConfig.allocator is set, _PyPreConfig_Write() re-allocates the configuration with the new memory allocator. This small change avoids the need to force a specific memory allocator in many functions. I was able to remove the following code:

    PyMemAllocatorEx old_alloc;
    _PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
    ...
    PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
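The idea of re-allocating the configuration with the new allocator can be sketched with a tiny allocator table: each heap string is copied into memory owned by the new allocator and released with the allocator that created it. The allocator type (a simplified stand-in for PyMemAllocatorEx), config_strings, config_realloc() and the counting allocator are all hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for PyMemAllocatorEx. */
typedef struct {
    void *ctx;
    void *(*alloc_fn)(void *ctx, size_t size);
    void (*free_fn)(void *ctx, void *ptr);
} allocator;

/* Hypothetical configuration holding one heap-allocated string. */
typedef struct {
    char *allocator_name;
} config_strings;

/* Counting allocator for illustration: ctx points to a live-block counter. */
static void *counting_alloc(void *ctx, size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        ++*(int *)ctx;
    }
    return p;
}

static void counting_free(void *ctx, void *ptr)
{
    if (ptr != NULL) {
        --*(int *)ctx;
    }
    free(ptr);
}

/* Move the config strings from old_a to new_a. Returns 0 or -1 on error. */
int config_realloc(config_strings *cfg, const allocator *old_a, const allocator *new_a)
{
    if (cfg->allocator_name != NULL) {
        size_t len = strlen(cfg->allocator_name) + 1;
        char *copy = new_a->alloc_fn(new_a->ctx, len);
        if (copy == NULL) {
            return -1;
        }
        memcpy(copy, cfg->allocator_name, len);
        old_a->free_fn(old_a->ctx, cfg->allocator_name);  /* freed by its owner */
        cfg->allocator_name = copy;
    }
    return 0;
}
```

Once the strings are owned by the current allocator, later code can free them without temporarily switching allocators, which is what made the save/restore pattern unnecessary.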

Calling Py_Main() after Py_Initialize() is still supported. In this case, it no longer checks the memory allocator name, because _PyMem_GetAllocatorsName() returns "pymalloc_debug" (or "malloc_debug" if pymalloc is disabled) after _PyMem_SetupAllocators("debug") is called: the names are different.
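To show why comparing names directly fails, here is a small sketch mapping a requested PYTHONMALLOC value to the name reported once the allocator is installed. effective_allocator_name() and the EXAMPLE_WITH_PYMALLOC switch are hypothetical; the mapping assumes a release build (in a Py_DEBUG build, "default" also installs debug hooks).

```c
#include <string.h>

#ifndef EXAMPLE_WITH_PYMALLOC
#define EXAMPLE_WITH_PYMALLOC 1   /* stands in for CPython's WITH_PYMALLOC */
#endif

/* Name reported after installing the requested allocator (release build). */
const char *effective_allocator_name(const char *requested)
{
    if (strcmp(requested, "default") == 0) {
        return EXAMPLE_WITH_PYMALLOC ? "pymalloc" : "malloc";
    }
    if (strcmp(requested, "debug") == 0) {
        /* "debug" means: default allocator plus debug hooks */
        return EXAMPLE_WITH_PYMALLOC ? "pymalloc_debug" : "malloc_debug";
    }
    return requested;   /* explicit names such as "malloc" are reported as-is */
}
```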
msg337253 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 00:54
I created bpo-36202: "Calling Py_DecodeLocale() before _PyPreConfig_Write() can produce mojibake".
msg337254 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 00:55
Ok, the _PyPreConfig structure is now available and used internally. The API can likely be enhanced, but I prefer to open new follow-up issues like bpo-36202, since this issue already has a long list of changes :-) I'm closing this issue.

Nick: thanks for updating PEP 432 ;-)
msg337295 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-03-06 11:51
New changeset 25d13f37aa6743282d0b8b4df687ff89999964b2 by Victor Stinner in branch 'master':
bpo-36142: PYTHONMALLOC overrides PYTHONDEV (GH-12191)
https://github.com/python/cpython/commit/25d13f37aa6743282d0b8b4df687ff89999964b2
History
Date User Action Args
2019-03-06 11:51:55 vstinner set messages: + msg337295
2019-03-06 11:30:56 vstinner set pull_requests: + pull_request12186
2019-03-06 00:55:55 vstinner set status: open -> closed
resolution: fixed
messages: + msg337254
stage: patch review -> resolved
2019-03-06 00:54:08 vstinner set messages: + msg337253
2019-03-06 00:45:15 vstinner set messages: + msg337250
2019-03-06 00:44:36 vstinner set messages: + msg337249
2019-03-06 00:26:44 vstinner set pull_requests: + pull_request12183
2019-03-06 00:13:49 vstinner set messages: + msg337246
2019-03-05 23:55:49 vstinner set pull_requests: + pull_request12182
2019-03-05 23:37:00 vstinner set messages: + msg337244
2019-03-05 22:38:55 vstinner set pull_requests: + pull_request12181
2019-03-05 22:31:58 vstinner set messages: + msg337241
2019-03-05 22:16:07 vstinner set pull_requests: + pull_request12180
2019-03-05 16:37:55 vstinner set messages: + msg337225
2019-03-05 16:12:03 vstinner set pull_requests: + pull_request12177
2019-03-05 11:32:14 vstinner set messages: + msg337182
2019-03-05 02:06:25 vstinner set pull_requests: + pull_request12171
2019-03-05 01:44:14 vstinner set messages: + msg337164
2019-03-05 01:27:03 vstinner set pull_requests: + pull_request12170
2019-03-05 01:01:30 vstinner set messages: + msg337161
2019-03-04 23:51:05 vstinner set pull_requests: + pull_request12169
2019-03-03 02:17:46 ncoghlan set messages: + msg337024
2019-03-02 04:41:42 ncoghlan set nosy: + ncoghlan
messages: + msg336989
2019-03-01 18:32:14 vstinner set messages: + msg336946
2019-03-01 16:52:59 vstinner set messages: + msg336931
2019-03-01 16:30:40 vstinner set pull_requests: + pull_request12131
2019-03-01 15:25:22 vstinner set messages: + msg336924
2019-03-01 14:53:59 vstinner set pull_requests: + pull_request12127
2019-03-01 14:53:11 vstinner set messages: + msg336921
2019-03-01 14:28:21 vstinner set pull_requests: + pull_request12125
2019-03-01 14:27:05 vstinner set messages: + msg336918
2019-03-01 03:20:42 vstinner set pull_requests: + pull_request12119
2019-03-01 02:44:15 vstinner set messages: + msg336887
2019-03-01 02:25:55 vstinner set pull_requests: + pull_request12118
2019-02-28 01:50:07 vstinner set messages: + msg336792
2019-02-28 01:47:22 vstinner set keywords: + patch
stage: patch review
pull_requests: + pull_request12097
2019-02-28 01:42:19 vstinner create