Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings #80323

Closed
vstinner opened this issue Feb 28, 2019 · 22 comments
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Copy link
Member

BPO 36142
Nosy @ncoghlan, @vstinner
PRs
  • [WIP] bpo-36142: Rework Python init to add _PyPreConfig #12087
  • bpo-36142: Exclude coreconfig.h from Py_LIMITED_API #12111
  • bpo-36142: Rework error reporting in pymain_main() #12113
  • bpo-36142: Move command line parsing to coreconfig.c #12123
  • bpo-36142: Remove _PyMain structure #12120
  • bpo-36142: Add preconfig.c #12128
  • bpo-36142: Add _PyPreConfig structure #12172
  • bpo-36142: Add _PyPreConfig_ReadFromArgv() #12173
  • bpo-36142: Add _PyPreConfig.utf8_mode #12174
  • bpo-36142: Add _PyPreConfig.allocator #12181
  • bpo-36142: Add _PyMem_GetDebugAllocatorsName() #12185
  • bpo-36142: _PyPreConfig_Write() sets the allocator #12186
  • bpo-36142: Add _PyPreConfig_SetAllocator() #12187
  • bpo-36142: _PyPreConfig_Read() sets LC_CTYPE #12188
  • bpo-36142: PYTHONMALLOC overrides PYTHONDEV #12191
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-03-06.00:55:55.754>
    created_at = <Date 2019-02-28.01:42:19.448>
    labels = ['interpreter-core', '3.8']
    title = 'Add a new _PyPreConfig step to Python initialization to setup memory allocator and encodings'
    updated_at = <Date 2019-03-06.11:51:55.470>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2019-03-06.11:51:55.470>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-03-06.00:55:55.754>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2019-02-28.01:42:19.448>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36142
    keywords = ['patch']
    message_count = 22.0
    messages = ['336791', '336792', '336887', '336918', '336921', '336924', '336931', '336946', '336989', '337024', '337161', '337164', '337182', '337225', '337241', '337244', '337246', '337249', '337250', '337253', '337254', '337295']
    nosy_count = 2.0
    nosy_names = ['ncoghlan', 'vstinner']
    pr_nums = ['12087', '12111', '12113', '12123', '12120', '12128', '12172', '12173', '12174', '12181', '12185', '12186', '12187', '12188', '12191']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue36142'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    I added a _PyCoreConfig structure to Python 3.7 which contains almost all parameters used to configure Python. Problems: _PyCoreConfig uses bytes and Unicode strings (char* and wchar_t*) whereas it is also used to setup the memory allocator and (filesystem, locale and stdio) encodings.

    I propose to add a new _PyPreConfig which is the "strict minimum" configuration to setup encodings and the memory allocator. In practice, it also contains parameters which directly or indirectly impacts the allocator and encodings. For example, isolated impacts use_environment which impacts the allocator (PYTHONMALLOC environment variable). Another example: dev_mode=1 sets the allocator to "debug".

    The command line arguments are now parsed twice. _PyPreConfig only parses a few parameters like -E, -I and -X. A temporary _PyPreCmdline is used to store command line arguments like -X options.

    I moved structures closer to where they are used. "Global" _PyMain structure has been removed. _PyCmdline now lives way shorter than previously and is moved from main.c to coreconfig.c. The idea is to better control when and how memory is allocated.

    In term of API, we get something like:

    _PyCoreConfig config = _PyCoreConfig_INIT;
    config.preconfig.stdio_encoding = "iso8859-1";
    config.preconfig.stdio_errors = "replace";
    config.user_site_directory = 0;
    ...
    
        _PyInitError err = _Py_InitializeFromConfig(&config);
        if (_Py_INIT_FAILED(err)) {
            _Py_ExitInitError(err);
        }
        ...
        Py_Finalize();
        return 0;

    "config.preconfig.stdio_errors" syntax isn't great, but it's simpler to implement than duplicating all _PyPreConfig fields into _PyCoreConfig.

    @vstinner vstinner added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Feb 28, 2019
    @vstinner
    Copy link
    Member Author

    PR 12087 is a WIP change which implements everything as a single commit.

    I'm not 100% sure yet that it's best approach for Python initialization, but I'm sure that it solves real interdependencies issues between _PyCoreConfig parameters. IHMO have a "pre-initialization" step to setup the memory allocator and the encodings is a better design.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 1, 2019

    New changeset f684d83 by Victor Stinner in branch 'master':
    bpo-36142: Exclude coreconfig.h from Py_LIMITED_API (GH-12111)
    f684d83

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 1, 2019

    TODO: check if _Py_ClearFileSystemEncoding() uses the right memory allocator. _Py_SetFileSystemEncoding() doesn't change temporarily the memory allocator.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 1, 2019

    New changeset dfe8847 by Victor Stinner in branch 'master':
    bpo-36142: Rework error reporting in pymain_main() (GH-12113)
    dfe8847

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 1, 2019

    New changeset 95e2cbf by Victor Stinner in branch 'master':
    bpo-36142: Move command line parsing to coreconfig.c (GH-12123)
    95e2cbf

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 1, 2019

    New changeset 91b9ecf by Victor Stinner in branch 'master':
    bpo-36142: Add preconfig.c (GH-12128)
    91b9ecf

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 1, 2019

    New changeset 62be763 by Victor Stinner in branch 'master':
    bpo-36142: Remove _PyMain structure (GH-12120)
    62be763

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Mar 2, 2019

    Agreed - I think the biggest thing we learned from the pre-implementation in Python 3.7 is that the "Let's move as much config as we can to Python C API data types" fell down in a couple of areas:

    1. The embedding application is likely to speak char* and/or wchar_* natively, not PyObject*, and this applies even for CPython's own current Py_Main implementation.

    2. There's some core system libc interaction scaffolding that we need in place first, giving 3 phases, not two:

    • initialise anything needed to read configuration settings from the environment and command line (i.e. memory allocators, interface encodings)
    • initialise the things needed to execute builtin and frozen Python modules (core data types, random hash seed, etc)
    • initialise the things needed to execute external Python modules (sys.path, etc)

    I'll update PEP-432 so it at least mentions some of the lessons learned, and points to the current internal configuration API definitions in the CPython source tree.

    @ncoghlan
    Copy link
    Contributor

    ncoghlan commented Mar 3, 2019

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 5, 2019

    New changeset cad1f74 by Victor Stinner in branch 'master':
    bpo-36142: Add _PyPreConfig structure (GH-12172)
    cad1f74

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 5, 2019

    New changeset 6dcb542 by Victor Stinner in branch 'master':
    bpo-36142: Add _PyPreConfig_ReadFromArgv() (GH-12173)
    6dcb542

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 5, 2019

    New changeset 5a02e0d by Victor Stinner in branch 'master':
    bpo-36142: Add _PyPreConfig.utf8_mode (GH-12174)
    5a02e0d

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 5, 2019

    New changeset b35be4b by Victor Stinner in branch 'master':
    bpo-36142: Add _PyPreConfig.allocator (GH-12181)
    b35be4b

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 5, 2019

    New changeset a9df651 by Victor Stinner in branch 'master':
    bpo-36142: Add _PyMem_GetDebugAllocatorsName() (GH-12185)
    a9df651

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 5, 2019

    New changeset 7d2ef3e by Victor Stinner in branch 'master':
    bpo-36142: _PyPreConfig_Write() sets the allocator (GH-12186)
    7d2ef3e

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2019

    New changeset c656e25 by Victor Stinner in branch 'master':
    bpo-36142: Add _PyPreConfig_SetAllocator() (GH-12187)
    c656e25

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2019

    New changeset 4fffd38 by Victor Stinner in branch 'master':
    bpo-36142: _PyPreConfig_Read() sets LC_CTYPE (GH-12188)
    4fffd38

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2019

    Description of this long serie of changes.

    I modified Py_Main() and _Py_InitializeCore() to clearly separate "pre-configuration" from "configuration" steps. The pre-configuration now decodes temporarily the command line arguments and uses its own command line parser to get -E, -I and -X options (-X is needed for -X utf8). The pre-configuration is designed to be as small as possible, it configures:

    • memory allocators
    • LC_CTYPE locale and set the UTF-8 mode

    The _PyPreConfig structure has 8 fields:

    • allocator
    • coerce_c_locale
    • coerce_c_locale_warn
    • dev_mode
    • isolated
    • (Windows only) legacy_windows_fs_encoding
    • use_environment
    • utf8_mode

    I had to include fields which have an impact on other fields. Examples:

    • dev_mode=1 sets allocator to "default";
    • isolated=1 sets use_environment to 0;
    • legacy_windows_fs_encoding=& sets utf8_mode to 0.

    _PyCoreConfig_Read() is now only called after the memory allocator and the locale (LC_CTYPE locale and UTF-8 mode) are properly configured.

    I removed the last side effects of _PyCoreConfig_Read(): it no longer modify the locale. Same for the new _PyPreConfig_Read(): zero size effect.

    The new _PyPreConfig_Write() and _PyCoreConfig_Write() are now responsible to write the new configurations.

    There are functions to read the configuration from command line arguments:

    • _PyPreConfig_ReadFromArgv()
    • _PyCoreConfig_ReadFromArgv()

    These functions expect a _PyArgv structure which accepts bytes (wchar*) or Unicode (wchar_t*).

    I moved coreconfig.h from Include/ to Include/cpython/ to be more explicit that it's excluded from the stable API and that it's CPython specific.

    I moved all config functions to a new Include/internal/pycore_coreconfig.h. Functions are internal to allow us to modify us anytime until a proper clean public API is designed on top of it.

    If _PyPreConfig.allocator is set, _PyPreConfig_Write() re-allocate the configuration with the new memory allocator. This tiny thing avoids the new to force a specific memory allocator in many functions. I was able to remove the following code:

        PyMemAllocatorEx old_alloc;
        _PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc);
        ...
        PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc);

    Calling Py_Main() after Py_Initialize() is still supported. In this case, it no longer checks the memory allocators name because _PyMem_GetAllocatorsName() returns "pymalloc_debug" (or "malloc_debug" if pymalloc is disabled) after _PyMem_SetupAllocators("debug") is called: names are diffrent.

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2019

    I created bpo-36202: "Calling Py_DecodeLocale() before _PyPreConfig_Write() can produce mojibake".

    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2019

    Ok, the _PyPreConfig structure is now available and used internally. The API can likely be enhanced, but I prefer to open new follow-up issues like bpo-36202, since this issue has already a long list of changes :-) I close this issue.

    Nick: thanks for updating the PEP-432 ;-

    @vstinner vstinner closed this as completed Mar 6, 2019
    @vstinner
    Copy link
    Member Author

    vstinner commented Mar 6, 2019

    New changeset 25d13f3 by Victor Stinner in branch 'master':
    bpo-36142: PYTHONMALLOC overrides PYTHONDEV (GH-12191)
    25d13f3

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants