PEP 432 (PEP 587): Redesign the interpreter startup sequence #66453

Closed
ncoghlan opened this issue Aug 23, 2014 · 31 comments
Assignees
Labels
3.8 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), type-feature (a feature request or enhancement)

Comments

@ncoghlan
Contributor

BPO 22257
Nosy @Yhg1s, @warsaw, @terryjreedy, @jcea, @ncoghlan, @vstinner, @ned-deily, @ericsnowcurrently, @serhiy-storchaka, @dstufft
PRs
  • bpo-22257: Small changes for PEP 432. #1728
  • bpo-22257: Private C-API for main interpreter initialization (PEP 432). #1729
  • bpo-22257: Fix CLI by using int instead of char (compares to EOF). #1765
  • bpo-22257: Revert an invalid change to a test (from 6b4be19). #1770
  • bpo-22257: Private C-API for core runtime initialization (PEP 432). #1772
  • bpo-22257: Drop a duplicate line. #1809
  • bpo-30547: Fix multiple reference leaks #1995
  • bpo-22257: Mention startup refactoring in What's New #4286
Dependencies
  • bpo-18093: Move main functions to a separate Programs directory
  • bpo-22869: Split pylifecycle.c out from pythonrun.c
Files
  • pep432-startup-redesign-skeleton.diff: Snapshot of current BitBucket branch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = 'https://github.com/ncoghlan'
    closed_at = <Date 2019-10-06.00:07:28.586>
    created_at = <Date 2014-08-23.10:22:31.815>
    labels = ['interpreter-core', 'type-feature', '3.8']
    title = 'PEP 432 (PEP 587): Redesign the interpreter startup sequence'
    updated_at = <Date 2019-10-06.00:07:28.582>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2019-10-06.00:07:28.582>
    actor = 'ncoghlan'
    assignee = 'ncoghlan'
    closed = True
    closed_date = <Date 2019-10-06.00:07:28.586>
    closer = 'ncoghlan'
    components = ['Interpreter Core']
    creation = <Date 2014-08-23.10:22:31.815>
    creator = 'ncoghlan'
    dependencies = ['18093', '22869']
    files = ['43171']
    hgrepos = []
    issue_num = 22257
    keywords = ['patch']
    message_count = 31.0
    messages = ['225738', '225772', '225773', '225880', '225965', '226069', '226156', '226158', '226186', '226196', '231157', '267191', '267357', '275330', '294219', '294229', '294260', '294274', '294279', '294288', '294319', '294328', '295407', '305583', '305585', '314349', '314354', '342531', '343635', '353245', '354027']
    nosy_count = 14.0
    nosy_names = ['twouters', 'barry', 'terry.reedy', 'jcea', 'ncoghlan', 'vstinner', 'ned.deily', 'Arfrever', 'eric.snow', 'serhiy.storchaka', 'Drekin', 'dstufft', 'Gregory.Salvan', 'sbspider']
    pr_nums = ['1728', '1729', '1765', '1770', '1772', '1809', '1995', '4286']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue22257'
    versions = ['Python 3.8']

    @ncoghlan
    Contributor Author

    PEP-432 describes a plan to redesign how the interpreter startup sequence is implemented.

    This is a placeholder issue to be used to help track other problems which can't easily be fixed without those changes.

    @ncoghlan ncoghlan added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Aug 23, 2014
    @ncoghlan
    Contributor Author

    I'll also be using this issue to track some of the ground work I plan to lay this time. I found in my last attempt that the interpreter internals needed some initial refactoring to take better advantage of C level state encapsulation in different modules.

    @ncoghlan ncoghlan self-assigned this Aug 24, 2014
    @ncoghlan
    Contributor Author

    Just noting that bpo-18093 was also part of that groundwork.

    My previous attempt at a draft PEP-432 patch: https://bitbucket.org/ncoghlan/cpython_sandbox/src/pep432_modular_bootstrap/?at=pep432_modular_bootstrap

    That initial attempt ended up being abandoned as I spent more time focusing on packaging issues, and there are too many merge conflicts for it to be worth bringing it back up to date. Those are mostly due to the changes in tree layout though, which is why this time I'll do the non-functional restructuring *first* on trunk, and only then start on a new PEP-432 branch in the sandbox.

    @GregorySalvan
    Mannequin

    GregorySalvan mannequin commented Aug 25, 2014

    In case it helps, I've been through a similar refactoring.

    We used a pattern of stages and services:

    • stages represent a state of the application (like Pre-Initialization, Initializing, Initialized...); they are composed of services
    • services represent a key responsibility within the system (set program name, set python home...)

    The launch sequence was determined by a chain of dependencies.
    E.g. "Initialized" declares that it requires "Initializing", which declares that it requires "Pre-Initialization"...
    So when you ask to load the stage Initialized, the launcher can construct and then run the sequence: Pre-Initialization -> Initializing -> Initialized.

    We used the same mechanism for services.

    This way you can insert/append new stages or services just by creating them and declaring that they should run after X and/or before Y.

    Key benefits:

    • easy to maintain and extend, flexible
    • thread safe: the launcher/runner can decide to parallelize. To serve this purpose, each service can take a context object and return the context object that will be passed to the next service; between stages, contexts are merged if services were parallelized...
    • easy to debug: you get error messages like: At Stage X, service Y fails with error message Z.
    • optimization friendly: while debugging you can measure the time taken by each service and compare it with an older version of the system, for example.
    • few changes to the original code: it's just copy/pasting chunks of code.

    Drawbacks:

    • it's hard for developers to get a picture of what happens; this requires building a launcher/debugger which can dump the launch sequence.
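
The stage ordering described in this comment can be sketched in a few lines of Python. This is only an illustration of the idea, not code from the system being described: the stage names are taken from the comment, while `STAGE_REQUIRES`, `launch_sequence`, `run`, and `set_program_name` are hypothetical names invented for the sketch. It assumes Python 3.9+ for the standard library `graphlib` module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage table: each stage maps to the stages it requires.
STAGE_REQUIRES = {
    "Pre-Initialization": set(),
    "Initializing": {"Pre-Initialization"},
    "Initialized": {"Initializing"},
}


def launch_sequence(target, requires):
    """Return the stages needed to reach `target`, dependencies first."""
    needed = {}
    pending = [target]
    while pending:
        stage = pending.pop()
        if stage not in needed:
            needed[stage] = requires[stage]
            pending.extend(requires[stage])
    return list(TopologicalSorter(needed).static_order())


def run(target, requires, services, context=None):
    """Run every service of every needed stage, threading a context through."""
    context = dict(context or {})
    for stage in launch_sequence(target, requires):
        for service in services.get(stage, []):
            try:
                context = service(context)
            except Exception as exc:
                # Mirrors the "At Stage X, service Y fails with Z" message.
                raise RuntimeError(
                    f"At stage {stage}, service {service.__name__} failed: {exc}"
                ) from exc
    return context


def set_program_name(context):
    """Hypothetical service: returns the context passed to the next service."""
    return {**context, "program_name": "python"}


SERVICES = {"Pre-Initialization": [set_program_name]}

# Dumping the sequence addresses the "hard to get a picture" drawback.
print(" -> ".join(launch_sequence("Initialized", STAGE_REQUIRES)))
```

Asking for "Initialized" walks the requirement chain backwards and then orders it, so a new stage only needs an entry in the table naming what it requires.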

    @ncoghlan
    Contributor Author

    I wouldn't mind heading in that direction at a later stage. PEP-432 is aimed at a simpler proposition of breaking things up into two steps:

    Step 1: get a functional bytecode compiler and eval loop up and running (only builtin and frozen modules available) (this is what "begin initialization" would handle)
    Step 2: get everything else up and running with the aid of a mostly working core C API (this is everything else up to and including "end initialization"). In particular, we'd be able to use builtins, like str and list, rather than having to manage everything in pure C (or breach the API guarantees by creating objects before the interpreter is fully set up, as happens now).

    So step 1 will probably need to remain a distinct operation called from the embedding application (including the Python CLI itself) but there may be room to move in how we get from the beginning of the initialisation process to the end.
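
To get a feel for what "only builtin and frozen modules available" means in practice, the following sketch asks the two standard importers directly whether they can supply a given module. It runs in a fully initialised interpreter, so it only approximates the environment step 1 would provide; the helper name `importable_without_filesystem` is an assumption made for this example.

```python
from importlib.machinery import BuiltinImporter, FrozenImporter


def importable_without_filesystem(name):
    """True if `name` is a builtin or frozen module in this interpreter."""
    return (
        BuiltinImporter.find_spec(name) is not None
        or FrozenImporter.find_spec(name) is not None
    )


# The results depend on how the running interpreter was built.
for name in ("sys", "_frozen_importlib", "json"):
    print(name, importable_without_filesystem(name))
```

Anything for which this returns False needs the path-based import system, and so belongs to step 2.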

    @terryjreedy
    Member

    Are you planning to un-defer the PEP, and remove the Deferral section?

    The PEP proposes 5 'phases'. How does that mesh with 2 'steps'?

    Gregory's message is helpful to me. The Idle startup needs to be documented (AFAIK it exists only as code now) and modified. Internal error messages are 'print'ed to a text console that is normally not present on Windows (resulting in an exception message that cannot be displayed!). I want to add a new startup 'service', as early as possible, to direct error messages to a GUI message box or window. This means getting a tkinter event loop running as soon as possible so everything after can depend on that. Perhaps it already is, perhaps not.

    @ncoghlan
    Contributor Author

    It's still deferred for the time being. Based on what I learned on my previous attempt at implementing it, there's some prep work I need to do where I believe reviewing someone else's attempt at doing it would actually be *more* work than doing the work myself (this is a truly arcane area of the current implementation - *I* find it hard to follow, and I've been hacking on it for years. PEP-432 was actually inspired by the sheer amount of work that was involved in getting the new pure Python import system integrated properly for Python 3.3).

    That prep work is refactoring the mammoth pythonrun.c file to split out a separate lifecycle.c file that just has the startup and shutdown code, leaving pythonrun.c as a pure runtime module. Anything that remains in pythonrun.c should be able to assume a fully functional Python interpreter is available, while the code in lifecycle.c will need to be able to cope with the fact that the interpreter may only be partially functional (whether that's due to it being set up or destroyed).

    The reason this matters is that it lets me bring the C linker to bear on the problem of enforcing state encapsulation. This proved absolutely essential in my initial PEP-432 implementation attempt, but doing the restructure in the fork resulted in an unacceptably high number of merge conflicts. Doing the restructure *first* should make it far more feasible to maintain the feature branch, and make it practical to restart work on the PEP itself.

    Once we get to that point, then it should actually be possible to have a proper collaborative branch in my CPython sandbox repo on BitBucket, and keep it in sync with CPython trunk relatively easily.

    First step is getting the restructure patch together, though. I actually *have* started work on that, but it isn't in a sensible enough state to be worth sharing at this point. Once it is, I'll open a separate tracker issue specifically for that, and make this one depend on it.

    @ncoghlan
    Contributor Author

    As far as the specific 5 phase vs 2 steps goes, the two steps in PEP-432 terms are the "Pre-Initialized -> Initializing" transition and the "Initializing -> Initialized" transition.

    What Gregory is talking about is a potentially good way to organise the second step - systemd in Linux is similarly organised around the idea of a directed acyclic graph of dependencies. For the initial implementation, we're unlikely to go that far though - we'll likely keep the existing initialisation code, and just rearrange the high level invocations.

    The other phases in PEP-432 are due to the fact that "run __main__" is a separate step, distinct from interpreter initialisation. When you embed Python as the scripting engine in a larger application, the idea of having a __main__ module may not actually make any sense. In those cases, it will still be there (as the interpreter creates it automatically), it will just be empty. But for the CPython CLI, we need that extra "run main" step. (There's a strong case to be made for that being in a separate PEP, and I may still do that - this discussion is certainly pushing me in that direction)

    @GregorySalvan
    Mannequin

    GregorySalvan mannequin commented Aug 31, 2014

    I hesitated to share this, but just in case: a few days after my message I came across the inspiring work of Dr. Hans Vandierendonck (presented during the 2nd International Summer School on Advances in Programming Languages in Edinburgh).

    It is certainly more than is needed here, but he open sourced a scheduler (C++11) with interesting resources and a minimal example of a staged bootstrap.
    It's clearly oriented towards parallel computing, but the way of declaring dependencies and the idea of versioned objects are quite good.

    All materials are available here: http://www.macs.hw.ac.uk/~dsg/events/ISS-AiPL-2014/materials/Vandierendonck/

    @ncoghlan
    Contributor Author

    ncoghlan commented Sep 1, 2014

    Thanks for the reference!

    @ncoghlan
    Contributor Author

    bpo-22869 now covers the preparatory refactoring to split pythonrun into two modules.

    @ncoghlan
    Contributor Author

    ncoghlan commented Jun 4, 2016

    I merged the current 3.6 dev branch into the BitBucket PEP-432 branch in my CPython sandbox. The attached patch is the diff between that branch and CPython default.

    All of the proposed changes here should only affect private APIs now, allowing this to be handled as a private refactoring with settings being migrated incrementally, rather than building up a large hard to maintain pending patch.

    Donald Stufft raised the prospect of potentially using these changes to create a nicer single-file executable builder that provides an alternate binary that just runs itself - statically linking needed extension modules and then prepending the resulting binary to a zip archive with a __main__.py file should support quite a few scenarios.
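
The "prepend something to a zip archive with a __main__.py" part of this idea can be tried today with the standard library `zipapp` module, which writes a `#!interpreter` line in front of the zip data. The sketch below is not the proposed builder: it prepends a shebang line rather than a statically linked binary, and `build_and_run` is a hypothetical helper name. It only shows that the archive stays runnable with non-zip bytes in front of it.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run(source_text):
    """Package `source_text` as __main__.py in a zip archive and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        app_dir = Path(tmp) / "app"
        app_dir.mkdir()
        (app_dir / "__main__.py").write_text(source_text)

        archive = Path(tmp) / "app.pyz"
        # The interpreter argument makes zipapp prepend "#!<path>\n"
        # to the archive, so the file no longer starts with zip data.
        zipapp.create_archive(app_dir, archive, interpreter=sys.executable)

        result = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


print(build_and_run("print('hello from __main__.py')"))
```

The interpreter finds `__main__.py` through the zip index at the end of the file, which is why data placed at the front does not get in the way.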

    I've also added Thomas Wouters to the nosy list, since Google have been looking at a range of options related to CPython startup and single file executables, which should provide a valuable perspective on the utility of these changes.

    @ncoghlan
    Contributor Author

    ncoghlan commented Jun 5, 2016

    Looking at the potential impact of being able to use C99 initializers for the main configuration structs, I realised those could be a *lot* easier to work with if they consisted entirely of pointers to Python objects:

    • NULL initialisation would correctly indicate "not set" for each value
    • Running Py_XDECREF on every field will correctly release memory
    • Boolean toggles are clearly separated from multi-value integers (PyBool vs PyLong)
    • No conversion is needed to provide a read-only view of the config data at the Python level

    This wouldn't be feasible for CoreConfig (since that is populated before Python object creation is permitted), but should work for MainInterpreterConfig and InterpreterConfig.

    @ericsnowcurrently
    Member

    It may be worth refactoring the patch relative to the new C99 support.

    @ericsnowcurrently
    Member

    New changeset 6b4be19 by Eric Snow in branch 'master':
    bpo-22257: Small changes for PEP-432. (GH-1728)
    6b4be19

    @serhiy-storchaka
    Member

    Why was Lib/test/coding20731.py changed?

    @serhiy-storchaka serhiy-storchaka added 3.7 (EOL) end of life type-feature A feature request or enhancement labels May 23, 2017
    @ericsnowcurrently
    Member

    The change in Lib/test/coding20731.py was the result of running PCbuild/fix_encoding.py.

    @serhiy-storchaka
    Member

    This change should be reverted. Lib/test/coding20731.py should intentionally have CRLF line separators.

    @ericsnowcurrently
    Member

    New changeset e0918ec by Eric Snow in branch 'master':
    bpo-22257: Fix CLI by using int instead of char (compares to EOF). (GH-1765)
    e0918ec

    @ericsnowcurrently
    Member

    Reverting the change to that test: #1770

    @ericsnowcurrently
    Member

    New changeset 1abcf67 by Eric Snow in branch 'master':
    bpo-22257: Private C-API for core runtime initialization (PEP-432). (GH-1772)
    1abcf67

    @ericsnowcurrently
    Member

    New changeset c7ec998 by Eric Snow in branch 'master':
    bpo-22257: Private C-API for main interpreter initialization (PEP-432). (GH-1729)
    c7ec998

    @vstinner
    Member

    vstinner commented Jun 8, 2017

    New changeset ab1cb80 by Victor Stinner (Stéphane Wirtel) in branch 'master':
    bpo-30547: Fix multiple reference leaks (GH-1995)
    ab1cb80

    @ncoghlan
    Contributor Author

    ncoghlan commented Nov 5, 2017

    Bug report (since resolved) that highlighted our general lack of test coverage for the interactions between environment variable based configuration and command line based configuration: https://bugs.python.org/issue31845

    This work revealed that gap, since the refactoring changes the order in which we check environment variables and the command line (we used to check the command line first, and then env vars later during Py_Initialize; now we check env vars in _Py_InitializeCore, and the command line afterwards).
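
A check of the kind that would cover this interaction can be sketched by starting child interpreters with different combinations of an environment variable and command line options. This is an illustrative sketch rather than a test from the CPython suite; `probe_flag` and `PROBE` are hypothetical names, and `PYTHONDONTWRITEBYTECODE` with `-B` and `-E` is just one convenient pair of settings to compare.

```python
import os
import subprocess
import sys

# Code run in the child: report how the startup code resolved the setting.
PROBE = "import sys; print(sys.flags.dont_write_bytecode)"


def probe_flag(cli_args, env_overrides):
    """Run a child interpreter and return its sys.flags.dont_write_bytecode."""
    env = dict(os.environ)
    # Start from a known state so the parent's setting cannot leak in.
    env.pop("PYTHONDONTWRITEBYTECODE", None)
    env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, *cli_args, "-c", PROBE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


env_on = {"PYTHONDONTWRITEBYTECODE": "1"}
print("env var only:   ", probe_flag([], env_on))
print("-B only:        ", probe_flag(["-B"], {}))
print("-E with env var:", probe_flag(["-E"], env_on))
```

The `-E` case is the interesting one: a command line option decides whether the environment variable is honoured at all, so the result depends on both sources being combined correctly regardless of the order in which they are read.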

    Something I'd also forgotten is that I'd switched the PEP to use "_Py_InitializeRuntime" and "_Py_ConfigureMainInterpreter", but the draft implementation is currently still using "_Py_InitializeCore" and "_Py_InitializeMainInterpreter".

    For that last point, it's probably easier to change the PEP back than it is to tinker with the implementation - those specific API names are pretty arbitrary anyway.

    @ncoghlan
    Contributor Author

    ncoghlan commented Nov 5, 2017

    New changeset 1b46131 by Nick Coghlan in branch 'master':
    bpo-22257: Mention startup refactoring in What's New (GH-4286)
    1b46131

    @ned-deily
    Member

    See bpo-33128: PathFinder is twice on sys.meta_path.

    Also, is this issue supposed to remain open across releases?

    @ncoghlan
    Contributor Author

    Aye, this is the issue for making the API public, so it will stay open until PEP-432 is actually accepted.

    We switched to the pre-implement-changes-as-an-internal-CPython-refactoring approach after we/I realised there was no feasible way to develop and maintain these as an out-of-tree feature branch (with the early pay-off from the change in approach being the feasibility of implementing 3.7 changes like UTF-8 mode).

    @ncoghlan ncoghlan added 3.8 only security fixes and removed 3.7 (EOL) end of life labels Mar 24, 2018
    @vstinner
    Member

    Update: I proposed PEP-587 to make the _PyCoreConfig API public.

    @vstinner
    Member

    Update: my PEP-587 has been accepted and I implemented it in bpo-36763.

    @vstinner
    Member

    PEP-587 has been implemented in Python 3.8 with a "Multi-Phase Initialization Private Provisional API":
    https://docs.python.org/dev/c-api/init_config.html#multi-phase-initialization-private-provisional-api

    I suggest closing this issue. When someone wants to work on PEP-432 again, I would suggest opening a new issue, so that it is more explicit that it would be implemented on top of the existing PEP-587 implementation.

    @ncoghlan
    Contributor Author

    ncoghlan commented Oct 6, 2019

    Agreed. I've also added PEP-587 to the issue title to make the connection to that PEP more obvious.

    @ncoghlan ncoghlan closed this as completed Oct 6, 2019
    @ncoghlan ncoghlan changed the title PEP 432: Redesign the interpreter startup sequence PEP 432 (PEP 587): Redesign the interpreter startup sequence Oct 6, 2019
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022