classification
Title: PEP 432: Redesign the interpreter startup sequence
Type: enhancement
Stage: patch review
Components: Interpreter Core
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies: 18093 22869
Superseder:
Assigned To: ncoghlan
Nosy List: Arfrever, Drekin, Gregory.Salvan, barry, dstufft, eric.snow, jcea, ncoghlan, sbspider, serhiy.storchaka, terry.reedy, twouters, vstinner
Priority: normal
Keywords: patch

Created on 2014-08-23 10:22 by ncoghlan, last changed 2017-11-05 04:58 by ncoghlan.

Files
File name Uploaded Description
pep432-startup-redesign-skeleton.diff ncoghlan, 2016-06-04 00:06 Snapshot of current BitBucket branch
Pull Requests
URL Status Linked
PR 1728 merged eric.snow, 2017-05-22 22:16
PR 1729 merged eric.snow, 2017-05-22 22:24
PR 1765 merged eric.snow, 2017-05-23 17:36
PR 1770 merged eric.snow, 2017-05-23 21:41
PR 1772 merged eric.snow, 2017-05-23 22:00
PR 1809 merged eric.snow, 2017-05-25 17:05
PR 1995 merged matrixise, 2017-06-08 10:16
PR 4286 merged ncoghlan, 2017-11-05 04:49
Messages (25)
msg225738 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-23 10:22
PEP 432 describes a plan to redesign how the interpreter startup sequence is implemented.

This is a placeholder issue to be used to help track other problems which can't easily be fixed without those changes.
msg225772 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-24 00:12
I'll also be using this issue to track some of the ground work I plan to lay this time. I found in my last attempt that the interpreter internals needed some initial refactoring to take better advantage of C level state encapsulation in different modules.
msg225773 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-24 00:17
Just noting that issue 18093 was also part of that groundwork.

My previous attempt at a draft PEP 432 patch: https://bitbucket.org/ncoghlan/cpython_sandbox/src/pep432_modular_bootstrap/?at=pep432_modular_bootstrap

That initial attempt ended up being abandoned as I spent more time focusing on packaging issues, and there are too many merge conflicts for it to be worth bringing it back up to date. Those are mostly due to the changes in tree layout though, which is why this time I'll do the non-functional restructuring *first* on trunk, and only then start on a new PEP 432 branch in the sandbox.
msg225880 - (view) Author: Gregory Salvan (Gregory.Salvan) Date: 2014-08-25 17:30
In case it helps, I've been through a similar refactoring.

We used a pattern of stages and services:
- stages represent a state of the application (like Pre-Initialization, Initializing, Initialized...); they are composed of services
- services each represent a key responsibility within the system (set program name, set python home...)

The launch sequence was determined by declared dependencies.
  e.g. "Initialized" declares that it requires "Initializing", which declares that it requires "Pre-Initialization"...
So when you ask to load stage Initialized, the launcher can construct and then run the sequence: Pre-Initialization -> Initializing -> Initialized.

We used the same mechanism for services.

This way you can insert/append new stages or services just by creating them and declaring they should be run after X and/or before Y.  

Key benefits:
- easy to maintain and extend, flexible
- thread safe: the launcher/runner can decide to parallelize. To support this, each service can take a context object and return the context object that will be passed to the next service; between stages, the contexts are merged if the services ran in parallel...
- easy to debug: you get error messages like: At Stage X, service Y fails with error message Z.
- optimization friendly: while debugging you can measure the time taken by each service and compare it with an older version of the system, for example.
- few changes to the original code: it's mostly copying and pasting chunks of code.

Drawbacks:
- it's hard for developers to get a picture of what happens; this requires writing a launcher/debugger that can dump the launch sequence.
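
The dependency-driven ordering described above can be sketched with the standard library's graphlib.TopologicalSorter (Python 3.9+). The stage names come from the message; the REQUIRES table and the launch_sequence helper are hypothetical and only illustrate the idea, not the system the message refers to.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table: each stage names the stages it requires.
REQUIRES = {
    "Pre-Initialization": set(),
    "Initializing": {"Pre-Initialization"},
    "Initialized": {"Initializing"},
}


def launch_sequence(target, requires=REQUIRES):
    """Return the stages needed to reach `target`, in a valid run order."""
    # Collect only the stages that `target` depends on, directly or indirectly.
    needed = {}
    pending = [target]
    while pending:
        stage = pending.pop()
        if stage not in needed:
            needed[stage] = requires[stage]
            pending.extend(requires[stage])
    # static_order() yields every stage after all of its prerequisites.
    return list(TopologicalSorter(needed).static_order())


if __name__ == "__main__":
    print(" -> ".join(launch_sequence("Initialized")))
```

Inserting a new stage then only means adding one entry to the table; the run order is recomputed from the declarations. A cycle in the declarations makes static_order() raise graphlib.CycleError.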
msg225965 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-27 10:41
I wouldn't mind heading in that direction at a later stage. PEP 432 is aimed at a simpler proposition of breaking things up into two steps:

Step 1: get a functional bytecode compiler and eval loop up and running (only builtin and frozen modules available) (this is what "begin initialization" would handle)
Step 2: get everything else up and running with the aid of a mostly working core C API (this is everything else up to and including "end initialization"). In particular, we'd be able to use builtins, like str and list, rather than having to manage everything in pure C (or breach the API guarantees by creating objects before the interpreter is fully set up, as happens now).

So step 1 will probably need to remain a distinct operation called from the embedding application (including the Python CLI itself) but there may be room to move in how we get from the beginning of the initialisation process to the end.
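
As a rough illustration of the two steps, the toy model below tracks which step has run and rejects calls made out of order. It is a sketch only: the Runtime class, its method names and the settings dictionary are hypothetical and do not correspond to any CPython API.

```python
from enum import Enum


class Phase(Enum):
    PRE_INITIALIZED = 0
    INITIALIZING = 1  # step 1 done: core is up, builtin and frozen modules only
    INITIALIZED = 2   # step 2 done: everything else is configured


class Runtime:
    """Toy model of a two-step startup; all names here are hypothetical."""

    def __init__(self):
        self.phase = Phase.PRE_INITIALIZED
        self.settings = {}

    def begin_initialization(self):
        # Step 1 may only run once, from the pre-initialized state.
        if self.phase is not Phase.PRE_INITIALIZED:
            raise RuntimeError("begin_initialization() was already called")
        self.phase = Phase.INITIALIZING

    def end_initialization(self, **settings):
        # Step 2 relies on the core being available, so step 1 must come first.
        if self.phase is not Phase.INITIALIZING:
            raise RuntimeError("call begin_initialization() first")
        self.settings.update(settings)
        self.phase = Phase.INITIALIZED


if __name__ == "__main__":
    runtime = Runtime()
    runtime.begin_initialization()
    runtime.end_initialization(program_name="demo")
    print(runtime.phase.name)
```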
msg226069 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-08-29 18:43
Are you planning to un-defer the PEP, and remove the Deferral section?

The PEP proposes 5 'phases'. How does that mesh with 2 'steps'?

Gregory's message is helpful to me. The Idle startup needs to be documented (AFAIK it is currently documented only by the code) and modified. Internal error messages are 'print'ed to a text console that is normally not present on Windows (resulting in an exception message that cannot be displayed!). I want to add a new startup 'service', as early as possible, to direct error messages to a gui message box or window. This means getting a tkinter event loop running as soon as possible so everything after can depend on that. Perhaps it already is, perhaps not.
msg226156 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-31 02:57
It's still deferred for the time being. Based on what I learned on my previous attempt at implementing it, there's some prep work I need to do where I believe reviewing someone else's attempt at doing it would actually be *more* work than doing the work myself (this is a truly arcane area of the current implementation - *I* find it hard to follow, and I've been hacking on it for years. PEP 432 was actually inspired by the sheer amount of work that was involved in getting the new pure Python import system integrated properly for Python 3.3).

That prep work is refactoring the mammoth pythonrun.c file to split out a separate lifecycle.c file that just has the startup and shutdown code, leaving pythonrun.c as a pure runtime module. Anything that remains in pythonrun.c should be able to assume a fully functional Python interpreter is available, while the code in lifecycle.c will need to be able to cope with the fact that the interpreter may only be partially functional (whether that's due to it being set up or destroyed).

The reason this matters is that it lets me bring the C linker to bear on the problem of enforcing state encapsulation. This proved absolutely essential in my initial PEP 432 implementation attempt, but doing the restructure in the fork resulted in an unacceptably high number of merge conflicts. Doing the restructure *first* should make it far more feasible to maintain the feature branch, and make it practical to restart work on the PEP itself.

Once we get to that point, then it should actually be possible to have a proper collaborative branch in my CPython sandbox repo on BitBucket, and keep it in sync with CPython trunk relatively easily.

First step is getting the restructure patch together, though. I actually *have* started work on that, but it isn't in a sensible enough state to be worth sharing at this point. Once it is, I'll open a separate tracker issue specifically for that, and make this one depend on it.
msg226158 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-08-31 03:08
As far as the specific 5 phase vs 2 steps goes, the two steps in PEP 432 terms are the "Pre-Initialized -> Initializing" transition and the "Initializing -> Initialized" transition.

What Gregory is talking about is a potentially good way to organise the second step - systemd in Linux is similarly organised around the idea of a directed acyclic graph of dependencies. For the initial implementation, we're unlikely to go that far though - we'll likely keep the existing initialisation code, and just rearrange the high level invocations.

The other phases in PEP 432 are due to the fact that "run __main__" is a separate step, distinct from interpreter initialisation. When you embed Python as the scripting engine in a larger application, the idea of having a __main__ module may not actually make any sense. In those cases, it will still be there (as the interpreter creates it automatically), it will just be empty. But for the CPython CLI, we need that extra "run main" step. (There's a strong case to be made for that being in a separate PEP, and I may still do that - this discussion is certainly pushing me in that direction)
msg226186 - (view) Author: Gregory Salvan (Gregory.Salvan) Date: 2014-08-31 16:42
I hesitated to share this, but just in case: a few days after my message I came across the inspiring work of Dr. Hans Vandierendonck (presented during the 2nd International Summer School on Advances in Programming Languages in Edinburgh).

It is certainly more than is needed here, but he open-sourced a scheduler (C++11) with interesting resources and a minimal example of a staged bootstrap.
It's clearly oriented towards parallel computing, but the way dependencies are declared and the idea of versioned objects are quite good.

All materials are available here: http://www.macs.hw.ac.uk/~dsg/events/ISS-AiPL-2014/materials/Vandierendonck/
msg226196 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-09-01 00:24
Thanks for the reference!
msg231157 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-11-14 11:28
Issue 22869 now covers the preparatory refactoring to split pythonrun into two modules.
msg267191 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-06-04 00:06
I merged the current 3.6 dev branch into the BitBucket PEP 432 branch in my CPython sandbox. The attached patch is the diff between that branch and CPython default.

All of the proposed changes here should only affect private APIs now, allowing this to be handled as a private refactoring with settings being migrated incrementally, rather than building up a large, hard-to-maintain pending patch.

Donald Stufft raised the prospect of potentially using these changes to create a nicer single-file executable builder that provides an alternate binary that just runs itself - statically linking needed extension modules and then prepending the resulting binary to a zip archive with a __main__.py file should support quite a few scenarios.
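
The archive half of that idea can be tried today with the standard library's zipapp module, which prepends a shebang line (rather than a native binary) to a zip archive containing __main__.py. The sketch below is not the statically linked builder discussed above; it only shows that the interpreter still runs an archive that has extra bytes in front of the zip data. The build_and_run helper and the file names are made up for the example.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_and_run():
    """Build a .pyz archive with a prepended shebang line and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "app"
        src.mkdir()
        # The archive's entry point, as in the __main__.py mentioned above.
        (src / "__main__.py").write_text('print("hello from the archive")\n')

        target = pathlib.Path(tmp) / "app.pyz"
        # zipapp writes "#!<interpreter>" plus a newline in front of the zip data.
        zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")

        # Run the archive directly; __main__.py is located inside the zip.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True, text=True, check=True,
        )
        return target.read_bytes()[:2], result.stdout.strip()


if __name__ == "__main__":
    print(build_and_run())
```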

I've also added Thomas Wouters to the nosy list, since Google have been looking at a range of options related to CPython startup and single file executables, which should provide a valuable perspective on the utility of these changes.
msg267357 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-06-05 01:03
Looking at the potential impact of being able to use C99 initializers for the main configuration structs, I realised those could be a *lot* easier to work with if they consisted entirely of pointers to Python objects:

* NULL initialisation would correctly indicate "not set" for each value
* Running Py_XDECREF on every field will correctly release memory
* Boolean toggles are clearly separated from multi-value integers (PyBool vs PyLong)
* No conversion is needed to provide a read-only view of the config data at the Python level

This wouldn't be feasible for CoreConfig (since that is populated before Python object creation is permitted), but should work for MainInterpreterConfig and InterpreterConfig.
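
A loose Python-level analogy of those properties, with None standing in for a NULL pointer: every field defaults to "not set", booleans and integers keep distinct types, and a read-only mapping can be produced without a conversion step per field. The class and its field names are hypothetical; the real structs are C structs and are not modelled here.

```python
from dataclasses import asdict, dataclass
from types import MappingProxyType
from typing import Optional


@dataclass
class InterpreterConfigSketch:
    """Hypothetical fields; None plays the role of a NULL pointer."""

    program_name: Optional[str] = None
    optimize: Optional[int] = None          # multi-value integer
    write_bytecode: Optional[bool] = None   # boolean toggle

    def unset_fields(self):
        # Fields still at their default are the ones nobody has configured.
        return [name for name, value in asdict(self).items() if value is None]

    def as_read_only(self):
        # A read-only mapping over a snapshot of the current field values.
        return MappingProxyType(asdict(self))


if __name__ == "__main__":
    config = InterpreterConfigSketch(optimize=2)
    print(config.unset_fields())
    print(dict(config.as_read_only()))
```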
msg275330 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2016-09-09 16:16
It may be worth refactoring the patch relative to the new C99 support.
msg294219 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2017-05-23 04:36
New changeset 6b4be195cd8868b76eb6fbe166acc39beee8ce36 by Eric Snow in branch 'master':
bpo-22257: Small changes for PEP 432. (#1728)
https://github.com/python/cpython/commit/6b4be195cd8868b76eb6fbe166acc39beee8ce36
msg294229 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-23 05:34
Why was Lib/test/coding20731.py changed?
msg294260 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2017-05-23 16:18
The change in Lib/test/coding20731.py was the result of running PCbuild/fix_encoding.py.
msg294274 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-05-23 18:42
This change should be reverted. Lib/test/coding20731.py should intentionally have CRLF line separators.
msg294279 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2017-05-23 19:26
New changeset e0918ecf93a458d4e005650f816d64654e73fc2a by Eric Snow in branch 'master':
bpo-22257: Fix CLI by using int instead of char (compares to EOF). (#1765)
https://github.com/python/cpython/commit/e0918ecf93a458d4e005650f816d64654e73fc2a
msg294288 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2017-05-23 21:42
Reverting the change to that test: https://github.com/python/cpython/pull/1770
msg294319 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2017-05-24 04:46
New changeset 1abcf6700b4da6207fe859de40c6c1bada6b4fec by Eric Snow in branch 'master':
bpo-22257: Private C-API for core runtime initialization (PEP 432). (#1772)
https://github.com/python/cpython/commit/1abcf6700b4da6207fe859de40c6c1bada6b4fec
msg294328 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2017-05-24 06:00
New changeset c7ec9985bbdbb2b073f2c37febd18268817da29a by Eric Snow in branch 'master':
bpo-22257: Private C-API for main interpreter initialization (PEP 432). (#1729)
https://github.com/python/cpython/commit/c7ec9985bbdbb2b073f2c37febd18268817da29a
msg295407 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-06-08 11:13
New changeset ab1cb80b435a34e4f908c97cd2f3a7fe8add6505 by Victor Stinner (Stéphane Wirtel) in branch 'master':
bpo-30547: Fix multiple reference leaks (#1995)
https://github.com/python/cpython/commit/ab1cb80b435a34e4f908c97cd2f3a7fe8add6505
msg305583 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-11-05 04:37
Bug report (since resolved) that highlighted our general lack of test coverage for the interactions between environment variable based configuration and command line based configuration: https://bugs.python.org/issue31845

This work revealed that gap, since the refactoring changes the order in which we check environment variables and the command line (we used to check the command line first, and then env vars later during Py_Initialize, now we check env vars in _Py_InitializeCore, and the command line afterwards).
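
One way to exercise this kind of interaction from the outside is to start child interpreters with different combinations of an environment variable and command line options and compare the resulting sys.flags. The sketch below does that for PYTHONDONTWRITEBYTECODE together with the -E and -B options; the helper name is made up for the example, and this is not the test that was added to CPython.

```python
import os
import subprocess
import sys


def dont_write_bytecode_flag(extra_args=(), env_value=None):
    """Start a child interpreter and report sys.flags.dont_write_bytecode."""
    env = dict(os.environ)
    # Start from a known state, then optionally set the variable.
    env.pop("PYTHONDONTWRITEBYTECODE", None)
    if env_value is not None:
        env["PYTHONDONTWRITEBYTECODE"] = env_value
    cmd = [
        sys.executable, *extra_args, "-c",
        "import sys; print(int(sys.flags.dont_write_bytecode))",
    ]
    out = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


if __name__ == "__main__":
    print("env var only:   ", dont_write_bytecode_flag(env_value="1"))
    print("env var plus -E:", dont_write_bytecode_flag(["-E"], env_value="1"))
    print("-B only:        ", dont_write_bytecode_flag(["-B"]))
```

On a released CPython the environment variable alone is expected to turn the flag on, while adding -E makes the interpreter ignore it, so the command line has to be taken into account before the environment is trusted.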

Something I'd also forgotten is that I'd switched the PEP to use "_Py_InitializeRuntime" and "_Py_ConfigureMainInterpreter", but the draft implementation is currently still using "_Py_InitializeCore" and "_Py_InitializeMainInterpreter".

For that last point, it's probably easier to change the PEP back than it is to tinker with the implementation - those specific API names are pretty arbitrary anyway.
msg305585 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-11-05 04:58
New changeset 1b46131ae423f43d45947bb48844cf82f6fd82b8 by Nick Coghlan in branch 'master':
bpo-22257: Mention startup refactoring in What's New (GH-4286)
https://github.com/python/cpython/commit/1b46131ae423f43d45947bb48844cf82f6fd82b8
History
Date User Action Args
2017-11-05 04:58:48 ncoghlan set messages: + msg305585
2017-11-05 04:49:12 ncoghlan set pull_requests: + pull_request4248
2017-11-05 04:37:09 ncoghlan set messages: + msg305583
2017-06-08 11:13:22 vstinner set nosy: + vstinner; messages: + msg295407
2017-06-08 10:16:46 matrixise set pull_requests: + pull_request2061
2017-05-25 17:05:40 eric.snow set pull_requests: + pull_request1903
2017-05-24 06:00:54 eric.snow set messages: + msg294328
2017-05-24 04:46:53 eric.snow set messages: + msg294319
2017-05-23 22:00:23 eric.snow set pull_requests: + pull_request1853
2017-05-23 21:42:59 eric.snow set messages: + msg294288
2017-05-23 21:41:56 eric.snow set pull_requests: + pull_request1852
2017-05-23 19:26:19 eric.snow set messages: + msg294279
2017-05-23 18:42:47 serhiy.storchaka set messages: + msg294274
2017-05-23 17:36:37 eric.snow set status: pending -> open; pull_requests: + pull_request1846
2017-05-23 16:18:02 eric.snow set status: open -> pending; messages: + msg294260
2017-05-23 05:38:00 serhiy.storchaka set stage: patch review; type: enhancement; versions: + Python 3.7, - Python 3.5
2017-05-23 05:34:19 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg294229
2017-05-23 04:36:05 eric.snow set messages: + msg294219
2017-05-22 22:24:46 eric.snow set pull_requests: + pull_request1819
2017-05-22 22:16:43 eric.snow set pull_requests: + pull_request1818
2016-09-09 16:16:17 eric.snow set nosy: + eric.snow; messages: + msg275330
2016-06-05 01:03:10 ncoghlan set messages: + msg267357
2016-06-04 00:06:24 ncoghlan set files: + pep432-startup-redesign-skeleton.diff; nosy: + twouters, dstufft; messages: + msg267191; keywords: + patch
2014-11-15 00:16:29 Arfrever set nosy: + Arfrever
2014-11-14 11:28:15 ncoghlan set dependencies: + Split pylifecycle.c out from pythonrun.c; messages: + msg231157
2014-09-07 21:11:47 jcea set nosy: + jcea
2014-09-01 00:24:38 ncoghlan set messages: + msg226196
2014-08-31 16:42:59 Gregory.Salvan set messages: + msg226186
2014-08-31 03:08:47 ncoghlan set messages: + msg226158
2014-08-31 02:58:00 ncoghlan set messages: + msg226156
2014-08-29 18:43:00 terry.reedy set nosy: + terry.reedy; messages: + msg226069
2014-08-28 07:58:48 Drekin set nosy: + Drekin
2014-08-27 10:41:18 ncoghlan set messages: + msg225965
2014-08-25 17:30:42 Gregory.Salvan set nosy: + Gregory.Salvan; messages: + msg225880
2014-08-24 08:50:41 sbspider set nosy: + sbspider
2014-08-24 00:17:23 ncoghlan set dependencies: + Move main functions to a separate Programs directory; messages: + msg225773
2014-08-24 00:12:19 ncoghlan set assignee: ncoghlan; messages: + msg225772
2014-08-23 11:45:47 barry set nosy: + barry
2014-08-23 10:24:48 ncoghlan link issue22213 dependencies
2014-08-23 10:22:31 ncoghlan create