Title: Eliminate PyInterpreterState.modules.
Type: Stage: patch review
Components: Interpreter Core Versions: Python 3.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, eric.snow, grahamd, ncoghlan
Priority: normal Keywords: patch

Created on 2016-10-10 22:33 by eric.snow, last changed 2017-05-17 21:16 by eric.snow.

File name Uploaded Description Edit
drop-interp-modules.diff eric.snow, 2016-10-10 22:33 review
Pull Requests
URL Status Linked Edit
PR 1638 open eric.snow, 2017-05-17 21:16
Messages (8)
msg278445 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2016-10-10 22:33
tl;dr PyInterpreterState does not need a "modules" field.  Attached is a patch that removes it.

During interpreter startup [1] the sys module is imported using the same C API [2] as any other builtin module.  That API only requires one bit of import state, sys.modules.  Obviously, while the sys module is being imported, sys.modules does not exist yet.  To accommodate this situation, PyInterpreterState has a "modules" field.  The problem is that PyInterpreterState.modules remains significant in the C-API long after it is actually needed during startup.  This creates the potential for sys.modules and PyInterpreterState.modules to be out of sync.

Currently I'm working on an encapsulation of the import state.  PyInterpreterState.modules complicates the scene enough that I'd like to see it go away.  The attached patch does so by adding private C-API functions that take a modules arg, rather than getting it from the interpreter state.  These are used during interpreter startup, rendering PyInterpreterState.modules unnecessary and allowing sys.modules to become the single source of truth.

If this patch lands, we can close issue12633.

[1] see _Py_InitializeEx_Private() and Py_NewInterpreter() in Python/pylifecycle.c
[2] see PyModule_Create2() and _PyImport_FindBuiltin() in Python/import.c
msg278456 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-10-11 04:06
(added Graham Dumpleton to the nosy list to ask if this change may impact mod_wsgi for 3.7)

+1 on the general idea, but given that the current field is a public part of the interpreter state, the replacement access API should really be public as well - we can't be sure folks will always be going through the "PyImport_GetModuleDict()" API.

If you replace the addition of the `_PyImport_GetModuleDict` API with a public `PyInterpreterState_GetModuleDict` API, I think that will cover it - the new calls would just be "PyInterpreterState_GetModuleCache(tstate->interp)" rather than `_PyImport_GetModuleDict(tstate)`

Folks accessing this field directly can then define their own shim function if PyInterpreterState_GetModuleCache isn't defined.

(The rationale for the GetModuleDict -> GetModuleCache change is that "ModuleDict" is ambiguous - every module has a dict. For PyImport_* we're stuck with it, but the "PyImport" prefix at least gives a hint that the reference might be to the sys.modules cache. That affordance doesn't exist for the "PyInterpeterState" prefix.
msg278457 - (view) Author: Graham Dumpleton (grahamd) Date: 2016-10-11 04:34
I always use PyImport_GetModuleDict(). So long as that isn't going away I should be good.
msg278507 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2016-10-11 22:44
What's the benefit to adding PyInterpreterState_GetModuleCache()?  TBH, it should only be needed in this short period during startup when the import system hasn't been bootstrapped yet.  After that code can import sys and access sys.modules from there.  (For that matter, PyImport_GetModuleDict() doesn't buy all that much either...)  I think all this would be clearer in a world with PEP 432.  :)

FWIW, I'm inclined to encourage new APIs where it makes sense that take an explicit interp arg.  I just don't think a new public API is warranted here.  If we didn't already have PyImport_GetModuleDict(), I'd probably just move the code to Python/pylifecycle.c, inlined or in a small static function.

And +1 to ModuleCache vs. ModuleDict. :)
msg278508 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2016-10-11 22:57
Hmm, actually _PyImport_GetModuleDict() isn't needed to solve the startup issue.  It's still rather internally focused but the same could be said for PyImport_GetModuleDict().  I guess I'm still not sold on adding a new public API function for what amounts to a rename of PyImport_GetModuleDict().

Furthermore, wouldn't it make more sense to keep it in the PyImport_* namespace?  Perhaps we could set the precedent that explicitly-arg'ed functions should be in the PyInterpreterState_* (or PyInterpreter_*) namespace?
msg278509 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2016-10-11 23:05
Meh, there really isn't any need for _PyImport_GetModuleDict().  I'll drop it.  Problem solved! :)
msg278514 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-10-12 03:39
I just checked the docs, and it turns out I'm wrong about this being a previously public API: "There are no public members in this structure."


That means the only externally supported API that needs to be preserved is PyImport_GetModuleDict() to get the current thread's module cache, and your original patch already did that.
msg278664 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-10-14 20:04
As Nick pointed out, PyInterpreterState's fields are private so do what you want. :)
Date User Action Args
2017-05-17 21:16:44eric.snowsetpull_requests: + pull_request1732
2016-10-14 20:04:25brett.cannonsetmessages: + msg278664
2016-10-12 03:39:44ncoghlansetmessages: + msg278514
2016-10-11 23:05:25eric.snowsetmessages: + msg278509
2016-10-11 22:57:23eric.snowsetmessages: + msg278508
2016-10-11 22:44:47eric.snowsetmessages: + msg278507
2016-10-11 04:34:18grahamdsetmessages: + msg278457
2016-10-11 04:06:06ncoghlansetnosy: + grahamd
messages: + msg278456
2016-10-10 22:33:03eric.snowcreate