classification
Title: PEP 547: Running extension modules using -m switch
Type: enhancement
Stage: patch review
Components: Extension Modules
Versions: Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Dormouse759, encukou, ncoghlan, scoder, terry.reedy
Priority: normal
Keywords:

Created on 2017-05-19 11:20 by Dormouse759, last changed 2017-09-09 06:41 by scoder.

Pull Requests
URL Status Linked
PR 1761 open Dormouse759, 2017-05-23 13:54
Messages (21)
msg293954 - (view) Author: Marcel Plch (Dormouse759) * Date: 2017-05-19 11:20
Currently the -m switch does not work with extension modules:
    
    $ python3 -m math

    /usr/bin/python3: No code object available for math


In order to enable extension modules to behave like Python source modules,
the -m switch should be supported.
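
The same failure can be reproduced from Python through runpy, the stdlib module that backs the -m switch. This is a minimal sketch, assuming a CPython build without this proposal applied; the helper name try_run_as_main is made up for illustration.

```python
import runpy


def try_run_as_main(module_name):
    """Return None on success, or the ImportError message on failure.

    Hypothetical helper: it only wraps runpy.run_module(), which is
    what `python -m` uses to locate and execute a module.
    """
    try:
        runpy.run_module(module_name, run_name="__main__")
    except ImportError as exc:
        return str(exc)
    return None


# math is implemented in C (built-in or extension, depending on the build),
# so its loader cannot hand runpy a code object to execute.
message = try_run_as_main("math")
print(message)
```

On an unpatched interpreter the message should match the one printed by the command line above.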

Please, see this proof of concept:
https://github.com/Traceur759/cpython/tree/main_c_modules
msg293979 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-05-19 21:43
What is the use case?

It only makes sense to run any stdlib module with -m, and without -i, if it has a command line interface (and an if __name__ clause).  Otherwise, the module is created and then deleted when python exits.

> py -m math
>

C-coded modules with such a command line interface have a mod.py file that imports _mod.

The following can make sense.

> py -i -m math
Python x.y ...
>>> math.sin(1.48973)

But it is hardly needed, as '-i -m math' is only one char less than 'import math'.
msg293983 - (view) Author: Petr Viktorin (encukou) * Date: 2017-05-19 23:17
It's part of a larger effort to bring the capabilities of extension modules up to par with Python ones. For example, it's one less surprise you'd get when you Cythonize a module. And it's not only for stdlib modules – it's for any extension module.



I'd be happy to answer questions here. But to get up to speed and avoid bpo comment lag, you can also see the current discussion on import-sig [0], or dig up some of the older conversations there leading up to PEP 489 (Feb-May 2015). And if you're at PyCon, you could have a high-bandwidth conversation with Nick Coghlan – he has the master plan in his head :)

[0] https://mail.python.org/pipermail/import-sig/2017-May/001072.html
msg294016 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-05-20 07:22
As a high level overview of the general idea: we'd like it to be almost entirely transparent to the end user as to whether a particular module is implemented as normal Python source code, a precompiled bytecode/wordcode file, or a precompiled Cython extension module (or equivalent).

At the moment, this is pretty close to being true for source code vs precompiled bytecode/wordcode when it comes to both imports and execution as a script. The main missing piece there is to implement source code maps for generating more informative tracebacks given only the precompiled form (perhaps by borrowing JavaScript's "source map" concept).

For extension modules, the original multi-phase initialisation PEP got this pretty close to being true for the import case - things like reload() can now work much the same way they do for pure Python and pyc files if a module author (or module generation tool) cares to make it so.

However, we don't yet support the use of extension modules as scripts, either for direct execution or via the `-m` switch, so it's impossible for a tool like Cython to handle that transparently - if a module exposes functionality via -m, then migrating it directly to Cython will break that behaviour.
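
The gap described above can be seen in what the import system's loaders offer to runpy. This is a rough sketch, assuming an interpreter without this proposal applied; has_code_object is a made-up helper, while find_spec() and get_code() are the regular importlib APIs.

```python
import importlib.util


def has_code_object(module_name):
    """True if the module's loader can supply a code object to run as a script."""
    spec = importlib.util.find_spec(module_name)
    get_code = getattr(spec.loader, "get_code", None)
    return get_code is not None and get_code(module_name) is not None


source_ok = has_code_object("json")     # pure Python package in the stdlib
extension_ok = has_code_object("math")  # C module (built-in or extension)
print(source_ok, extension_ok)
```

Source and bytecode modules both end up as a code object, which is why they are interchangeable for -m; an extension module has nothing equivalent to offer.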
msg294019 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2017-05-20 08:13
So a use case might be that someone could compile all the stdlib .py modules with cython, as they are, without touching the code, and have the result be a drop-in replacement.  I'd like that to be possible.
msg294032 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-05-20 14:37
Yep, that's the kind of thing we'd like to make possible.
msg294439 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-05-25 06:46
I just marked the associated PR as "[PEP required]", as while I'm in favour of merging this, I think we should go through the PEP process first:

1. We've been burned before by not properly advertising changes to Python's command line behaviour (while zip archive execution was added back in 2.6, a lot of folks didn't learn about it until the addition of the better advertised zipapp utility module in 3.5)
2. We should discuss the approach with the Cython developers and make sure that they're willing and able to support it when targeting 3.7+ before merging the implementation

Such a PEP will also provide a common answer to Terry's question above, as the "This allows Cython to be used directly on __main__ modules" benefit isn't going to be readily apparent to folks that aren't already familiar with the details of how Cython works.
msg294448 - (view) Author: Petr Viktorin (encukou) * Date: 2017-05-25 08:58
Adding Stefan from the Cython project.

Stefan, is this something you'd want to use?
(It does require PEP 489 multi-phase initialization, so I assume it would need extra #ifdefs in Cythonized code for Python 3.5+ or 3.7+)
msg294454 - (view) Author: Stefan Behnel (scoder) * Date: 2017-05-25 10:18
Thanks for bringing me in. The PoC implementation looks nice. Whether I'd like to support this in Cython? Absolutely. Requires some work, though, since Cython still doesn't implement PEP 489. But it shouldn't be hard, if I remember the discussions from back then correctly. I could try to free some time in August to catch up with this. That would still fit into the pre-alpha phase of Py3.7.

Unless, obviously, others would like to give it a try in the meantime. It's mostly about splitting up the current generated module init function into separate phases. Ok, there could be some minor obstacles along the way. ;)

I created a ticket for it in our own tracker for now.
https://github.com/cython/cython/issues/1715
msg294752 - (view) Author: Petr Viktorin (encukou) * Date: 2017-05-30 12:54
This is now waiting to be added to the PEPs repo: https://github.com/python/peps/pull/266
msg294826 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-05-31 09:23
This proposal is now PEP 547: https://github.com/python/peps/commit/cd84e206f58cf929eea235fb894cff1db2a1dabf
msg298857 - (view) Author: Stefan Behnel (scoder) * Date: 2017-07-22 15:34
FYI, I've finally managed to find the time for implementing PEP 489 style module initialisation in Cython. It was so easy that I'm sorry it took me so long to get started. Cython 0.26 is fresh out, so the feature should go into 0.27.
https://github.com/cython/cython/pull/1794

Note that I ended up implementing the Py_mod_create function which is now copying attributes from the spec right after creation. While not strictly required, I guess, it turned out to be easiest, but it seems like this does now conflict with the current "-m" implementation. I asked on the github commit why that restriction exists.

One thing I stumbled over: exec_in_module() is a Python level function. Would that make it possible to re-execute an already imported module? That could be dangerous, because C level module code often initialises resources that a simple re-execution of the module exec function would overwrite without cleaning up the old state. I did not (yet?) implement support for m_clear() etc., and that might actually turn out to be really risky when it comes to supporting arbitrary user code.

OTOH, that case is easy to detect because the module is already completely initialised at that point.

As far as I understand it, all that this PEP really changes from the POV of the extension module is that it calls the exec function with a different module name ("__main__"). Cython already provides that feature itself (by embedding CPython in a C main function), so this should be easy to support.
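
For comparison, this is what "executing under a different module name" looks like for a pure Python module today. It is only an illustration, assuming a throwaway module (demo_mod_547 is a made-up name) written to a temporary directory; runpy.run_module() and its run_name argument are the existing stdlib API.

```python
import os
import runpy
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical module used only for this demonstration.
    path = os.path.join(tmp, "demo_mod_547.py")
    with open(path, "w") as f:
        f.write("SEEN_NAME = __name__\n")

    sys.path.insert(0, tmp)
    try:
        # Same code, executed once under its own name and once as __main__.
        imported = runpy.run_module("demo_mod_547")
        as_main = runpy.run_module("demo_mod_547", run_name="__main__")
    finally:
        sys.path.remove(tmp)

print(imported["SEEN_NAME"], as_main["SEEN_NAME"])
```

The only difference the module code can observe is the value of __name__, which is the behaviour the exec function of an extension module would need to mirror.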
msg301133 - (view) Author: Marcel Plch (Dormouse759) * Date: 2017-09-01 16:04
Sorry for not responding for so long, I didn't work on Python through the summer because of some other matters.

Re-execution of the module is not possible, because there is a check on the C level. If you call exec_in_module() on an already initialized module, it'll raise ImportError.

Also, because of this check, there is the restriction for Py_mod_create. "Modules" defining this slot might not be module objects at all, so they might not have the module state pointer (which is used to flag whether the module was initialized).
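
A Python-level model of that check might look like the sketch below. It is not the code from the PR: the flag attribute and the function name exec_in_module_sketch are invented, and the real check lives in C and uses the module state pointer rather than an attribute.

```python
import types

# Hypothetical stand-in for the module state pointer used as the flag in C.
_INITIALIZED = "_demo_initialized"


def exec_in_module_sketch(module, exec_func):
    """Run exec_func(module) once and refuse any later run."""
    if not isinstance(module, types.ModuleType):
        # Arbitrary objects have no place to record the "initialized" flag.
        raise ImportError("not a module object, cannot track initialization")
    if getattr(module, _INITIALIZED, False):
        raise ImportError(f"module {module.__name__!r} is already initialized")
    exec_func(module)
    setattr(module, _INITIALIZED, True)


def demo_exec(module):
    module.counter = getattr(module, "counter", 0) + 1


mod = types.ModuleType("demo")
exec_in_module_sketch(mod, demo_exec)
try:
    exec_in_module_sketch(mod, demo_exec)
    second_run = "allowed"
except ImportError:
    second_run = "refused"
print(mod.counter, second_run)
```

The isinstance() test mirrors the reason for the Py_mod_create restriction: without a real module object there is nowhere to keep the flag.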
msg301457 - (view) Author: Stefan Behnel (scoder) * Date: 2017-09-06 09:00
OTOH, if the created "module" is not a module object, then we could argue that the extension implementation is on its own with that case, and has to do its own re-execution safety checks.
msg301461 - (view) Author: Petr Viktorin (encukou) * Date: 2017-09-06 11:30
Do we have a use case for this?
I'd rather avoid making it easy to do the wrong thing, unless it's needed.
msg301467 - (view) Author: Stefan Behnel (scoder) * Date: 2017-09-06 12:45
Marcel proposed to disallow main-execution if the extension *might* return anything but a real module object (not only if it actually does), but that seems excessive to me. The actual problem is that we consider it unsafe if the module is executed more than once, because it might overwrite module state. But that's entirely up to the extension implementation and independent of what it uses as module type.

Given how easy it is to create and/or depend on global state in C, I would assume that extensions have to be explicitly designed in order to be re-executable. Can't we just have another slot that explicitly marks the module as such?

What do you think of this protocol:

Before running the exec or main-exec function, the runner checks for a slot entry "Py_mod_allow_reexec" (can have value NULL). If not found, it sets the function pointers in the exec *and* main-exec slots to NULL to prevent any further (or concurrent) re-execution. If the slot function is not NULL on the next execution request, it can be called (again).

That effectively prevents any re-execution by default and provides an opt-in way for the module to allow it.
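
The protocol above can be modelled in a few lines of Python. This is only a sketch of the idea as worded in this message: the slot table is a plain dict, run_slot is an invented helper, and "Py_mod_main_exec" is a placeholder name for the main-exec slot.

```python
def run_slot(slots, slot_name, namespace):
    """Model of the proposed runner; slots maps slot names to callables or None."""
    func = slots.get(slot_name)
    if func is None:
        raise ImportError(f"{slot_name} is unavailable; re-execution was not allowed")
    if "Py_mod_allow_reexec" not in slots:
        # No opt-in found: clear both entry points before running.
        slots["Py_mod_exec"] = None
        slots["Py_mod_main_exec"] = None  # placeholder slot name
    func(namespace)


def bump(namespace):
    namespace["runs"] = namespace.get("runs", 0) + 1


default_slots = {"Py_mod_exec": bump, "Py_mod_main_exec": bump}
optin_slots = {"Py_mod_exec": bump, "Py_mod_main_exec": bump,
               "Py_mod_allow_reexec": None}

default_ns, optin_ns = {}, {}
run_slot(default_slots, "Py_mod_exec", default_ns)
try:
    run_slot(default_slots, "Py_mod_main_exec", default_ns)
    default_second = "allowed"
except ImportError:
    default_second = "refused"

run_slot(optin_slots, "Py_mod_exec", optin_ns)
run_slot(optin_slots, "Py_mod_exec", optin_ns)
print(default_ns["runs"], default_second, optin_ns["runs"])
```

Note that the model mutates the slot table, exactly as the message proposes; whether a real PyModuleDef may be modified like that is a separate question.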
msg301585 - (view) Author: Petr Viktorin (encukou) * Date: 2017-09-07 14:20
Again, what is the use case? That's a real question, I'm not saying it to dismiss your ideas or points of view. It would be very much easier to think about a concrete use case, rather than making a general system for the sake of how easy it is implementation-wise. (The implementation might be easier now, but it might change, and there's a cost to keeping the generality in mind when designing things on top of all this.)

Something like the slot you mention can always be added later if it's needed. Is it needed now?


Also, the PyModuleDef should never be modified (beyond the one-time initialization that sets ob_type -- that's a workaround for not being always able to declare the type statically).
It should be possible to make additional, independent module instances from a PyModuleDef.
msg301587 - (view) Author: Stefan Behnel (scoder) * Date: 2017-09-07 14:31
I was kinda guessing that modifying the slot list wasn't a good idea. ;)

My current use case is that I implement the "create" slot because it makes it very easy to intercept the spec and its configuration. It is not passed into "exec" as such, but I need it to initialise the module namespace with "__file__", "__path__", etc.

There is also still the idea of defining our own module type in Cython in order to have a place where we can keep C level module globals, and also to support module properties. PEP 549 will not be available in older Python versions, even if it gets accepted.

Having to choose between main-exec support and these two features seems wrong.
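
The create-slot use case has a close Python-level analogue in the importlib loader protocol, where create_module() receives the spec and exec_module() receives only the module. The sketch below assumes a made-up loader class and a made-up origin path; the importlib calls themselves are the standard ones.

```python
import importlib.abc
import importlib.util


class SpecCapturingLoader(importlib.abc.Loader):
    """Hypothetical loader that records spec data at creation time."""

    def create_module(self, spec):
        # The spec is handed over here, so this is the easy place to grab it.
        self.captured_origin = spec.origin
        return None  # fall back to the default module creation

    def exec_module(self, module):
        # Only the module arrives here; reuse what was captured earlier.
        module.origin_seen_at_create = self.captured_origin


loader = SpecCapturingLoader()
spec = importlib.util.spec_from_loader(
    "demo_ext", loader, origin="/hypothetical/demo_ext.so")
module = importlib.util.module_from_spec(spec)
loader.exec_module(module)
print(module.origin_seen_at_create, module.__spec__ is spec)
```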
msg301591 - (view) Author: Petr Viktorin (encukou) * Date: 2017-09-07 15:39
Alright, that makes sense. Thanks for the feedback!
Please give us some time for an updated proposal/implementation. I'm going on vacation, so expect about a week.
msg301696 - (view) Author: Marcel Plch (Dormouse759) * Date: 2017-09-08 13:19
I have made a patch both for cython and cpython implementing a way to use Py_mod_create in cython.

Basically, module defs that specify a new "Py_mod_cython" slot are excluded from the rule of no module creation, so these modules can be executed directly even though they specify Py_mod_create.

Is this approach safe, or does it make it easy for things to go wrong?

cpython - https://github.com/Traceur759/cpython/commit/51a7508d176b23dcf3547b970cf7e6a50898aae2

cython - https://github.com/Traceur759/cython/commit/2ca706e10f469cd38947eecd8b92c142532b20bc
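
The rule this patch introduces, as described in this message, boils down to a small predicate. The sketch below models slot tables as sets of slot names and uses an invented function name; it does not reproduce the linked commits.

```python
def may_run_as_main(slot_names):
    """Model of the rule: Py_mod_create blocks direct execution
    unless the module def also carries the Py_mod_cython marker."""
    if "Py_mod_create" not in slot_names:
        return True
    return "Py_mod_cython" in slot_names


plain = may_run_as_main({"Py_mod_exec"})
custom_create = may_run_as_main({"Py_mod_create", "Py_mod_exec"})
cython_create = may_run_as_main({"Py_mod_create", "Py_mod_exec", "Py_mod_cython"})
print(plain, custom_create, cython_create)
```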
msg301762 - (view) Author: Stefan Behnel (scoder) * Date: 2017-09-09 06:41
I'm a bit torn on this. On the one hand, it's basically saying, "Cython is probably going to do it right anyway, so let's just assume it does". That's nice, and might be applicable to other cases as well. But that also feels like it could need some kind of versioning.

On the other hand, it's totally not magic to implement something similar by hand, so naming the flag in a Cython specific way feels wrong from a design perspective. Other tools might start picking it up, and that would lead to major confusion. In a way, it's both very broad and too narrow.

Basically, if we expect the flag to be used in a broader way, I'm happy to generally mark Cython modules with it. It's very explicit in *that* regard. I'm just not sure that the use case at hand is the right reason to introduce this kind of general marker.

Speaking of versioning, though, what about introducing a generic slot field instead that notes the latest CPython API version known to work with the module? (Cast pointer value to int to get the value.) That way, CPython could introduce new extension module behaviour with new C-API versions, and tools that support them can update their version value in the slot to mark them as safely supported.
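
A rough model of such a version slot, assuming the value is encoded like PY_VERSION_HEX and stored under an invented slot name ("Py_mod_api_version" does not exist anywhere):

```python
API_3_5 = 0x03050000
API_3_7 = 0x03070000


def supports_behaviour(slots, required_api_version):
    """Model: new behaviour is enabled only if the module declares
    that it was built against a C-API version that already has it."""
    known = slots.get("Py_mod_api_version", 0)  # invented slot name
    return known >= required_api_version


new_module = supports_behaviour({"Py_mod_api_version": API_3_7}, API_3_7)
old_module = supports_behaviour({"Py_mod_api_version": API_3_5}, API_3_7)
unmarked = supports_behaviour({}, API_3_7)
print(new_module, old_module, unmarked)
```

A module without the slot is treated as knowing no version at all, so it keeps the old behaviour by default.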
History
Date User Action Args
2017-09-09 06:41:06  scoder  set  messages: + msg301762
2017-09-08 13:19:57  Dormouse759  set  messages: + msg301696
2017-09-07 15:39:21  encukou  set  messages: + msg301591
2017-09-07 14:31:35  scoder  set  messages: + msg301587
2017-09-07 14:20:47  encukou  set  messages: + msg301585
2017-09-06 12:45:32  scoder  set  messages: + msg301467
2017-09-06 11:30:26  encukou  set  messages: + msg301461
2017-09-06 09:00:24  scoder  set  messages: + msg301457
2017-09-01 16:04:09  Dormouse759  set  messages: + msg301133
2017-07-22 15:34:52  scoder  set  messages: + msg298857
2017-05-31 09:23:46  ncoghlan  set  messages: + msg294826; title: Running extension modules using -m switch -> PEP 547: Running extension modules using -m switch
2017-05-30 12:54:06  encukou  set  messages: + msg294752
2017-05-25 10:18:52  scoder  set  messages: + msg294454
2017-05-25 08:58:35  encukou  set  nosy: + scoder; messages: + msg294448
2017-05-25 06:46:30  ncoghlan  set  messages: + msg294439; stage: patch review
2017-05-23 13:54:58  Dormouse759  set  pull_requests: + pull_request1845
2017-05-20 14:37:48  ncoghlan  set  messages: + msg294032
2017-05-20 08:13:01  terry.reedy  set  messages: + msg294019
2017-05-20 07:22:43  ncoghlan  set  messages: + msg294016
2017-05-19 23:17:53  encukou  set  messages: + msg293983
2017-05-19 21:43:52  terry.reedy  set  nosy: + terry.reedy; messages: + msg293979
2017-05-19 12:01:30  encukou  set  nosy: + ncoghlan, encukou
2017-05-19 11:20:42  Dormouse759  create