classification
Title: provide __file__ to extension init function
Type: enhancement Stage: resolved
Components: Extension Modules, Interpreter Core Versions: Python 3.5
process
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: Nosy List: Arfrever, amaury.forgeotdarc, eric.snow, jcea, loewis, scoder
Priority: normal Keywords: patch

Created on 2011-11-18 15:46 by scoder, last changed 2017-07-25 06:13 by scoder. This issue is now closed.

Files
File name Uploaded Description Edit
ext_module_init_file_path.patch scoder, 2011-11-18 16:17 Patch that implements a protocol between dynlib loader and module creator, similar to the existing _Py_PackageContext review
ext_module_reinit_context.patch scoder, 2011-11-18 16:22 Patch that implements both protocols also for the module reinit case
ext_module_init_file_path_2.patch scoder, 2011-11-19 08:29 Updated patch that does not reset _Py_ModuleImportContext after use review
Messages (15)
msg147881 - (view) Author: Stefan Behnel (scoder) * Date: 2011-11-18 15:46
In Python modules, the top-level module code sees the __file__ variable and can use it to refer to resources in package subdirectories, for example. This is not currently possible in extension modules, because __file__ is only set after running the module init function, and the module has no way to find out its runtime location.

CPython should set __file__ directly in PyModule_Create2(), based on information provided by the shared library loader. This would let PyModule_GetFilenameObject() work immediately with the newly created module object.

The relevant python-dev thread is here:

http://mail.python.org/pipermail/python-dev/2011-November/114476.html

A patch will follow soon.
msg147888 - (view) Author: Stefan Behnel (scoder) * Date: 2011-11-18 16:22
Here is an extension to the patch that implements the protocol also for extension module reinitialisation, so that the module creation can also set __file__ and the proper package in that case. Currently without tests (and users, I guess).
msg147905 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-11-18 17:12
I suppose that the value of _Py_ModuleImportContext is protected by the import lock?
msg147909 - (view) Author: Stefan Behnel (scoder) * Date: 2011-11-18 17:23
I don't know how the import lock applies here. Would it have to be protected by it? The lifetime is restricted to the call of the extension module init function, and its value is saved recursively if the init function triggers further imports.

It works exactly like the existing _Py_PackageContext variable.
msg147910 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-11-18 17:29
But the GIL can be released in many places (e.g. a Py_DECREF), and another thread can enter the same function and update the same static variable.
msg147920 - (view) Author: Stefan Behnel (scoder) * Date: 2011-11-18 20:00
... and the module init function could create and register a different module first, and ...

Well, yes, it's a best effort thing. It's rather unlikely that the GIL would get released in between the call to the init function and the creation of the module within that function, but sure, I don't see a reason why it could not happen.

However, it can't happen in moduleobject.c between the NULL check and the setting of the __file__ attribute, so that is safe enough to not trigger crashes.

And even if the wrong __file__ value is set during the run of the init function, it will still get overwritten (and thus fixed) afterwards. So my intuition is that code that relies on this new feature will simply have to make sure the module object creation is the first thing it does, and code that does not rely on it, well, does not rely on it.
msg147922 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-11-18 20:31
> ... and the module init function could create and register a
> different module first, and ...
Actually, this *does* happen, the PIL module is written this way.
And I don't agree with the "best effort" argument.  If there is a slight chance that this does not work, we'll have to fix it sooner or later.

But please check this import lock thing: if _PyImport_AcquireLock and _PyImport_ReleaseLock are always called around extension module initialization, then we don't have to worry about other threads; and nested calls are already handled by the "oldimportcontext" variable in your patch.
msg147928 - (view) Author: Stefan Behnel (scoder) * Date: 2011-11-19 08:23
I'm aware that these things happen, that's why I said it. Actually, wouldn't it rather be *correct* for __file__ to be set to the same file path for all modules that an extension module creates in its init function? That would suggest that _Py_ModuleImportContext shouldn't be set to NULL after the first assignment, but instead stay alive until it gets reset by the dynlib loader. If the loader gets invoked recursively later on, it will do the right thing by storing away the old value during the import and restoring it afterwards. So _Py_ModuleImportContext would always point to the path that contains the init function that is currently being executed.

Regarding the lock (which, I assume, is simply reentrant), it's being acquired far up when the import mechanism starts, so the dynlib loader and the init function call are protected.

Note that this does not apply to the reinit case. _PyImport_FindExtensionObject() does not acquire the lock itself (which seems correct), and it can be called directly from imp.init_builtin(), i.e. from user code. Maybe that's why the _Py_PackageContext protocol was not implemented there. That's rather unfortunate, though. I guess the reasoning is that new code that uses this new feature is expected to actually be reentrant, also in parallel, because the module it creates and works on is local to the current thread until the init function terminates. So the import lock is not strictly required here. This does complicate the __file__ feature, though, so the second ("reinit") patch won't work as is. I think the right fix for Python 4 would be to simply pass a context struct into the module init function.

On a related note, I just stumbled over this code in _PyImport_FindExtensionObject():

    else {
        if (def->m_base.m_init == NULL)
            return NULL;
        mod = def->m_base.m_init();
        if (mod == NULL)
            return NULL;
        PyDict_SetItem(PyImport_GetModuleDict(), name, mod);
        Py_DECREF(mod);
    }
    if (_PyState_AddModule(mod, def) < 0) {
        PyDict_DelItem(PyImport_GetModuleDict(), name);
        Py_DECREF(mod);
        return NULL;
    }

If PyDict_SetItem() fails, this is bound to crash. I think it would be worth looking into this mechanism a bit more.
msg147929 - (view) Author: Stefan Behnel (scoder) * Date: 2011-11-19 08:29
Updated patch that does not reset _Py_ModuleImportContext after use.
msg147932 - (view) Author: Stefan Behnel (scoder) * Date: 2011-11-19 09:18
Replying to myself:
> I think the right fix for Python 4 would be to simply pass
> a context struct into the module init function.

Actually, this doesn't have to wait for Python 4. Changing the module init function to take a parameter should be backwards compatible in C. Existing code simply wouldn't read the value from the stack, and new (or updated) code could benefit from the feature, Cython code in particular.

Here is a follow-up ticket for this more general feature:

http://bugs.python.org/issue13431
msg148002 - (view) Author: Stefan Behnel (scoder) * Date: 2011-11-20 16:42
As MvL noted in his response to issue 13431, simply adding a parameter to the module init function cannot safely be done before Python 4. So we are back to the idea of passing the information through to the module creation function, i.e. this very issue.

A variant of the implementation would be to store the context information in thread local storage instead of a global variable. That would work around any threading issues. However, this would not be required in the normal import case, only in the reinit case, as the import case is protected by the import lock, as we have seen. Personally, I do not consider this a good idea for the time being, since I doubt that the number of users for the reinitialisation API is currently worth caring about.

In any case, the semantics of __file__ for extension modules would basically become that __file__ refers to the last library that was loaded before calling the module init function. So all extension modules that this init function creates will inherit the same __file__. My guess is that they currently end up with no __file__ attribute at all, as the loader only sets it on the module that the init function returns. So I consider that an improvement already.
msg150247 - (view) Author: Stefan Behnel (scoder) * Date: 2011-12-25 10:25
Any comments on the last patch?
msg175138 - (view) Author: Stefan Behnel (scoder) * Date: 2012-11-08 07:55
I'm increasing the target version as this didn't change anything for 3.3. However, for 3.4, it might be possible to work around this by splitting the module init function into two parts, one that gets executed in order to create the module object (and do safe C level setup etc.) and one that gets executed after properly registering the module and setting __file__ and friends externally and that can then populate the module dict etc.

I can see two ways how this can be done. The shared library could optionally export a second symbol ("PyInit2_modulename" or whatever), or it could set a sort of callback function on the module object it returns in its original init function, either in the module dict or using a C level interface. I prefer the latter because it's safer - code can't accidentally rely on this feature in other Python versions (or implementations) because it would fail to build there if the required C-API function isn't available.

A corner case that this wouldn't fix is the creation of more than one module object in the module init function, but I think it's a reasonable restriction that users should be on their own here and should properly set up these additional modules themselves. Alternatively, this could be handled (with some overhead) if the import machinery remembered all modules created during the execution of the module init function and post-processed them as noted above.

So, my proposal would be to add a new callback function field to module objects and a new C-API function "PyModule_SetInitFunction(module, c_func)" that registers a C callback function on a module object. Then, extend the shared library importing code to call that init function if it's available, passing it both the module instance and an extensible context struct in order to enable future extensions to this interface.

I realise that this might have to go through a PEP process, but do you see any obvious flaws I missed so far or reasons to reject this approach right away?
msg175187 - (view) Author: Stefan Behnel (scoder) * Date: 2012-11-08 19:23
Triggered discussion on python-dev:

http://thread.gmane.org/gmane.comp.python.devel/135764
msg299041 - (view) Author: Stefan Behnel (scoder) * Date: 2017-07-25 06:13
This has been resolved by PEP 489, issue 24268.
The module initialisation process receives the complete ModuleSpec now, starting with CPython 3.5, and can do with it whatever it likes, before executing any user code.
History
Date User Action Args
2017-07-25 06:13:22scodersetstatus: open -> closed
versions: + Python 3.5, - Python 3.4
messages: + msg299041

resolution: duplicate
stage: resolved
2012-11-08 19:23:59scodersetmessages: + msg175187
2012-11-08 07:55:07scodersetmessages: + msg175138
versions: + Python 3.4, - Python 3.3
2011-12-25 10:25:54scodersetmessages: + msg150247
2011-11-20 16:42:10scodersetmessages: + msg148002
2011-11-20 05:45:08jceasetnosy: + jcea
2011-11-19 09:18:32scodersetmessages: + msg147932
2011-11-19 08:29:39scodersetfiles: + ext_module_init_file_path_2.patch

messages: + msg147929
2011-11-19 08:23:06scodersetmessages: + msg147928
2011-11-18 21:59:05Arfreversetnosy: + Arfrever
2011-11-18 20:55:30eric.snowsetnosy: + eric.snow
2011-11-18 20:31:11amaury.forgeotdarcsetmessages: + msg147922
2011-11-18 20:00:09scodersetmessages: + msg147920
2011-11-18 17:29:21amaury.forgeotdarcsetmessages: + msg147910
2011-11-18 17:23:20scodersetmessages: + msg147909
2011-11-18 17:12:40amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg147905
2011-11-18 16:22:22scodersetfiles: + ext_module_reinit_context.patch

messages: + msg147888
2011-11-18 16:17:40scodersetfiles: + ext_module_init_file_path.patch
nosy: + loewis
keywords: + patch
2011-11-18 15:46:57scodercreate