classification
Title: Add a keyword-only spec argument to types.ModuleType
Type: Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: brett.cannon Nosy List: Arfrever, aronacher, brett.cannon, eric.snow, ncoghlan, python-dev, rhettinger
Priority: low Keywords: patch

Created on 2014-01-24 17:11 by brett.cannon, last changed 2014-05-31 01:24 by ncoghlan. This issue is now closed.

Files
File name Uploaded Description Edit
module_from_spec.diff brett.cannon, 2014-05-18 22:32 review
Messages (28)
msg209100 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-01-24 17:11
Would allow for the name attribute to be optional since it can be grabbed from the spec. Since having module.__spec__ set is now expected we should help ensure that by supporting it in the constructor.
msg215554 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-04-04 19:09
I envision making this happen would also allow for importlib to be updated to rely on __spec__ when possible, with the idea that sometime in the future we can deprecate pulling attributes from a module directly and shift to always working from __spec__.
msg215629 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-04-05 20:48
Sounds good to me.
msg215648 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-04-06 05:06
> Would allow for the name attribute to be optional
  ...
>  with the idea that sometime in the future we can
> deprecate pulling attributes from a module directly

Is there any chance that this would ever happen?
The __name__ attribute has been around almost forever
is widely relied on.  Deprecating it would be a very
disruptive event for nearly zero benefit.
msg215649 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-04-06 05:20
I made roughly the same point in the current import-sig thread that relates here:

https://mail.python.org/pipermail/import-sig/2014-April/000805.html

Basically, I agree we should be careful with both __name__ and __file__.
msg215653 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-04-06 10:26
No, the attribute level arguments won't go away - __name__ deliberately differs from __spec__.name in some cases (notably in __main__), __path__ may be manipulated after the module is loaded, and __name and __file__ are both used too heavily within module code for it to be worth the hassle of deprecating them in favour of something else.

I think Brett's push to simplify things as much as possible is good though - that's the main brake on creeping API complexity in the overall import system as we try to make the internals easier to comprehend and manipulate.
msg215662 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-04-06 16:49
I can dream about getting rid of the attributes, but I doubt it would happen any time soon, if at all. But we do need to make it easier to set __spec__ on a new module than it currently is to help promote its use.
msg217925 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-05 13:54
My current thinking on this it to introduce in importlib.util:

  def module_from_spec(spec, module=None):
    """Create/initialize a module based on the provided spec.

    If a module is provided then spec.loader.create_module()
    will not be consulted.
    """

This serves two purposes. One is that it abstracts the loader.create_module() dance out so that's no longer a worry. But more crucially it also means that if you have the function create the module for you then it will be returned with all of its attributes set without having to worry about forgetting that step. The module argument is just for convenience in those instances where you truly only want to override the module creation dance for some reason and really just want the attribute setting bit.
msg218759 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-18 22:32
Here is an implementation of importlib.util.module_from_spec(). With this it makes PEP 451 conceptually work like:

  spec = importlib.util.find_spec(name)
  module = importlib.util.module_from_spec(spec)
  spec.loader.exec_module(module)

About the only other thing I can think of that people might still want is something like `importlib.util.load(spec)` so that they don't even need to care about whether load_module() or exec_module() is defined, but that can be a separate issue if it turns out people actually want something like that.
msg218981 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-05-23 18:00
@Brett: Did those last two messages (and the patch) get on the wrong issue.  The issue of module_from_spec() or the like seems somewhat orthogonal to a spec argument to ModuleType.  Perhaps you meant issue #21436 or #21235?

Otherwise, I have a few comments. :)
msg219043 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-24 14:05
Nope, I commented where I meant to. I wanted a way to promote people to **always** create modules with properly initialized attributes while also dealing with the module creation dance at the same time. Otherwise it will require expanding the API of types.ModuleType() to accommodate specs while also needing to expose a function to do the proper thing to get a module for a loader and **still** have a function to set up a module properly based on what was potentially returned from create_module(). Rolling it all into a single function that just gets you a module ready for use seems like the most practical solution from an API perspective.
msg219046 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-05-24 14:28
I'd ask "Why not a class method?", but I already know the answer (types.ModuleType is implemented in C, so it would be unnecessarily painful to implement it that way).

Given that, the utility function approach sounds good to me.
msg219135 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-05-26 06:03
Okay, I didn't read closely enough. :)  It may be worth updating the title.

FWIW, the name "module_from_spec" confused me at first because my brain interpreted that as "load_from_spec".  Keeping the name and purpose more focused might be helpful.  I have comments below to that effect.

If your proposed change still makes sense, could we keep it simpler for now?  Something like this:

# in importlib.util
def module_from_spec(spec, module=None):
    """..."""
    methods = _bootstrap._SpecMethods(spec)
    if module is None:
        return methods.create()
    else:
        methods.init_module_attrs(methods)
        return module

Keeping the 2 methods on _SpecMethods is helpful for the PEP 406 (ImportEngine) superseder that I still want to get back to for 3.5. :)

----------------------------------

> This serves two purposes. One is that it abstracts the
> loader.create_module() dance out so that's no longer a worry.

I'm not sure what dance you mean here and what the worry is.

> But more crucially it also means that if you have the function
> create the module for you then it will be returned with all of
> its attributes set without having to worry about forgetting that
> step.

So we would discourage calling ModuleType directly and encourage the use of a function in importlib.util that is equivalent to _SpecMethods.create().  That sounds good to me.  The use case for creating module objects directly is a pretty advanced one, but having a public API for that would still be good.

From this point of view I'd expect it to just look like this:

def new_module(spec):
    return _SpecMethods(spec).create()

or given what I expect is the common use case currently:

def new_module(name, loader):
    spec = spec_from_loader(name, loader)
    return _SpecMethods(spec).create()

or together:

def new_module(spec_or_name, /, loader=None):
    if isinstance(spec_or_name, str):
        name = spec_or_name
        if loader is None:
            raise TypeError('missing loader')
        spec = spec_from_loader(name, loader)
    else:
        if loader is not None:
            raise TypeError('got unexpected keyword argument "loader"')
        spec = spec_or_name
    return _SpecMethods(spec).create()

To kill 2 birds with 1 stone, you could even make that the new signature of ModuleType(), which would just do the equivalent under the hood.  That way people keep using the same API that they already are (no need to communicate a new one to them), but they still get the appropriate attributes set properly.

> The module argument is just for convenience in those
> instances where you truly only want to override the module
> creation dance for some reason and really just want the
> attribute setting bit.

Perhaps it would be better to have a separate function for that (equivalent to just _SpecMethods.init_module_attrs()).  However, isn't that an even more uncommon use case outside of the import system itself?

> About the only other thing I can think of that people might
> still want is something like `importlib.util.load(spec)`

I agree it's somewhat orthogonal.  I'll address comments to issue #21235.
msg219156 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-26 14:37
First, about breaking up _SpecMethods: that was entirely on purpose. =) I honestly have found _SpecMethods a bit of a pain to work with because at every place where I have a spec object and I need to operate on it I end up having to wrap it and then call a method on it instead of simply calling a function (it also doesn't help that spec methods is structured as a collection of methods which can just operate as functions since they almost all have ``spec = self.spec`` at the beginning). I get you're hoping to make PEP 406 happen, but it actually needs to happen first. =) And as I said, the methods are essentially functions anyway so undoing any unraveling I may do won't be difficult to revert if PEP 406 actually comes about. IOW _SpecMethods feels like OOP just 'cause and not for any design reasons; it's been making my eye twitch when I look at that class.

Second, the dance that an advanced user has to worry about is "does create_module exist, and if so did it not return None and if any of that is not true then return what types.ModuleType would give you" is not exactly one line of code ATM. There is no reason to not abstract that out to do the right thing in the face of a loader.

Third, yes this would be to encourage people not to directly call types.ModuleType but call the utility function instead.

Fourth, I purposefully didn't bifurcate the code of types.ModuleType based on the type of what the first argument was. At best you could change it to take an optional spec as a second argument and use that, but if you did that and the name doesn't match the spec then what? I'm not going to promote passing in a loader because spec_from_loader() cannot necessarily glean all of the same information a hand-crafted spec could if the loader doesn't have every possible introspection method defined (I'm calling "explicit is better than implicit" to back that up). I also don't think any type should depend on importlib having been previously loaded to properly function if it isn't strictly required, so the code would have to be written entirely in C which I just don't want to write. =) Since it's going beyond simply constructing a new module but also initializing it I think it's fine to keeping it in importlib.util which also makes it all more discoverable for people than having to realize that types.ModuleType is the place to go to create a module and get its attributes initialized.

Fifth, fair enough on not needing the module argument. I will refactor the code for internal use of attribute initialization in testing and leave it at that.
msg219168 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-05-26 18:29
tl;dr I'm okay with pulling the functions out of _SpecMethods (and
dropping the class) but I'd prefer the distinct functions continue to
exist as they are.  Also, I still think updating the ModuleType
signature is the way to go (given current use cases). :)

> First, about breaking up _SpecMethods: that was entirely on purpose. =) ... IOW _SpecMethods feels like OOP just 'cause and not for any design reasons; it's been making my eye twitch when I look at that class.

Keep in mind that _SpecMethods is basically a proxy for ModuleSpec
that has the methods that ModuleSpec had originally.  That was the
simplest way to not expose the methods on every module.__spec__.  If
we didn't care about exposing more API on __spec__, we'd have it all
on ModuleSpec.  I still wonder if we could have used subclassing
instead of proxying (obviating the pain points we've run into) and
used a SimpleModuleSpec for __spec__ and ModuleSpec for all other APIs
(or ModuleSpec/FullModuleSpec):

  SimpleModuleSpec  # What currently is ModuleSpec.
  ModuleSpec(SimpleModuleSpec)  # Basically this is _SpecMethods.

Regardless,  _SpecMethods made sense at the time and I still find it
helpful to have the functions grouped into a distinct namespace.  That
said, I agree that it's a pain. :)  I'm okay with ditching the class,
but would rather we kept the distinct functions that exist now.

> Second, the dance that an advanced user has to worry about is "does create_module exist, and if so did it not return None and if any of that is not true then return what types.ModuleType would give you" is not exactly one line of code ATM. There is no reason to not abstract that out to do the right thing in the face of a loader.

That's what _SpecMethods.create() already does for you.

> Third, yes this would be to encourage people not to directly call types.ModuleType but call the utility function instead.

I'm totally on board. :)

> Fourth, I purposefully didn't bifurcate the code of types.ModuleType based on the type of what the first argument was. At best you could change it to take an optional spec as a second argument and use that, but if you did that and the name doesn't match the spec then what?

I see your point.  I just see the trade-offs a little differently. :)
This is all about finding the best way of eliminating the problem of
people not setting the module attributes properly (and making it
easier to do so).  From my perspective:

* Currently every module must have a spec (and consequently a loader).
* Updating the ModuleType signature will encourage setting the spec,
etc. better than a separate factory would.

If every module must have a spec, then I'd expect that to be part of
the ModuleType signature.  I can see the use case (though uncommon)
for directly creating a module object.  Is there a use case for
creating a module without a spec?  Either way, if it's important
enough to ensure that people set the module attrs properly, then we
should consider having direct instantiation of ModuleType() issue a
warning when created without setting the spec, etc.

Regarding the second point, with a separate factory function, people
will have to learn about the function and then switch to using it
before they get the benefit.

Backward compatibility for an updated signature shouldn't be too hard:

  currently: ModuleType(name, doc=None)
  updated: ModuleType(name_or_spec, doc=None, *, loader=None)
  actual: ModuleType(name_or_spec=None, /, name=None, doc=None, *, loader=None)

> I'm not going to promote passing in a loader because spec_from_loader() cannot necessarily glean all of the same information a hand-crafted spec could if the loader doesn't have every possible introspection method defined (I'm calling "explicit is better than implicit" to back that up).

Regardless of new signature or new factory, I still think the
signature should allow for passing in a spec or passing in a
name/loader pair.  Passing name/loader instead of spec is a
convenience for what I anticipate is the most common use case now:
given a loader, create a new module.  For comparison:

  mod = new_module(name, loader=loader)

vs.

  spec = spec_from_loader(name, loader=loader)
  mod = new_module(spec)

I'll argue that we can accommodate the common case and if that doesn't
work, they can still create their own spec.  If this were new
functionality then I'd agree with you.

> I also don't think any type should depend on importlib having been previously loaded to properly function if it isn't strictly required, so the code would have to be written entirely in C which I just don't want to write. =)

If it's in _bootstrap then it's already available.  No C required. :)

> Since it's going beyond simply constructing a new module but also initializing it I think it's fine to keeping it in importlib.util which also makes it all more discoverable for people than having to realize that types.ModuleType is the place to go to create a module and get its attributes initialized.

It seems to me that people haven't been thinking about initializing a
module's attributes.  This is popping up now because people are
starting to take their advanced importing use cases to Python 3 (which
I take as a very good sign!).  So ModuleType seems like the right
place for me. :)
msg219170 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-26 19:22
I think we view the fundamentals of built-in types differently as well. =) A module instance can exist without a spec no problem. As long as you don't pass that module instance through some chunk of code that expects __spec__ -- or any other attribute for that matter -- then the instance is fine and operates as expected. There is nothing fundamental to module objects which dictates any real attribute is set (not even __name__). If I wanted to construct a module from scratch and just wanted a blank module object which I assign attributes to manually then I can do that, and that's what types.ModuleType provides (sans its module name requirement in its constructor which really isn't necessary either).

This also means that tying the functionality of the module type to importlib is the wrong direction to have the dependency. That means I refuse to add code to the module type which requires that importlib have been imported and set up. Just think of someone who embeds Python and has no use for imports; why should the module type then be broken because of that choice? That means anything done to the module type needs to be done entirely in C.

But luckily for you bytes broke the glass ceiling of pivoting on a type's constructor based on its input type (it is the only one, though). So I'll let go on arguing that one. =)

Anyway, I'll think about changing types.ModuleType, but having to implement init_module_attrs() in pure C and then exposing it in _imp just doesn't sound like much fun.

And as for your preference that "the distinct functions continue to exist as they are", are you saying you want the code duplicated or that you just don't like me merging the create() and init_module_attrs() functions?
msg219173 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-05-26 19:52
I give. :)  You've made good points about builtins and C implementations.  Also, thinking about issue #21235 has changed my perspective a bit.

As to _SpecMethods, I mean just drop the class and turn the methods into functions:

* name: <name> -> _spec_<name> (or whatever)
* signature: self -> spec
* body: self.spec -> spec
* body: self.<method> -> method (there is interplay between methods)

And then importlib.util.new_module:

def new_module(spec):
    return _bootstrap._spec_create(spec)
msg219224 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-27 13:51
Why do you want a one-liner wrapper for the functions for the public API when they are exactly the same?
msg219229 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-05-27 16:26
You're right that it doesn't have to be a one-line wrapper or anything more than an import-from in importlib.util. :)
msg219388 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-30 14:08
Giving Eric is polymorphic first argument to types.ModuleType() is going to be tricky thanks to the fact that it is not hard-coded anywhere in the C code nor documented that the first argument must be a string (the only way it might come up is from PyModule_GetName() and even then that's not required to work as you would expect). And since we purposefully kept specs type-agnostic, you can't do a type check for a SimpleNamespace.

I think the only way for it to work is by attribute check (e.g. is 'name' or 'loader' defined on the object, regardless of value).
msg219389 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-30 14:16
Another issue with the polymorphic argument is that the module type is one of those rare things written in C with keyword parameter support, so renaming the 'name' argument to 'name_or_spec' could potentially break code.
msg219397 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-05-30 15:55
Yeah, it just looks too complicated to take the ModuleType signature approach, as much as I prefer it. :)  I appreciate you taking a look though.
msg219399 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-30 16:28
And another complication is the compatibility hack to set loader when submodule_search_locations is set but loader is not since _NamespaceLoader is not exposed in C without importlib which gets us back into the whole question of whether types should function entirely in isolation from other subsystems of Python.
msg219402 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-05-30 16:47
But that part is less of a concern since we don't need namespace packages before or during bootstrapping, and afterward we have access to interp->importlib.  Or am I missing something?

> which gets us back into the whole question of whether types should
> function entirely in isolation from other subsystems of Python.

That is a great question, regardless.  In part I imagine it depends on the subsystem and how intrinsic it is to the interpreter, which seems like a relatively implementation-agnostic point from a high-level view.
msg219411 - (view) Author: Roundup Robot (python-dev) Date: 2014-05-30 18:55
New changeset b26d021081d2 by Brett Cannon in branch 'default':
Issue #20383: Introduce importlib.util.module_from_spec().
http://hg.python.org/cpython/rev/b26d021081d2
msg219412 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-05-30 18:57
After all the various revelations today about how much of a hassle and murky it would be to get types.ModuleType to do what we were after, I went ahead and kept importlib.util.module_from_spec(), but dropped the module argument bit. I also deconstructed _SpecMethods along the way while keeping the abstractions as Eric requested.
msg219418 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-05-30 19:43
Thanks for doing that Brett and for accommodating me. :)  Also, the various little cleanups are much appreciated.
msg219434 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-05-31 01:24
Another thing to keep in the back of your minds: one of my goals for PEP
432 is to make it possible to have a fully functional embedded interpreter
up and running *without* configuring the Python level import system. This
is the state the interpreter would be in between the beginning and end of
initialisation.
History
Date User Action Args
2014-05-31 01:24:37ncoghlansetmessages: + msg219434
2014-05-30 19:43:05eric.snowsetmessages: + msg219418
2014-05-30 18:57:20brett.cannonsetstatus: open -> closed
resolution: fixed
messages: + msg219412

stage: commit review -> resolved
2014-05-30 18:55:40python-devsetnosy: + python-dev
messages: + msg219411
2014-05-30 16:47:28eric.snowsetmessages: + msg219402
2014-05-30 16:28:45brett.cannonsetmessages: + msg219399
2014-05-30 15:55:41eric.snowsetmessages: + msg219397
2014-05-30 14:16:40brett.cannonsetmessages: + msg219389
2014-05-30 14:08:06brett.cannonsetmessages: + msg219388
2014-05-27 16:26:18eric.snowsetmessages: + msg219229
2014-05-27 13:51:56brett.cannonsetmessages: + msg219224
2014-05-26 19:52:48eric.snowsetmessages: + msg219173
2014-05-26 19:22:22brett.cannonsetmessages: + msg219170
2014-05-26 18:29:15eric.snowsetmessages: + msg219168
2014-05-26 14:49:25brett.cannonlinkissue21235 dependencies
2014-05-26 14:37:02brett.cannonsetmessages: + msg219156
2014-05-26 06:03:12eric.snowsetmessages: + msg219135
2014-05-24 14:28:31ncoghlansetmessages: + msg219046
2014-05-24 14:05:45brett.cannonsetmessages: + msg219043
2014-05-23 18:00:51eric.snowsetmessages: + msg218981
2014-05-18 22:32:20brett.cannonsetfiles: + module_from_spec.diff
keywords: + patch
messages: + msg218759

stage: test needed -> commit review
2014-05-11 12:03:26Arfreversetnosy: + Arfrever
2014-05-05 13:54:14brett.cannonsetnosy: + aronacher
messages: + msg217925
2014-04-06 16:49:17brett.cannonsetmessages: + msg215662
2014-04-06 10:26:53ncoghlansetmessages: + msg215653
2014-04-06 05:20:33eric.snowsetmessages: + msg215649
2014-04-06 05:06:34rhettingersetnosy: + rhettinger
messages: + msg215648
2014-04-05 20:48:34eric.snowsetmessages: + msg215629
2014-04-04 19:09:03brett.cannonsetassignee: brett.cannon
messages: + msg215554
2014-01-24 17:11:47brett.cannoncreate