classification
Title: update the import machinery to only use __spec__
Type: enhancement Stage: needs patch
Components: Interpreter Core Versions: Python 3.6
process
Status: open Resolution:
Dependencies: 25791 Superseder:
Assigned To: Nosy List: brett.cannon, eric.snow, ncoghlan
Priority: low Keywords:

Created on 2014-06-14 21:33 by eric.snow, last changed 2016-01-22 23:30 by brett.cannon.

Messages (11)
msg220581 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-06-14 21:33
With PEP 451, Python 3.4 introduced module specs to encapsulate the module's import-related information, particularly for loading.  While __loader__, __file__, and __cached__ are no longer used by the import machinery, in a few places it still uses __name__, __package__, and __path__.

Typically the spec and the module attrs will have the same values, so it would be a non-issue.  However, the import-related module attributes are not read-only and the consequences of changing them (whether accidentally or to rely on an implementation detail) are not clearly defined.  Making the spec strictly authoritative reduces the likelihood of accidental changes and gives a better focus point for a module's import behavior (which was kind of the point of PEP 451 in the first place).  Furthermore, objects in sys.modules are not required to be modules.  By relying strictly on __spec__ we likewise give a more distinct target (of import-related info) for folks that need to use that trick.
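To illustrate how the two sources of information can drift apart, here is a minimal sketch. The module name "demo_pkg.mod" is hypothetical and the module is built by hand rather than imported, so nothing real is modified.

```python
import importlib.machinery
import importlib.util

# Build a spec and a module from it; "demo_pkg.mod" is a made-up name.
spec = importlib.machinery.ModuleSpec("demo_pkg.mod", loader=None)
module = importlib.util.module_from_spec(spec)

# Right after creation the module attribute mirrors the spec.
print(module.__name__ == spec.name)  # True

# The module attribute is writable, and changing it leaves the spec untouched.
module.__name__ = "renamed"
print(module.__name__)       # renamed
print(module.__spec__.name)  # demo_pkg.mod
```

After the assignment, any code that reads `__name__` and any code that reads `__spec__.name` see different values, which is the ambiguity described above.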

I don't recall the specifics on why we didn't change those 3 attributes for PEP 451 (unintentional or for backward compatibility?).  At one point we discussed the idea that a module's spec contains the values that *were* used to load the module.  Instead, each spec became the image of how the import system sees and treats the module.

So unless there's some solid reason, I'd like to see the use of __name__, __package__, and __path__ by the import machinery eliminated (and accommodated separately if appropriate).  Consistent use of specs in the import machinery will help limit future surprises.

Here are the specific places:

__name__
--------
mod.__repr__()
ExtensionFileLoader.load_module()
importlib._bootstrap._handle_fromlist()
importlib._bootstrap._calc___package__()
importlib._bootstrap.__import__()

__package__
-----------
importlib._bootstrap._calc___package__()

__path__
--------
importlib._bootstrap._find_and_load_unlocked()
importlib._bootstrap._handle_fromlist()
importlib._bootstrap._calc___package__()

__file__
--------
mod.__repr__()

Note that I'm not suggesting the module attributes be eliminated (they are useful for informational reasons).  I would just like the import system to stop using them.  I suppose they could be turned into read-only properties, but anything like that should be addressed in a separate issue.

If we do make this change, the language reference, importlib docs, and inspect docs should be updated to clearly reflect the role of the module attributes in the import system.

I have not taken into account the impact on the standard library.  However, I expect that it will be minimal at most.  (See issue #21736 for a related discussion).
msg220597 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-06-14 23:49
Manipulating name, package and path at runtime is fully supported, and the module level attributes accordingly take precedence over the initial import time spec.

There may be some test suite gaps and documentation issues around the behaviour, but it's definitely intentional (things like runpy, "pseudo-modules", third party namespace package support and workarounds for running modules inside packages correctly rely on it).
msg220606 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-06-15 02:41
Thanks for clarifying.  I remembered discussing it but couldn't recall the details.  Documenting the exact semantics, use cases, and differences between spec and module attrs would be helpful.  I'll look into updating the language reference when I have some time.

It would still be worth it to find a way to make __spec__ fully authoritative, but I'll have to come up with a solution to the current use cases before that could go anywhere. :)
msg220610 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-06-15 03:44
The spec is authoritative for "how was this imported?". The differences between that and the module attributes then provide a record of any post-import modifications.
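As a rough sketch of reading that record, the helper below compares a module's attributes against its spec. The helper name `spec_drift` and the module name "demo.mod" are hypothetical. The attribute pairing follows the ModuleSpec documentation; `__file__` is only compared when the spec has a location, which is an assumption about what counts as a meaningful difference.

```python
import importlib.machinery
import importlib.util

# Module attribute -> matching ModuleSpec attribute.
ATTR_TO_SPEC = {
    "__name__": "name",
    "__package__": "parent",
    "__loader__": "loader",
    "__path__": "submodule_search_locations",
    "__file__": "origin",
    "__cached__": "cached",
}
_MISSING = object()


def spec_drift(module):
    """Return {attr: (spec_value, module_value)} for attributes that differ."""
    spec = getattr(module, "__spec__", None)
    if spec is None:
        return {}
    drift = {}
    for mod_attr, spec_attr in ATTR_TO_SPEC.items():
        current = getattr(module, mod_attr, _MISSING)
        if current is _MISSING:
            continue  # attribute not set on the module, nothing to compare
        if mod_attr == "__file__" and not spec.has_location:
            continue  # origin is not a file path in this case
        recorded = getattr(spec, spec_attr, None)
        if current != recorded:
            drift[mod_attr] = (recorded, current)
    return drift


# Usage with a hand-built module.
spec = importlib.machinery.ModuleSpec("demo.mod", loader=None)
module = importlib.util.module_from_spec(spec)
module.__name__ = "elsewhere"        # post-import rename
module.__path__ = ["/tmp/extra"]     # dynamic package definition
drift = spec_drift(module)
print(sorted(drift))
```

A tool like this only reports differences; it does not say which side the import system will actually consult, which is the open question in this issue.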
msg258332 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-01-15 21:41
So I am going to disagree with Nick about the module attributes and their usefulness (partially because I just made __spec__.parent take precedence over __package__ in issue #25791). While I get the idea of wanting a history of what did (not) change since import, keeping the duplicate information around is annoying. And I don't know how truly useful it is to know what things were compared to what they became.

If we shift to preferring specs over module attributes, we can then begin to clean up __import__ itself by no longer grabbing the globals() and locals() and instead simply passing in the module's __spec__ object. It also simplifies the documentation such that we don't have to explain everything twice. If people really want to track what an import-related value was before mutation, they can simply store it themselves instead of making us do the bookkeeping for them. It would also make things such as module_from_spec() or loader.create_module() simpler since they only have to worry about setting __spec__ instead of that attribute plus a bunch of other ones.
msg258357 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-01-16 04:52
My concern is more about backwards compatibility - at the moment, you can alter the behaviour of import, pickle, and other subsystems by modifying the module level attributes, and if we switch to preferring the __spec__ attributes, then that kind of code will break (I added an import specific example related to __main__ module relative imports to the linked issue).

That's not to say it shouldn't be done - as you say, it would be nice to eventually get to a point where the import system only needs access to the module spec and not to the runtime state, and there are also cases where the __spec__ information will be more correct (e.g. pickling objects in __main__).

However, it needs to be done in such a way that there are appropriate porting notes that explain to people why their state mutations stopped having the desired effect, and what (if anything) they can do instead.
msg258398 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-01-16 18:14
I totally agree proper notes in the What's New doc need to be in place to explain that people need to update.

How about I tweak the __package__ change to continue to prefer __package__ over __spec__.parent, but raise an ImportWarning when they differ? It can also fall back to __spec__.parent if __package__ isn't defined and simply not worry about the lack of __package__? Then we can do an ImportWarning for all of the other attributes when we discover a difference so people have time to migrate to updating both locations, and then eventually we can invert the priority and then after that drop the module attributes.
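The proposed transition step for __package__ could look roughly like the sketch below. The function name `calc_package` is hypothetical and this is not the importlib source; the final fallback to __name__ and __path__ is an assumption about what to do when neither __package__ nor __spec__ is available.

```python
import importlib.machinery
import warnings


def calc_package(globals_):
    """Pick the package name for relative imports from a globals dict."""
    package = globals_.get("__package__")
    spec = globals_.get("__spec__")
    if package is not None:
        # Transition phase: __package__ still wins, but a mismatch is reported.
        if spec is not None and package != spec.parent:
            warnings.warn(
                f"__package__ != __spec__.parent "
                f"({package!r} != {spec.parent!r})",
                ImportWarning,
                stacklevel=2,
            )
        return package
    if spec is not None:
        # __package__ is missing: fall back to the spec without complaint.
        return spec.parent
    # Assumed last resort: derive the package from __name__ and __path__.
    name = globals_["__name__"]
    if "__path__" not in globals_:
        name = name.rpartition(".")[0]
    return name


# Usage: a mismatch keeps __package__ but records an ImportWarning.
spec = importlib.machinery.ModuleSpec("pkg.mod", loader=None)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chosen = calc_package(
        {"__name__": "pkg.mod", "__package__": "other", "__spec__": spec}
    )
fallback = calc_package({"__name__": "pkg.mod", "__spec__": spec})
print(chosen, fallback, len(caught))
```

Inverting the priority later would mean swapping the first two branches, so the warning phase gives people time to update both locations before behaviour changes.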
msg258440 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2016-01-17 02:24
That approach sounds good to me.

The main problem cases I'm aware of are:

__name__:

* reliably wrong in __main__
* the attribute you mess with if you want __qualname__ on functions and classes to be different so that pickle will import them from somewhere else (e.g. a parent package)

__path__:

* used for dynamic package definitions (including namespace package emulation)

__package__:

* AFAIK, mainly useful as a workaround for other people doing "bad things" (TM), like running package submodules directly as __main__ rather than using the -m switch
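For the __name__ case above, a small sketch of the mechanism: classes and functions copy the __name__ found in their defining globals into their own __module__ attribute, and pickle uses __module__ to locate them. The namespace and the name "parent_pkg" are hypothetical.

```python
# A globals dict whose __name__ has been set to a (made-up) parent package.
namespace = {"__name__": "parent_pkg"}

source = """
class Widget:
    pass

def make():
    return Widget()
"""
exec(source, namespace)

# Both objects record the __name__ that was in effect when they were defined.
print(namespace["Widget"].__module__)  # parent_pkg
print(namespace["make"].__module__)    # parent_pkg
```

A module spec plays no part in this lookup, so a spec-only import system would still need some answer for code that relies on this behaviour.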
msg258443 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-01-17 02:34
Yeah, which is why it will take a transition to get people to start mucking with __spec__ instead of the module attributes for their legitimate/questionable needs (although the whole `__name__ == '__main__'` idiom means __name__ will never go away while __path__, __package__, __loader__, __file__, and __cached__ could all slowly go away).
msg258844 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-01-22 23:28
__package__ != __spec__.parent now triggers an ImportWarning.
msg258845 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-01-22 23:30
I think that leaves the following attributes to be updated/checked for dependencies in importlib (and if they are found, raise ImportWarning when they differ):

1. __path__
2. __loader__
3. __file__
4. __cached__
History
Date                 User          Action  Args
2016-01-22 23:30:17  brett.cannon  set     messages: + msg258845
2016-01-22 23:28:43  brett.cannon  set     dependencies: + Raise an ImportWarning when __spec__.parent/__package__ isn't defined for a relative import; messages: + msg258844
2016-01-17 02:34:45  brett.cannon  set     messages: + msg258443
2016-01-17 02:24:11  ncoghlan      set     messages: + msg258440
2016-01-16 18:14:14  brett.cannon  set     messages: + msg258398
2016-01-16 04:52:12  ncoghlan      set     messages: + msg258357
2016-01-15 21:41:08  brett.cannon  set     priority: normal -> low; messages: + msg258332; versions: + Python 3.6, - Python 3.4, Python 3.5
2014-06-15 03:44:34  ncoghlan      set     messages: + msg220610
2014-06-15 02:41:37  eric.snow     set     messages: + msg220606
2014-06-14 23:49:30  ncoghlan      set     messages: + msg220597
2014-06-14 21:33:51  eric.snow     create