classification
Title: Deprecate __loader__, __package__, __file__, and __cached__ on modules
Type: Stage: test needed
Components: Library (Lib) Versions: Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, brett.cannon, eric.snow, ncoghlan, pitrou, pmpp, rhettinger
Priority: low Keywords:

Created on 2018-04-13 21:48 by brett.cannon, last changed 2018-04-23 00:22 by brett.cannon.

Messages (8)
msg315269 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-04-13 21:48
If you look at https://docs.python.org/3/reference/import.html#import-related-module-attributes you will notice there are a lot of attributes on modules. But since the introduction of module specs (https://docs.python.org/3/library/importlib.html#importlib.machinery.ModuleSpec), all of those attributes became redundant.

Now some attributes can never be taken away based on common idioms in the Python community. __name__ can't go due to `if __name__ == '__main__'`. __path__ can't go due to people manipulating it with e.g. pkg_resources or pkgutil. The rest, though, don't have such strong idioms tied to them (and in the case of __package__, has actually been made officially redundant by the import system since Python 3.6). That means setting them is unnecessary and any time someone wants to read or potentially manipulate them they have to choose between those attributes or the spec (or update both). Either way it can lead to bugs (I know I have nearly forgotten to set both the spec and the attribute before).

I don't know if we can easily deprecate these attributes using __getattr__ on modules as added in Python 3.7, or if this is more of a documentation thing and need a long period of clearly messaging that these redundant attributes will no longer be set at some point.
msg315276 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-04-14 00:21
I agree.  It should also be pointed out that we've had inconsistencies between the module attributes and the spec attributes, and even fixing those has lead to problems.  There should be a single source of truth, and the module spec should be that.

+1, and I agree about not being able to __name__ and __path__.
msg315521 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-04-20 16:28
-1 from me for __package__, as setting that's the preferred way for directly executed scripts to fix their explicit relative imports (rather than trying to retroactively fix their spec). (https://docs.python.org/3/library/importlib.html also doesn't say anything about __package__ being deprecated)

I'd be OK with seeing __file__/__cached__/__loader__ get deprecated, but before they went away, I'd like to see "importlib.util.get_filename()", "importlib.util.get_cache_filename()", and "importlib.util.get_loader()"  helpers added.
msg315534 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-20 20:20
+1 from me.  The visual clutter in dir() has proven to be somewhat distracting in introduction-to-python courses.
msg315550 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-04-21 05:52
As a concrete proposal for 3.8, how about:

1. Add the following utility functions to importlib.util (names adjusted to match PEP 451 rather than my initial suggestions above):

    def get_location(module):
        try:
            return module.__file__
        except AttributeError:
            pass
        spec = module.__spec__
        if spec.has_location:
            return spec.origin
        return None

    def get_cached_location(module):
        try:
            return module.__cached__
        except AttributeError:
            pass
        spec = module.__spec__
        if spec.has_location:
            return spec.cached
        return None

    def get_loader(module):
        try:
            return module.__loader__
        except AttributeError:
            pass
        return module.__spec__.loader

2. Explicitly make __file__/__cached__/__loader__ optional, and require 3rd party tools to use the relevant `importlib.util.get_*` APIs when they're defined rather than accessing the dunder-attributes directly.

I think those 3 are clear, but the right API for replacing direct access to __package__ is less clear, since there are two cases for querying the package name: whether you want actual packages to return their own name, or whether you want the name of their parent.

If we decide the relevant use case is "get the base package name for relative imports", then the following helper API may work:

    def get_base_package_name(module):
        """Get the package name to pass to resolve_name() for this module"""
        try:
            return module.__package__
        except AttributeError:
            pass
        modname = module.__name__
        try:
            path = module.__path__
        except AttributeError:
            pass
        else:
            return modname
        return module.__spec__.parent
msg315585 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-04-21 22:33
This sounds like gratuitous breakage to me.  __file__ in particular is used quite often.
msg315620 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-04-22 17:00
Not that I've thought this through in much detail, but what if we start out by proxying those module attributes to the spec via PEP 562 hooks?  We can do that in 3.8, issue (silent?) deprecation warnings, and then do a full deprecation process for those module attributes?

It's not a terrible thing to keep those attributes as long as they match what's in the spec.  It's a problem when they're out of sync (see bpo-32305 and bpo-32303).
msg315635 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-04-23 00:22
@Nick: We've had this discussion before about specs being a "receipt" of import versus not and the last time we agreed on not. :) That's why we raise a warning if __package__ doesn't match __spec__.parent: https://github.com/python/cpython/blob/d5a2377c3d70e4143bcbee4a765b3434e21f683a/Lib/importlib/_bootstrap.py#L1056-L1059

It's documented through https://docs.python.org/3/reference/import.html#__package__ by saying __package__ has to match __spec__.parent. We didn't make it stronger since we can't get rid of any of these attributes until Python 2.7 compatibility is no longer a concern. And I believe __package__/__spec__.parent is the only attribute whose data is in any way directly relied on in importlib which is being proposed for removal (e.g. __file__, __cached__, and __loader__ are just recorded).

@Raymond: Thanks for the teaching info. I actually opened this after running dir() on a module and having there be just as many module-related attributes as defined attributes.

@Antoine: we can keep __file__ if necessary (I was on the fence about keeping it to begin with). But as Barry pointed out, bugs have occurred because we now have two places for most of these details and people don't always know which one to update.

@Barry: the tricky bit is what to do if someone sets their own __getattr__() on a module? E.g. I just published modutil which provides a __getattr__ for lazy importing, so users of this would need a way to not have either function get stomped on by the other.
History
Date User Action Args
2018-04-23 00:22:11brett.cannonsetmessages: + msg315635
2018-04-22 17:00:15barrysetmessages: + msg315620
2018-04-21 23:46:18pmppsetnosy: + pmpp
2018-04-21 22:33:10pitrousetnosy: + pitrou
messages: + msg315585
2018-04-21 05:52:46ncoghlansetmessages: + msg315550
2018-04-20 20:20:53rhettingersetnosy: + rhettinger
messages: + msg315534
2018-04-20 16:28:14ncoghlansetnosy: + ncoghlan
messages: + msg315521
2018-04-14 00:21:05barrysetnosy: + barry
messages: + msg315276
2018-04-13 21:48:26brett.cannonsetpriority: normal -> low
2018-04-13 21:48:22brett.cannoncreate