This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Deprecate __loader__, __package__, and __cached__ on modules
Type: Stage: patch review
Components: Library (Lib) Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: FFY00, barry, brett.cannon, eric.snow, iritkatriel, ncoghlan, pitrou, pmpp, rhettinger
Priority: low Keywords: patch

Created on 2018-04-13 21:48 by brett.cannon, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL Status Linked Edit
PR 29078 open barry, 2021-10-20 02:13
Messages (17)
msg315269 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-04-13 21:48
If you look at https://docs.python.org/3/reference/import.html#import-related-module-attributes you will notice there are a lot of attributes on modules. But since the introduction of module specs (https://docs.python.org/3/library/importlib.html#importlib.machinery.ModuleSpec), all of those attributes became redundant.

Now some attributes can never be taken away based on common idioms in the Python community. __name__ can't go due to `if __name__ == '__main__'`. __path__ can't go due to people manipulating it with e.g. pkg_resources or pkgutil. The rest, though, don't have such strong idioms tied to them (and in the case of __package__, has actually been made officially redundant by the import system since Python 3.6). That means setting them is unnecessary and any time someone wants to read or potentially manipulate them they have to choose between those attributes or the spec (or update both). Either way it can lead to bugs (I know I have nearly forgotten to set both the spec and the attribute before).

I don't know if we can easily deprecate these attributes using __getattr__ on modules as added in Python 3.7, or if this is more of a documentation thing and need a long period of clearly messaging that these redundant attributes will no longer be set at some point.
msg315276 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-04-14 00:21
I agree.  It should also be pointed out that we've had inconsistencies between the module attributes and the spec attributes, and even fixing those has lead to problems.  There should be a single source of truth, and the module spec should be that.

+1, and I agree about not being able to __name__ and __path__.
msg315521 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-04-20 16:28
-1 from me for __package__, as setting that's the preferred way for directly executed scripts to fix their explicit relative imports (rather than trying to retroactively fix their spec). (https://docs.python.org/3/library/importlib.html also doesn't say anything about __package__ being deprecated)

I'd be OK with seeing __file__/__cached__/__loader__ get deprecated, but before they went away, I'd like to see "importlib.util.get_filename()", "importlib.util.get_cache_filename()", and "importlib.util.get_loader()"  helpers added.
msg315534 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2018-04-20 20:20
+1 from me.  The visual clutter in dir() has proven to be somewhat distracting in introduction-to-python courses.
msg315550 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-04-21 05:52
As a concrete proposal for 3.8, how about:

1. Add the following utility functions to importlib.util (names adjusted to match PEP 451 rather than my initial suggestions above):

    def get_location(module):
        try:
            return module.__file__
        except AttributeError:
            pass
        spec = module.__spec__
        if spec.has_location:
            return spec.origin
        return None

    def get_cached_location(module):
        try:
            return module.__cached__
        except AttributeError:
            pass
        spec = module.__spec__
        if spec.has_location:
            return spec.cached
        return None

    def get_loader(module):
        try:
            return module.__loader__
        except AttributeError:
            pass
        return module.__spec__.loader

2. Explicitly make __file__/__cached__/__loader__ optional, and require 3rd party tools to use the relevant `importlib.util.get_*` APIs when they're defined rather than accessing the dunder-attributes directly.

I think those 3 are clear, but the right API for replacing direct access to __package__ is less clear, since there are two cases for querying the package name: whether you want actual packages to return their own name, or whether you want the name of their parent.

If we decide the relevant use case is "get the base package name for relative imports", then the following helper API may work:

    def get_base_package_name(module):
        """Get the package name to pass to resolve_name() for this module"""
        try:
            return module.__package__
        except AttributeError:
            pass
        modname = module.__name__
        try:
            path = module.__path__
        except AttributeError:
            pass
        else:
            return modname
        return module.__spec__.parent
msg315585 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2018-04-21 22:33
This sounds like gratuitous breakage to me.  __file__ in particular is used quite often.
msg315620 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-04-22 17:00
Not that I've thought this through in much detail, but what if we start out by proxying those module attributes to the spec via PEP 562 hooks?  We can do that in 3.8, issue (silent?) deprecation warnings, and then do a full deprecation process for those module attributes?

It's not a terrible thing to keep those attributes as long as they match what's in the spec.  It's a problem when they're out of sync (see bpo-32305 and bpo-32303).
msg315635 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-04-23 00:22
@Nick: We've had this discussion before about specs being a "receipt" of import versus not and the last time we agreed on not. :) That's why we raise a warning if __package__ doesn't match __spec__.parent: https://github.com/python/cpython/blob/d5a2377c3d70e4143bcbee4a765b3434e21f683a/Lib/importlib/_bootstrap.py#L1056-L1059

It's documented through https://docs.python.org/3/reference/import.html#__package__ by saying __package__ has to match __spec__.parent. We didn't make it stronger since we can't get rid of any of these attributes until Python 2.7 compatibility is no longer a concern. And I believe __package__/__spec__.parent is the only attribute whose data is in any way directly relied on in importlib which is being proposed for removal (e.g. __file__, __cached__, and __loader__ are just recorded).

@Raymond: Thanks for the teaching info. I actually opened this after running dir() on a module and having there be just as many module-related attributes as defined attributes.

@Antoine: we can keep __file__ if necessary (I was on the fence about keeping it to begin with). But as Barry pointed out, bugs have occurred because we now have two places for most of these details and people don't always know which one to update.

@Barry: the tricky bit is what to do if someone sets their own __getattr__() on a module? E.g. I just published modutil which provides a __getattr__ for lazy importing, so users of this would need a way to not have either function get stomped on by the other.
msg398926 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2021-08-04 18:07
> We didn't make it stronger since we can't get rid of any of these attributes until Python 2.7 compatibility is no longer a concern.

Maybe this can be revisited now.
msg398938 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2021-08-04 19:03
> Maybe this can be revisited now.

I've started the work already (albeit rather slowly 😄). Importlib has been updated to prefer __spec__.parent of __package__ and warns when the values are not equal. Next step will be to raise an ImportWarning when __package__ is defined but __spec__.parent is not (then DeprecationWarning after that).

For __loader__ I got Python 3.10 to fall back on to __spec__.loader. Next step is https://bugs.python.org/issue42132 to update C code to set __spec__ where appropriate. From there the check for equivalence can go in (which then leads to the warnings).

I have not started with __cached__ yet, so next step is making sure all uses fall back on __spec__.

So, to be clear, the steps I have been following are:

1. Make sure all code falls back on __spec__ when appropriate (one release)
2. Raise an ImportWarning when values differ (one release)
3. Make __spec__ take precedence over old attribute (one release)
4. Raise an ImportWarning when having to fall back to older attribute (two releases)
5. Raise a DeprecationWarning when falling back on older attribute (two releases)
6. Clean up code
msg404473 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-20 15:18
(Not sure how I missed this issue.)

The spec identifies how a module should be loaded (or how it was loaded and should be reloaded).  We should be careful to preserve that identify and not invite uses to modify the spec after (or while) the module is loaded.  PEP 451 talks about the difference a little, but I'm pretty sure it came up in more detail in discussions about the PEP at that time.

IIRC, originally PEP 451 made ModuleSpec immutable and there was a mutable version of it for use by finders that had a mechanism to "freeze" it.  (We dialed that back because it was mostly unnecessary extra complexity.)

As to the module attrs, I agree about which are unnecessary and which must stay (__name__, __file__, __path__).  __name__ and spec.name don't always match (e.g. for __main__).  Likewise for __file__ and spec.origin (e.g. frozen stdlib modules).

Regarding __path__, it is the only attr that has an impact on import behavior (other than reload) after the module is loaded, e.g. when finding submodules.  As noted, users sometimes modify it to affect that behavior. 
 We want the spec to preserve the information used when the module was loaded, so __path__ and how it's used by the import machinery should stay the same.

FWIW, I've also seen cases where __loader__ is overwritten to facilitate different reload behavior, but don't remember if I've seen that lately (and don't remember offhand if that even worked after PEP 451).  Regardless, I'm not convinced some other solution wouldn't be better to address that use case if needed (e.g. add "reload_loader" to ModuleSpec with this use case in mind).
msg404475 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-20 15:28
+1 on a proxy (with read-only attrs) for everything but __name__, __file__, and __path__ (which can all be different than the spec).  Ideally __name__ and __file__ would be read-only too but they'd have to be stored somewhere other than the spec (e.g. on the proxy or module object).  __path__ could be similarly proxied but would at the least have to be mutable.

The nice thing about such attrs is they wouldn't show up in the module's __dict__, assuming they are properties.

The only catch I see is where the module's code sets one of the attrs.  Would the proxy (or import machinery) pick those up, or maybe for some of them emit a warning or fail?

We could also skip the proxy and add the properties to ModuleType directly, no?
msg404510 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2021-10-20 18:31
> The spec identifies how a module should be loaded (or how it was loaded and should be reloaded).  We should be careful to preserve that identify and not invite uses to modify the spec after (or while) the module is loaded.

But they may want to modify it to influence reloading. There's a discussion somewhere where I talked about this with Nick and he agreed with me that trying to keep specs like receipts and all of these other attributes as mutable values had not really panned out after all of these years.

> +1 on a proxy (with read-only attrs) for everything but __name__, __file__, and __path__ (which can all be different than the spec).

I'm -1 on a proxy as that doesn't simplify the situation. Having (nearly) duplicate attributes is confusing and I have yet to see it benefit anyone.
msg404554 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2021-10-21 00:17
On Oct 20, 2021, at 11:31, Brett Cannon <report@bugs.python.org> wrote:
> 
>> +1 on a proxy (with read-only attrs) for everything but __name__, __file__, and __path__ (which can all be different than the spec).
> 
> I'm -1 on a proxy as that doesn't simplify the situation. Having (nearly) duplicate attributes is confusing and I have yet to see it benefit anyone.

If we don’t implement a proxy to enforce equivalence, then we shouldn’t say as such in the documentation.  And we should be much clearer about what the purpose of the spec is versus the module attributes.
msg404555 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2021-10-21 00:20
On Oct 20, 2021, at 08:18, Eric Snow <report@bugs.python.org> wrote:

> The spec identifies how a module should be loaded (or how it was loaded and should be reloaded).  We should be careful to preserve that identify and not invite uses to modify the spec after (or while) the module is loaded.  PEP 451 talks about the difference a little, but I'm pretty sure it came up in more detail in discussions about the PEP at that time.

It’s frankly not good enough to capture the *intent* of these attributes in a PEP.  We need to be clear in the documentation about their purpose and we need to be clear about when to use or set either the module attribute or the spec attribute.  The current situation is ambiguous and possibly misleading.  But maybe it’s “just” a documentation issue?
msg404556 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2021-10-21 00:24
On Oct 20, 2021, at 08:28, Eric Snow <report@bugs.python.org> wrote:
> 
> The only catch I see is where the module's code sets one of the attrs.  Would the proxy (or import machinery) pick those up, or maybe for some of them emit a warning or fail?
> 
> We could also skip the proxy and add the properties to ModuleType directly, no?

My initial idea was to proxy from the ModuleType to the ModuleSpec and I started down that path, but hacking in C is much more painful than experimenting in Python, so that’s why I did my WIP branch (i.e. just to play with the feasibility of it).

Thus I had planned to keep the actual attribute values in the ModuleSpec and essentially implement a getattro and setattro in the ModuleType that proxied sets and gets to the ModuleSpec.
msg404623 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-21 16:58
I ended up writing up more thoughts on this in the issue about __package__: https://bugs.python.org/issue45540#msg404619.

Users can call importlib.util.find_spec(), which is probably good enough to figure out how a module was originally imported.  So I don't have any serious objections to making the spec the writable single-source-of-truth.

FWIW, I'd rather there were high-level helpers available as an alternative to the low-level practice of modifying module attrs.  However, in this case it probably isn't frequent enough to make it worth it.
History
Date User Action Args
2022-04-11 14:58:59adminsetgithub: 77458
2021-10-21 16:58:47eric.snowsetmessages: + msg404623
2021-10-21 00:24:06barrysetmessages: + msg404556
2021-10-21 00:20:43barrysetmessages: + msg404555
2021-10-21 00:17:50barrysetmessages: + msg404554
2021-10-20 18:31:51brett.cannonsetmessages: + msg404510
2021-10-20 15:28:20eric.snowsetmessages: + msg404475
2021-10-20 15:18:15eric.snowsetmessages: + msg404473
2021-10-20 02:13:50barrysetkeywords: + patch
stage: test needed -> patch review
pull_requests: + pull_request27347
2021-10-19 18:49:02brett.cannonsettitle: Deprecate __loader__, __package__, __file__, and __cached__ on modules -> Deprecate __loader__, __package__, and __cached__ on modules
2021-10-19 01:35:16FFY00setnosy: + FFY00
2021-08-04 19:03:50brett.cannonsetmessages: + msg398938
2021-08-04 18:07:41iritkatrielsetnosy: + iritkatriel

messages: + msg398926
versions: + Python 3.11, - Python 3.8
2018-04-23 00:22:11brett.cannonsetmessages: + msg315635
2018-04-22 17:00:15barrysetmessages: + msg315620
2018-04-21 23:46:18pmppsetnosy: + pmpp
2018-04-21 22:33:10pitrousetnosy: + pitrou
messages: + msg315585
2018-04-21 05:52:46ncoghlansetmessages: + msg315550
2018-04-20 20:20:53rhettingersetnosy: + rhettinger
messages: + msg315534
2018-04-20 16:28:14ncoghlansetnosy: + ncoghlan
messages: + msg315521
2018-04-14 00:21:05barrysetnosy: + barry
messages: + msg315276
2018-04-13 21:48:26brett.cannonsetpriority: normal -> low
2018-04-13 21:48:22brett.cannoncreate