Issue45540
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2021-10-20 17:11 by barry, last changed 2022-04-11 14:59 by admin.
Messages (9) | |||
---|---|---|---|
msg404500 - (view) | Author: Barry A. Warsaw (barry) * ![]() |
Date: 2021-10-20 17:11 | |
TL;DR module.__spec__.parent is read-only but module.__package__ is r/w despite documentation that implies that these two attributes should be identical, and various issues that focus on migrating from direct module attributes to ModuleSpec attributes. bpo-33277 and bpo-21762 Maybe we should relax the restriction on module.__spec__.parent so that it's writeable just like module.__package__. See also: https://docs.python.org/3/reference/import.html?highlight=__loader__#package__ https://docs.python.org/3/library/types.html?highlight=__loader__#types.ModuleType.__package__ |
|||
msg404511 - (view) | Author: Brett Cannon (brett.cannon) * ![]() |
Date: 2021-10-20 18:32 | |
I say make it writable. |
|||
msg404512 - (view) | Author: Eric Snow (eric.snow) * ![]() |
Date: 2021-10-20 18:38 | |
On Wed, Oct 20, 2021 at 11:11 AM Barry A. Warsaw <report@bugs.python.org> wrote: > Maybe we should relax the restriction on module.__spec__.parent so that it's writeable just like module.__package__. Hmm, I hadn't realized __package__ is used in __import__(). That makes things messier. Regardless, I expect the primary reason __package__ is writable is because modules have never had any read-only attrs. The more important question now is, when is __package__ ever modified? I can think of 2 cases: 1. legacy (pre-451) importers set it when they load the module 2. someone wants to customize the import system without writing an importer With the latter, specifically they are trying to change how the module is reloaded or submodules are imported. In that case there should probably be an explicit way to do so without having to hack the import machinery. My only concern with any support/encouragement for modifying the spec after import is it results in less confidence about how the module was imported originally. |
|||
msg404551 - (view) | Author: Barry A. Warsaw (barry) * ![]() |
Date: 2021-10-21 00:11 | |
On Oct 20, 2021, at 11:38, Eric Snow <report@bugs.python.org> wrote: > > Regardless, I expect the primary reason __package__ is writable is > because modules have never had any read-only attrs. So historical accident mostly. > The more important question now is, when is __package__ ever modified? > I can think of 2 cases: > > 1. legacy (pre-451) importers set it when they load the module > 2. someone wants to customize the import system without writing an importer > > With the latter, specifically they are trying to change how the module > is reloaded or submodules are imported. In that case there should > probably be an explicit way to do so without having to hack the import > machinery. > > My only concern with any support/encouragement for modifying the spec > after import is it results in less confidence about how the module was > imported originally. I guess a question to answer then is whether we philosophically want the module attributes to be equivalent to the spec attributes. And by equivalent, I mean enforced to be exactly so, and thus a proxy. To me, the duplication is a wart that we should migrate away from so there’s only one place for these attributes, and that should be the spec. Here is the mapping we currently describe in the docs: mod.__name__ === __spec__.name mod.__package__ === __spec__.parent mod.__loader__ === __spec__.loader mod.__file__ === __spec__.origin mod.__path__ === __spec__.submodule_search_locations mod.__cached__ === __spec__.cached But right now, they don’t have to stay in sync, and I don’t think it’s reasonable to put the onus on the user to keep them in sync, because it’s unclear what code uses which attribute. Okay, so you can just set them both to be safe, but then you can’t do that with __spec__.parent/__package__ This is what leads me to think that having a proxy to keep them in sync and relaxing the read-only restriction is the path forward, even if writing __package__ doesn’t make sense. It also seems like the easier way to keep backward compatibility, rather than enforcing read-only on __package__ to match __spec__.parent. So the question is less about whether this is useful than whether it will break things if they write to it. |
|||
msg404619 - (view) | Author: Eric Snow (eric.snow) * ![]() |
Date: 2021-10-21 16:52 | |
On Wed, Oct 20, 2021 at 6:11 PM Barry A. Warsaw <report@bugs.python.org> wrote: > I guess a question to answer then is whether we philosophically want the module attributes to be equivalent to the spec attributes. And by equivalent, I mean enforced to be exactly so, and thus a proxy. To me, the duplication is a wart that we should migrate away from so there’s only one place for these attributes, and that should be the spec. > > Here is the mapping we currently describe in the docs: > > mod.__name__ === __spec__.name > mod.__package__ === __spec__.parent > mod.__loader__ === __spec__.loader > mod.__file__ === __spec__.origin > mod.__path__ === __spec__.submodule_search_locations > mod.__cached__ === __spec__.cached > > But right now, they don’t have to stay in sync, and I don’t think it’s reasonable to put the onus on the user to keep them in sync, because it’s unclear what code uses which attribute. Okay, so you can just set them both to be safe, but then you can’t do that with __spec__.parent/__package__ Currently any of the module attrs can be different than the spec. In two cases they can legitimately be different: __name__ (with __main__) and __file__ (with frozen stdlib modules). For the rest, they should be in sync. Treating the spec as the single source of truth makes sense. My only concern has been that you can no longer determine how a module was originally imported once the spec is changed. However, I just realized that you can always run importlib.util.find_spec() to reproduce that info (with some minor caveats). So now I'm less concerned about that. :) Notably, users have forever(?) been able to modify all of the module attrs, with impact on the import system: __package__ and __path__ affecting later imports, and the rest affecting reload. FWIW, an "advantage" of the module attrs is that they can be set in the module code. The same is true for the corresponding spec attrs but with just enough indirection to require more intent. Regardless, the idea of post-import modifications to modules/specs has always made me uncomfortable. As a user I'd expect an alternative that feels less like a (non-obvious) low-level hack. ==== To me here are the important questions: 1. when does code ever modify the module attrs (or spec) and why? 2. should we distinguish the roles of the module attrs and spec (how-module-was-loaded vs. how-module-will-reload vs. how-module-impacts-other-imports)? 3. would it make sense to store spec modifications separately from the spec (e.g. on the module)? 4. which attrs should be deprecated? 5. should any module attrs (the ones that don't get eventually removed) be read-only? What about spec attrs? 6. would it be better to provide importlib.util.* helpers to address those needs, instead of having folks modify the module/spec directly? My take: 1. that would be nice to know :) 2. that depends on what matters in practice. My gut says the distinctions aren't important enough to do anything about it, except where there are legitimate differences between the module and spec. Currently the module attrs cover all three roles. The spec only covers how-module-was-loaded (but is used as a fallback for the other two roles in *most* cases). Those two special cases, with __name__ and __file__ being out of sync, are meaningful only for introspection, rather than affecting the import machinery. (In the case of frozen modules that have __file__ set, note that spec.has_location is False.) I'm not sure how these fit in with the different roles. Advantages to keeping the spec exclusively how-module-was-imported: * it's what I'd expect; having to call importlib.util.find_spec() isn't the obvious thing * the loader can modify the spec, so importlib.util.find_spec() won't necessarily match None of those appear important enough to warrant keeping the status quo. The disadvantages seem heavier (maintenance costs and user confusion with (unnecessarily) having multiple sources of truth). 3. probably not, though it depends on (2) However, if all those module attrs become read-only then we would need to figure out where to store __name__ and __file__ in those special cases. 4. everything except __name__ and __file__ (and probably __path__) 5. for modules, yes; for the spec, only if we stick with the one role On modules I'd expect all of them to become properties regardless, with most of them becoming read-only eventually: getter: * proxy the corresponding spec attr * a deprecation warning if it isn't an attr that needs to stay setter: * proxy the corresponding spec attr * a deprecation warning for now on all attrs * a deprecation error later on all attrs * an AttributeError even later (do not make it a data-only descriptor) A bonus advantage of properties is that they would reduce clutter on the module __dict__. What about __path__? We'll probably keep it as a traditional indicator that the module is a package. However, do we make it a read-only proxy of spec.module_search_locations? (We already use a __path__ proxy for namespace packages.) 6. it probably isn't worth it. Due to the extra indirection, modifying the spec seems like a more deliberate (non-accidental or confused) action than changing the module attrs. That's probably enough "help". However, in cases where multiple attrs together have specific meaning, such helpers might be helpful for users. ==== Regarding __file__ being different from spec.origin, it might be worth revisiting the question of "origin" vs. "location" on the spec. Note that, in the case of frozen stdlib modules, spec.has_location is False even though __file__ is set. That smells fishy to me. |
|||
msg404620 - (view) | Author: Eric Snow (eric.snow) * ![]() |
Date: 2021-10-21 16:52 | |
On Wed, Oct 20, 2021 at 6:11 PM Barry A. Warsaw <report@bugs.python.org> wrote: > This is what leads me to think that having a proxy to keep them in sync and relaxing the read-only restriction is the path forward, even if writing __package__ doesn’t make sense. It also seems like the easier way to keep backward compatibility, rather than enforcing read-only on __package__ to match __spec__.parent. > > So the question is less about whether this is useful than whether it will break things if they write to it. I don't see any significant problem with making spec.parent writable. It's read-only now only because it is computed from spec.name and any other value doesn't make sense (which read-only communicates). My preference would be to make __package__ read-only instead. :) However, I doubt it will make a difference in practice either way, so I'm fine either way. |
|||
msg404628 - (view) | Author: Barry A. Warsaw (barry) * ![]() |
Date: 2021-10-21 17:54 | |
Thanks for your comprehensive comments Eric! While I digest them, my TL;DR is that we probably need a PEP to describe everything from the current situation, to the the desired end state and migration path. I'm willing to put that together with you and Brett as co-authors if you're amenable. |
|||
msg404630 - (view) | Author: Eric Snow (eric.snow) * ![]() |
Date: 2021-10-21 18:25 | |
On Thu, Oct 21, 2021 at 11:54 AM Barry A. Warsaw <report@bugs.python.org> wrote: > Thanks for your comprehensive comments Eric! While I digest them, my TL;DR is that we probably need a PEP to describe everything from the current situation, to the the desired end state and migration path. I'm willing to put that together with you and Brett as co-authors if you're amenable. +1 |
|||
msg404667 - (view) | Author: Brett Cannon (brett.cannon) * ![]() |
Date: 2021-10-21 21:28 | |
Sure, if you want to go full PEP on this I'm happy to be a co-author if the end goal is to ditch the (now) extraneous attributes. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:51 | admin | set | github: 89703 |
2021-10-21 21:28:37 | brett.cannon | set | messages: + msg404667 |
2021-10-21 18:25:26 | eric.snow | set | messages: + msg404630 |
2021-10-21 17:54:25 | barry | set | messages: + msg404628 |
2021-10-21 16:52:50 | eric.snow | set | messages: + msg404620 |
2021-10-21 16:52:05 | eric.snow | set | messages: + msg404619 |
2021-10-21 00:11:29 | barry | set | messages: + msg404551 |
2021-10-20 18:38:42 | eric.snow | set | messages: + msg404512 |
2021-10-20 18:32:32 | brett.cannon | set | messages: + msg404511 |
2021-10-20 17:11:39 | barry | set | nosy:
+ brett.cannon, eric.snow |
2021-10-20 17:11:23 | barry | create |