This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: module.__package__ and module.__spec__.parent have different semantics
Type: Stage:
Components: Interpreter Core Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: barry, brett.cannon, eric.snow
Priority: normal Keywords:

Created on 2021-10-20 17:11 by barry, last changed 2022-04-11 14:59 by admin.

Messages (9)
msg404500 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2021-10-20 17:11
TL;DR module.__spec__.parent is read-only but module.__package__ is r/w despite documentation that implies that these two attributes should be identical, and various issues that focus on migrating from direct module attributes to ModuleSpec attributes.

bpo-33277 and bpo-21762

Maybe we should relax the restriction on module.__spec__.parent so that it's writeable just like module.__package__.

See also: 

https://docs.python.org/3/reference/import.html?highlight=__loader__#package__

https://docs.python.org/3/library/types.html?highlight=__loader__#types.ModuleType.__package__
msg404511 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2021-10-20 18:32
I say make it writable.
msg404512 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-20 18:38
On Wed, Oct 20, 2021 at 11:11 AM Barry A. Warsaw <report@bugs.python.org> wrote:
> Maybe we should relax the restriction on module.__spec__.parent so that it's writeable just like module.__package__.

Hmm, I hadn't realized __package__ is used in __import__().  That
makes things messier.

Regardless, I expect the primary reason __package__ is writable is
because modules have never had any read-only attrs.

The more important question now is, when is __package__ ever modified?
 I can think of 2 cases:

1. legacy (pre-451) importers set it when they load the module
2. someone wants to customize the import system without writing an importer

With the latter, specifically they are trying to change how the module
is reloaded or submodules are imported.  In that case there should
probably be an explicit way to do so without having to hack the import
machinery.

My only concern with any support/encouragement for modifying the spec
after import is it results in less confidence about how the module was
imported originally.
msg404551 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2021-10-21 00:11
On Oct 20, 2021, at 11:38, Eric Snow <report@bugs.python.org> wrote:
> 
> Regardless, I expect the primary reason __package__ is writable is
> because modules have never had any read-only attrs.

So historical accident mostly.

> The more important question now is, when is __package__ ever modified?
> I can think of 2 cases:
> 
> 1. legacy (pre-451) importers set it when they load the module
> 2. someone wants to customize the import system without writing an importer
> 
> With the latter, specifically they are trying to change how the module
> is reloaded or submodules are imported.  In that case there should
> probably be an explicit way to do so without having to hack the import
> machinery.
> 
> My only concern with any support/encouragement for modifying the spec
> after import is it results in less confidence about how the module was
> imported originally.

I guess a question to answer then is whether we philosophically want the module attributes to be equivalent to the spec attributes.  And by equivalent, I mean enforced to be exactly so, and thus a proxy.  To me, the duplication is a wart that we should migrate away from so there’s only one place for these attributes, and that should be the spec.

Here is the mapping we currently describe in the docs:

mod.__name__ === __spec__.name
mod.__package__ === __spec__.parent
mod.__loader__ === __spec__.loader
mod.__file__ === __spec__.origin
mod.__path__ === __spec__.submodule_search_locations
mod.__cached__ === __spec__.cached

But right now, they don’t have to stay in sync, and I don’t think it’s reasonable to put the onus on the user to keep them in sync, because it’s unclear what code uses which attribute.  Okay, so you can just set them both to be safe, but then you can’t do that with __spec__.parent/__package__

This is what leads me to think that having a proxy to keep them in sync and relaxing the read-only restriction is the path forward, even if writing __package__ doesn’t make sense.  It also seems like the easier way to keep backward compatibility, rather than enforcing read-only on __package__ to match __spec__.parent.

So the question is less about whether this is useful than whether it will break things if they write to it.
msg404619 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-21 16:52
On Wed, Oct 20, 2021 at 6:11 PM Barry A. Warsaw <report@bugs.python.org> wrote:
> I guess a question to answer then is whether we philosophically want the module attributes to be equivalent to the spec attributes.  And by equivalent, I mean enforced to be exactly so, and thus a proxy.  To me, the duplication is a wart that we should migrate away from so there’s only one place for these attributes, and that should be the spec.
>
> Here is the mapping we currently describe in the docs:
>
> mod.__name__ === __spec__.name
> mod.__package__ === __spec__.parent
> mod.__loader__ === __spec__.loader
> mod.__file__ === __spec__.origin
> mod.__path__ === __spec__.submodule_search_locations
> mod.__cached__ === __spec__.cached
>
> But right now, they don’t have to stay in sync, and I don’t think it’s reasonable to put the onus on the user to keep them in sync, because it’s unclear what code uses which attribute.  Okay, so you can just set them both to be safe, but then you can’t do that with __spec__.parent/__package__

Currently any of the module attrs can be different than the spec.  In
two cases they can legitimately be different: __name__ (with __main__)
and __file__ (with frozen stdlib modules).  For the rest, they should
be in sync.

Treating the spec as the single source of truth makes sense.  My only
concern has been that you can no longer determine how a module was
originally imported once the spec is changed.  However, I just
realized that you can always run importlib.util.find_spec() to
reproduce that info (with some minor caveats).  So now I'm less
concerned about that. :)

Notably, users have forever(?) been able to modify all of the module
attrs, with impact on the import system: __package__ and __path__
affecting later imports, and the rest affecting reload.

FWIW, an "advantage" of the module attrs is that they can be set in
the module code.  The same is true for the corresponding spec attrs
but with just enough indirection to require more intent.

Regardless, the idea of post-import modifications to modules/specs has
always made me uncomfortable.  As a user I'd expect an alternative
that feels less like a (non-obvious) low-level hack.

====

To me here are the important questions:

1. when does code ever modify the module attrs (or spec) and why?
2. should we distinguish the roles of the module attrs and spec
(how-module-was-loaded vs. how-module-will-reload vs.
how-module-impacts-other-imports)?
3. would it make sense to store spec modifications separately from the
spec (e.g. on the module)?
4. which attrs should be deprecated?
5. should any module attrs (the ones that don't get eventually
removed) be read-only?  What about spec attrs?
6. would it be better to provide importlib.util.* helpers to address
those needs, instead of having folks modify the module/spec directly?

My take:

1. that would be nice to know :)
2. that depends on what matters in practice.  My gut says the
distinctions aren't important enough to do anything about it, except
where there are legitimate differences between the module and spec.

Currently the module attrs cover all three roles.  The spec only
covers how-module-was-loaded (but is used as a fallback for the other
two roles in *most* cases).

Those two special cases, with __name__ and __file__ being out of sync,
are meaningful only for introspection, rather than affecting the
import machinery.  (In the case of frozen modules that have __file__
set, note that spec.has_location is False.)  I'm not sure how these
fit in with the different roles.

Advantages to keeping the spec exclusively how-module-was-imported:

* it's what I'd expect; having to call importlib.util.find_spec()
isn't the obvious thing
* the loader can modify the spec, so importlib.util.find_spec() won't
necessarily match

None of those appear important enough to warrant keeping the status
quo.  The disadvantages seem heavier (maintenance costs and user
confusion with (unnecessarily) having multiple sources of truth).

3. probably not, though it depends on (2)

However, if all those module attrs become read-only then we would need
to figure out where to store __name__ and __file__ in those special
cases.

4. everything except __name__ and __file__ (and probably __path__)
5. for modules, yes; for the spec, only if we stick with the one role

On modules I'd expect all of them to become properties regardless,
with most of them becoming read-only eventually:

getter:
* proxy the corresponding spec attr
* a deprecation warning if it isn't an attr that needs to stay

setter:
* proxy the corresponding spec attr
* a deprecation warning for now on all attrs
* a deprecation error later on all attrs
* an AttributeError even later (do not make it a data-only descriptor)

A bonus advantage of properties is that they would reduce clutter on
the module __dict__.

What about __path__?  We'll probably keep it as a traditional
indicator that the module is a package.  However, do we make it a
read-only proxy of spec.module_search_locations?  (We already use a
__path__ proxy for namespace packages.)

6. it probably isn't worth it.

Due to the extra indirection, modifying the spec seems like a more
deliberate (non-accidental or confused) action than changing the
module attrs.  That's probably enough "help".  However, in cases where
multiple attrs together have specific meaning, such helpers might be
helpful for users.

====

Regarding __file__ being different from spec.origin, it might be worth
revisiting the question of "origin" vs. "location" on the spec. Note
that, in the case of frozen stdlib modules, spec.has_location is False
even though __file__ is set.  That smells fishy to me.
msg404620 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-21 16:52
On Wed, Oct 20, 2021 at 6:11 PM Barry A. Warsaw <report@bugs.python.org> wrote:
> This is what leads me to think that having a proxy to keep them in sync and relaxing the read-only restriction is the path forward, even if writing __package__ doesn’t make sense.  It also seems like the easier way to keep backward compatibility, rather than enforcing read-only on __package__ to match __spec__.parent.
>
> So the question is less about whether this is useful than whether it will break things if they write to it.

I don't see any significant problem with making spec.parent writable.
It's read-only now only because it is computed from spec.name and any
other value doesn't make sense (which read-only communicates).  My
preference would be to make __package__ read-only instead. :)
However, I doubt it will make a difference in practice either way, so
I'm fine either way.
msg404628 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2021-10-21 17:54
Thanks for your comprehensive comments Eric!  While I digest them, my TL;DR is that we probably need a PEP to describe everything from the current situation, to the the desired end state and migration path.  I'm willing to put that together with you and Brett as co-authors if you're amenable.
msg404630 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-21 18:25
On Thu, Oct 21, 2021 at 11:54 AM Barry A. Warsaw <report@bugs.python.org> wrote:
> Thanks for your comprehensive comments Eric!  While I digest them, my TL;DR is that we probably need a PEP to describe everything from the current situation, to the the desired end state and migration path.  I'm willing to put that together with you and Brett as co-authors if you're amenable.

+1
msg404667 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2021-10-21 21:28
Sure, if you want to go full PEP on this I'm happy to be a co-author if the end goal is to ditch the (now) extraneous attributes.
History
Date User Action Args
2022-04-11 14:59:51adminsetgithub: 89703
2021-10-21 21:28:37brett.cannonsetmessages: + msg404667
2021-10-21 18:25:26eric.snowsetmessages: + msg404630
2021-10-21 17:54:25barrysetmessages: + msg404628
2021-10-21 16:52:50eric.snowsetmessages: + msg404620
2021-10-21 16:52:05eric.snowsetmessages: + msg404619
2021-10-21 00:11:29barrysetmessages: + msg404551
2021-10-20 18:38:42eric.snowsetmessages: + msg404512
2021-10-20 18:32:32brett.cannonsetmessages: + msg404511
2021-10-20 17:11:39barrysetnosy: + brett.cannon, eric.snow
2021-10-20 17:11:23barrycreate