classification
Title: Add __file__ attribute to frozen modules
Type: behavior
Stage: needs patch
Components: Interpreter Core, Library (Lib)
Versions: Python 3.11
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: barry, eric.snow, gregory.p.smith, gvanrossum, lemburg, ncoghlan
Priority: normal
Keywords: patch

Created on 2014-06-12 16:14 by lemburg, last changed 2021-10-28 19:08 by eric.snow.

Files
File name Uploaded Description
__file__-for-frozen-modules.patch lemburg, 2014-06-12 16:15
Pull Requests
URL Status Linked
PR 28335 merged eric.snow, 2021-09-14 16:09
PR 28655 open eric.snow, 2021-09-30 15:06
PR 28656 merged eric.snow, 2021-09-30 15:56
Messages (13)
msg220362 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2014-06-12 16:14
The missing __file__ attribute on frozen modules causes lots of issues with the stdlib (see e.g. Issue21709 and the stdlib test suite) and other tools that expect this attribute to always be present.

The attached patch for 3.4.1 adds this attribute to all frozen modules and resolves most issues. It cannot resolve the problem of not necessarily finding the directories/files listed in those attributes, but at least makes it possible to continue using code that only uses the attribute for error reporting.
msg220365 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2014-06-12 16:35
I'm -0 on this patch.  I can understand that in some sense, frozen modules do semantically have an associated file, but OTOH, once they're frozen the connection to their file is broken.  Also, I think anything that assumes __file__ exists is simply broken and should be fixed.  There are other cases than frozen modules where a module would have no reasonable value for __file__ and thus shouldn't have one.
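A minimal sketch of the defensive pattern argued for here: look up __file__ with a fallback instead of assuming it exists. The helper name module_location, the "<unknown>" placeholder, and the example path are illustrative assumptions and are not part of the attached patch.

```python
import types


def module_location(module):
    """Return a printable location for a module without assuming __file__ exists."""
    # Frozen, built-in, and dynamically created modules may lack __file__.
    filename = getattr(module, "__file__", None)
    if filename is not None:
        return filename
    # Fall back to the spec's origin (e.g. "frozen" or "built-in") when available.
    spec = getattr(module, "__spec__", None)
    origin = getattr(spec, "origin", None)
    if origin is not None:
        return f"<{origin}>"
    return "<unknown>"


# A module created on the fly has neither __file__ nor a spec.
bare = types.ModuleType("bare")
print(module_location(bare))

# A module that does carry __file__ reports it unchanged.
with_file = types.ModuleType("with_file")
with_file.__file__ = "/tmp/with_file.py"
print(module_location(with_file))
```

Code that only needs the attribute for error reporting can call a helper like this and keep working whether or not the module was frozen.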
msg220367 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2014-06-12 16:41
On 12.06.2014 18:35, Barry A. Warsaw wrote:
> 
> I'm -0 on this patch.  I can understand that in some sense, frozen modules do semantically have an associated file, but OTOH, once they're frozen the connection to their file is broken.  Also, I think anything that assumes __file__ exists is simply broken and should be fixed.  There are other cases than frozen modules where a module would have no reasonable value for __file__ and thus shouldn't have one.

This one falls into the practicality beats purity category. Of
course, the __file__ attribute doesn't always make sense as
a file path, but it does serve an informational purpose.

We're doing this in eGenix PyRun to get 3rd party code working
(including parts of the Python stdlib :-)). Not doing so
would simply render the whole freezing approach pretty much
useless, since so much code uses the attribute without checking
or providing a fallback solution.
msg220368 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2014-06-12 16:43
PBP might be reasonably used to justify it for the frozen case.  I just don't want to use that as a wedge to define __file__ in *all* cases, even when no reasonable file name exists.
msg220586 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2014-06-14 21:57
__file__ is the filename from which the module *was* loaded (the inspect doc [1] should probably reflect that [2]).  The import machinery only uses the module's __spec__ (origin, etc.).  __file__ is set strictly as informational (and for backward-compatibility).

Per the language reference [3], __file__ may be omitted when it does not have semantic meaning.  It also says "Ultimately, the loader is what makes use of __file__", but that hasn't been accurate since PEP 451 landed. [4]  Notably, though, for now the module __repr__() *does* use __file__ if it is available (and the loader doesn't implement module_repr).  The counterpart of __file__ within a module's spec is __spec__.origin.  The two should stay in sync.  In the case of frozen modules origin is set to "frozen".
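As a rough illustration of the relationship between __spec__.origin and __file__, the sketch below builds a spec shaped like a frozen module's (origin "frozen", no location) and compares it with a file-backed spec. The module names and the path are made up, and the frozen-style spec is given no loader so that nothing is actually imported or executed.

```python
import importlib.machinery
import importlib.util

# A spec shaped like a frozen module's: origin is the string "frozen" and
# has_location is False, so no __file__ is derived from it.
# The loader is left as None here purely to keep the example inert.
frozen_spec = importlib.machinery.ModuleSpec("demo_frozen", None, origin="frozen")
frozen_mod = importlib.util.module_from_spec(frozen_spec)

# A file-backed spec: origin is the path and has_location is True, so the
# module created from it gets __file__ set to the same value.
# The file does not need to exist, because the module is never executed.
file_spec = importlib.util.spec_from_file_location("demo_file", "/tmp/demo_file.py")
file_mod = importlib.util.module_from_spec(file_spec)

print(frozen_spec.origin, frozen_spec.has_location, hasattr(frozen_mod, "__file__"))
print(file_spec.has_location, file_mod.__file__ == file_spec.origin)
```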

Giving __file__ to frozen modules is inaccurate.  The file probably won't be there and the module certainly wasn't loaded from that location.

Stdlib modules should not rely on all modules having __file__.  Removing __file__ from frozen modules was a change in 3.4 and I'd consider it a 3.4 bug if any stdlib module relied on it.  For that matter, I'd consider it a bug if a module relied on all modules having __file__ or even __file__ being set to an actual filename.

Would it be inappropriate to set an additional informational attribute on frozen modules to indicate the original path?  Something like __frozen_filename__.  Then you wouldn't need to rely on __code__.co_filename.

p.s. Searching for __file__ in the docs [5] illustrates its prevalence.

[1] https://docs.python.org/3/library/inspect.html#types-and-members
[2] issue #21760
[3] https://docs.python.org/3/reference/import.html#__file__
[4] issue #21761
[5] https://docs.python.org/3/search.html?q=__file__
msg220598 - Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-06-14 23:53
Can we just drop "__file__" and set the origin for frozen modules to
something that includes the original file name?
msg220623 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-06-15 09:22
On 15.06.2014 01:53, Nick Coghlan wrote:
> 
> Can we just drop "__file__" and set the origin for frozen modules to
> something that includes the original file name?

This wouldn't really help, because too much code out there uses
the __file__ attribute and assumes it's always available.

Note that the filename information is already available in the
code object's co_filename attribute.
msg401906 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-09-15 21:07
> Note that the filename information is already available in the
> code object's co_filename attribute.

Hm, in the latest (3.11) main that's not true -- co_filename is set to something like "<frozen ntpath>". And that's about right, since the filename contained in the frozen data can at best reflect the filename at the time the freeze script ran, which is not necessarily the same as the filename when the user runs the Python binary.
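A small sketch of why co_filename can only reflect what was known at compile (or freeze) time: the value is simply the string handed to the compiler and is not re-resolved later. The source text and the "<frozen demo>" label are invented for illustration.

```python
# The filename recorded in a code object is whatever string was passed at
# compile time; it is not checked against the file system afterwards.
source = "def greet():\n    return 'hi'\n"
code = compile(source, "<frozen demo>", "exec")

namespace = {}
exec(code, namespace)
greet = namespace["greet"]

# Both the module-level code object and the nested function's code object
# carry the label that was supplied to compile().
print(code.co_filename)
print(greet.__code__.co_filename)
```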

I don't think we should try to set __file__ or any other attributes on frozen modules (at least not for the small set of frozen modules we're contemplating for 3.11).  Users who need the file can run their Python 3.11 binary with -Xfrozen_modules=off, to disable the use of frozen modules (other than the three importlib bootstrap files).

Tools that freeze the entire stdlib or 3rd party code may have to make a different choice, but that's not our problem (yet).
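For anyone wanting to try the -X frozen_modules switch mentioned above from a script, here is a hedged sketch. The helper name frozen_modules_argv is hypothetical, and the example assumes an interpreter that recognizes the option (3.11 per this message); what gets printed depends on the build.

```python
import subprocess
import sys


def frozen_modules_argv(mode, code):
    """Build a command line that runs `code` with -X frozen_modules=<mode>."""
    if mode not in ("on", "off"):
        raise ValueError("mode must be 'on' or 'off'")
    return [sys.executable, "-X", f"frozen_modules={mode}", "-c", code]


if __name__ == "__main__":
    # With frozen modules disabled, stdlib modules such as os are expected to
    # be imported from their source files; print where os came from.
    argv = frozen_modules_argv("off", "import os; print(getattr(os, '__file__', None))")
    result = subprocess.run(argv, capture_output=True, text=True)
    print(result.stdout.strip())
```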
msg403953 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-14 21:32
New changeset 79cf20e48d0b5d69d9fac2a0204b5ac2c366066a by Eric Snow in branch 'main':
bpo-21736: Set __file__ on frozen stdlib modules. (gh-28656)
https://github.com/python/cpython/commit/79cf20e48d0b5d69d9fac2a0204b5ac2c366066a
msg403954 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-14 21:37
I've merged the change to add __file__ for frozen stdlib modules (when possible) and to set __path__ appropriately.  There are still a number of things to address though:

* set co_filename (for tracebacks)
* implement FrozenImporter.get_filename()
* implement FrozenImporter.get_source()

And then there's the question of supporting __file__, etc. for custom frozen modules.  Presumably all those things are still covered by this issue so I'm leaving it open.
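To see where a given interpreter stands on the remaining items listed above, one could probe FrozenImporter directly. This is only a sketch: it reports which methods are present on the running interpreter and says nothing about how they behave.

```python
import importlib.machinery

FrozenImporter = importlib.machinery.FrozenImporter

# Probe which loader methods the running interpreter's FrozenImporter exposes.
# get_filename() is the one named above as still to be implemented.
methods = ("get_code", "get_source", "is_package", "get_filename")
available = {name: hasattr(FrozenImporter, name) for name in methods}

for name, present in available.items():
    print(f"{name}: {'present' if present else 'missing'}")
```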
msg404168 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-10-18 09:21
That appears to have caused https://bugs.python.org/issue45506
msg404228 - Author: Barry A. Warsaw (barry) * (Python committer) Date: 2021-10-18 21:57
Weird.  PR 28655 is merged on GH, but still shows open on this bpo ticket.
msg405228 - Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-28 19:08
I've opened the following issues related to frozen stdlib modules:

https://bugs.python.org/issue45652
https://bugs.python.org/issue45658
https://bugs.python.org/issue45659

Again, I'm leaving this issue open to deal with the broader question of frozen modules outside the stdlib, where the solution isn't so obvious.  The question of targeting the FileLoader (or SourceLoader) ABC should probably be handled separately.  (Note that bpo-45020 has some related discussion.)
History
Date User Action Args
2021-10-28 19:08:33 eric.snow set type: behavior; messages: + msg405228
2021-10-18 21:57:36 barry set messages: + msg404228
2021-10-18 09:21:23 gregory.p.smith set nosy: + gregory.p.smith; messages: + msg404168
2021-10-14 21:37:53 eric.snow set stage: patch review -> needs patch; messages: + msg403954; versions: + Python 3.11, - Python 3.4, Python 3.5
2021-10-14 21:32:29 eric.snow set messages: + msg403953
2021-09-30 15:56:48 eric.snow set pull_requests: + pull_request27022
2021-09-30 15:06:49 eric.snow set pull_requests: + pull_request27021
2021-09-15 21:07:12 gvanrossum set messages: + msg401906
2021-09-14 16:09:35 eric.snow set stage: patch review; pull_requests: + pull_request26748
2021-08-31 20:02:36 gvanrossum set nosy: + gvanrossum
2020-03-18 18:04:53 brett.cannon set nosy: - brett.cannon
2014-06-15 09:22:27 lemburg set messages: + msg220623
2014-06-14 23:53:07 ncoghlan set messages: + msg220598
2014-06-14 21:57:23 eric.snow set nosy: + eric.snow, brett.cannon, ncoghlan; messages: + msg220586
2014-06-12 16:43:15 barry set messages: + msg220368
2014-06-12 16:41:28 lemburg set messages: + msg220367
2014-06-12 16:35:39 barry set messages: + msg220365
2014-06-12 16:33:04 barry set nosy: + barry
2014-06-12 16:15:21 lemburg set files: + __file__-for-frozen-modules.patch; keywords: + patch
2014-06-12 16:14:48 lemburg create