
classification
Title: Freeze the encodings module.
Type: behavior Stage: patch review
Components: Build Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: FFY00, christian.heimes, eric.snow, gvanrossum, kumaraditya, lemburg
Priority: normal Keywords: patch

Created on 2021-10-28 18:11 by eric.snow, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 29331 closed FFY00, 2021-10-30 15:54
PR 29788 open kumaraditya, 2021-11-26 06:54
PR 29814 merged kumaraditya, 2021-11-27 09:32
PR 30030 closed christian.heimes, 2021-12-10 15:26
Messages (8)
msg405211 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-28 18:11
Currently we freeze all the modules imported during runtime initialization, except for the encodings module.  It has a lot of submodules, and freezing them adds a lot of extra noise to builds.  We hadn't frozen it yet because we were still ironing out changes related to frozen modules and the extra noise was a pain.  We also waited because we weren't sure whether we should freeze all the submodules or just the ones most likely to be used during startup.  In the latter case, we were also blocked on having __path__ set on the package.

At this point there are no blockers.  So we should freeze the encodings package, with either all of its submodules or only the most commonly used subset.
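
A minimal sketch for checking which modules the running interpreter can supply as frozen, assuming a reasonably recent CPython. The helper name `is_frozen` is hypothetical; `importlib.machinery.FrozenImporter.find_spec` is the standard-library call it relies on. Results depend on how the interpreter was built and, on versions that support it, on the `-X frozen_modules` option.

```python
import importlib.machinery


def is_frozen(name):
    """Return True if the interpreter can provide `name` as a frozen module."""
    # find_spec() returns None when no frozen copy of the module is available.
    spec = importlib.machinery.FrozenImporter.find_spec(name)
    return spec is not None


if __name__ == "__main__":
    # Illustrative names only; which of these are frozen varies by build.
    for name in ("codecs", "io", "encodings", "encodings.utf_8"):
        print(f"{name}: frozen={is_frozen(name)}")
```

Running this against builds before and after a change would show whether the encodings package and its submodules moved into the frozen set.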
msg405213 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-10-28 18:15
encodings is a package. I think you first have to check whether mixing
frozen and non-frozen submodules is even supported. I've never tried
having only part of a package frozen.

Freezing the whole package certainly works.
msg405247 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2021-10-28 19:56
On Thu, Oct 28, 2021 at 12:15 PM Marc-Andre Lemburg
<report@bugs.python.org> wrote:
> encodings is a package. I think you first have to check whether mixing
> frozen and non-frozen submodules is even supported. I've never tried
> having only part of a package frozen.

It works as long as __path__ is set properly, which it is now.  FWIW,
I tested freezing only some of the submodules a while back and it
worked fine.  That was using a different branch that I never merged
but it should be fine with the different change that got merged.  Of
course, we'd need to verify that if we went that route.
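
A rough sketch of why __path__ matters for a partially frozen package, assuming the non-frozen submodules are located by the regular path-based finder. The helper `find_submodule` and the bogus directory are hypothetical; `importlib.machinery.PathFinder.find_spec` is the real standard-library call. It only illustrates the lookup, not the frozen importer itself.

```python
import encodings
import importlib.machinery


def find_submodule(package, submodule, search_path=None):
    """Look up package.submodule on a search path (default: package.__path__)."""
    if search_path is None:
        search_path = list(package.__path__)
    fullname = f"{package.__name__}.{submodule}"
    # PathFinder is the path-based finder used for modules that are not frozen.
    return importlib.machinery.PathFinder.find_spec(fullname, search_path)


if __name__ == "__main__":
    # With a usable __path__, the submodule is found on disk.
    print(find_submodule(encodings, "latin_1"))
    # With a search path that points nowhere, no spec is returned,
    # which is the situation that ends in an ImportError.
    print(find_submodule(encodings, "latin_1", ["/no/such/directory"]))
```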
msg405372 - (view) Author: Filipe Laíns (FFY00) * (Python triager) Date: 2021-10-30 15:54
I just tested partially freezing the package, and it seems to be working fine :)
msg405383 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2021-10-30 18:00
On 30.10.2021 17:54, Filipe Laíns wrote:
> 
> I just tested partially freezing the package, and it seems to be working fine :)
FWIW: I think it's best not to bother and simply freeze the whole thing.

It's mostly char mappings which compress well and there's a benefit
in sharing these using mmap (which the OS does for you with static
C data).
msg405390 - (view) Author: Filipe Laíns (FFY00) * (Python triager) Date: 2021-10-31 00:43
I have already opened the PR, but I can change it if desired.
msg407323 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-11-29 20:27
New changeset 02b5ac6091ada0c2df99c4e1eae37ddccbcd91f0 by Kumar Aditya in branch 'main':
bpo-45653: fix test_embed on windows (GH-29814)
https://github.com/python/cpython/commit/02b5ac6091ada0c2df99c4e1eae37ddccbcd91f0
msg408474 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2021-12-13 18:01
Eric, I have a simple reproducer for the issue:

This works:

$ LC_ALL=en_US.utf-8 TESTPATH=$(pwd)/Lib:$(pwd)/build/lib.linux-x86_64-3.11 ./Programs/_testembed test_init_setpath_config

This fails because it cannot load the ISO-8859-1 / latin-1 codec:

$ LC_ALL=en_US.latin1 TESTPATH=$(pwd)/Lib:$(pwd)/build/lib.linux-x86_64-3.11 ./Programs/_testembed test_init_setpath_config
Python path configuration:
  PYTHONHOME = (not set)
  PYTHONPATH = (not set)
  program name = 'conf_program_name'
  isolated = 0
  environment = 1
  user site = 1
  import site = 1
  is in build tree = 0
  stdlib dir = ''
  sys._base_executable = 'conf_executable'
  sys.base_prefix = ''
  sys.base_exec_prefix = ''
  sys.platlibdir = 'lib'
  sys.executable = 'conf_executable'
  sys.prefix = ''
  sys.exec_prefix = ''
  sys.path = [
    '/home/heimes/dev/python/cpython/Lib',
    '/home/heimes/dev/python/cpython/build/lib.linux-x86_64-3.11',
  ]
Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
Python runtime state: core initialized
LookupError: unknown encoding: ISO-8859-1

Current thread 0x00007f9c42be6740 (most recent call first):
  <no Python frame>



With this patch I'm seeing that encodings.__path__ is not absolute and that __spec__ has an empty submodule_search_locations.

--- a/Lib/encodings/__init__.py
+++ b/Lib/encodings/__init__.py
@@ -98,9 +98,12 @@ def search_function(encoding):
             # module with side-effects that is not in the 'encodings' package.
             mod = __import__('encodings.' + modname, fromlist=_import_tail,
                              level=0)
-        except ImportError:
+        except ImportError as e:
             # ImportError may occur because 'encodings.(modname)' does not exist,
             # or because it imports a name that does not exist (see mbcs and oem)
+            sys.stderr.write(f"exception: {e}\n")
+            sys.stderr.write(f"encodings.__path__: {__path__}\n")
+            sys.stderr.write(f"encodings.__spec__: {__spec__}\n")
             pass
         else:
             break


$ LC_ALL=en_US.latin1 TESTPATH=$(pwd)/Lib:$(pwd)/build/lib.linux-x86_64-3.11 ./Programs/_testembed test_init_setpath_config
exception: No module named 'encodings.latin_1'
encodings.__path__: ['encodings']
encodings.__spec__: ModuleSpec(name='encodings', loader=<class '_frozen_importlib.FrozenImporter'>, origin='frozen', submodule_search_locations=[])
exception: No module named 'encodings.iso_8859_1'
encodings.__path__: ['encodings']
encodings.__spec__: ModuleSpec(name='encodings', loader=<class '_frozen_importlib.FrozenImporter'>, origin='frozen', submodule_search_locations=[])
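
The two failed imports above (latin_1 first, then iso_8859_1) follow from how an encoding name is turned into candidate submodule names. The sketch below approximates that mapping with the public helpers `encodings.normalize_encoding` and `encodings.aliases.aliases`; the function `candidate_modules` is hypothetical and leaves out details of the real `search_function`, such as caching and the handling of dotted names.

```python
import codecs
import encodings
import encodings.aliases


def candidate_modules(encoding):
    """Approximate the submodule names tried for an encoding name."""
    # The codec registry lowercases the name before calling search functions,
    # so the same is done here before normalizing.
    norm = encodings.normalize_encoding(encoding.lower())
    alias = encodings.aliases.aliases.get(norm)
    # The aliased module name is tried first, then the normalized name itself.
    names = [alias, norm] if alias is not None else [norm]
    return [f"encodings.{name}" for name in names]


if __name__ == "__main__":
    print(candidate_modules("ISO-8859-1"))
    # On a working interpreter both spellings resolve to the same codec.
    print(codecs.lookup("ISO-8859-1").name, codecs.lookup("latin-1").name)
```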


It should have this search location:

>>> import encodings
>>> encodings.__spec__
ModuleSpec(name='encodings', loader=<class '_frozen_importlib.FrozenImporter'>, origin='frozen', submodule_search_locations=['/home/heimes/dev/python/cpython/Lib/encodings'])
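
A small diagnostic sketch for the two symptoms described above, assuming only the standard `importlib` and `os.path` modules. The helper `check_search_locations` and its report keys are hypothetical names chosen for illustration; it compares a package's __path__ with its spec's submodule_search_locations and checks that every entry is absolute.

```python
import importlib
import os.path


def check_search_locations(package_name):
    """Report whether a package's search path entries are absolute and in sync."""
    package = importlib.import_module(package_name)
    path = list(package.__path__)
    locations = list(package.__spec__.submodule_search_locations or [])
    return {
        "path": path,
        "spec_locations": locations,
        # An empty path counts as a failure: nothing can be found on it.
        "all_absolute": bool(path) and all(os.path.isabs(p) for p in path),
        "in_sync": path == locations,
    }


if __name__ == "__main__":
    for key, value in check_search_locations("encodings").items():
        print(f"{key}: {value}")
```

In the failing run shown earlier, such a check would report a relative path entry and an empty list of spec locations.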
History
Date User Action Args
2022-04-11 14:59:51 admin set github: 89816
2021-12-13 18:01:38 christian.heimes set messages: + msg408474
2021-12-10 15:26:47 christian.heimes set nosy: + christian.heimes; pull_requests: + pull_request28255
2021-11-29 20:27:42 gvanrossum set nosy: + gvanrossum; messages: + msg407323
2021-11-27 09:32:06 kumaraditya set pull_requests: + pull_request28047
2021-11-26 06:54:42 kumaraditya set nosy: + kumaraditya; pull_requests: + pull_request28024
2021-10-31 00:43:55 FFY00 set stage: needs patch -> patch review
2021-10-31 00:43:10 FFY00 set messages: + msg405390; stage: patch review -> needs patch
2021-10-30 18:00:11 lemburg set messages: + msg405383
2021-10-30 15:54:48 FFY00 set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request27599
2021-10-30 15:54:14 FFY00 set messages: + msg405372
2021-10-28 19:56:41 eric.snow set messages: + msg405247
2021-10-28 18:15:15 lemburg set nosy: + lemburg; messages: + msg405213
2021-10-28 18:11:52 eric.snow create