Fatal Python error: initfsencoding: unable to load the file system codec zipimport.ZipImportError: can't find module 'encodings' #79777

Closed
hyu mannequin opened this issue Dec 27, 2018 · 32 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes OS-windows release-blocker type-crash A hard crash of the interpreter, possibly with a core dump


@hyu

hyu mannequin commented Dec 27, 2018

BPO 35596
Nosy @brettcannon, @pfmoore, @ncoghlan, @vstinner, @tjguk, @ned-deily, @schlamar, @ericsnowcurrently, @zware, @serhiy-storchaka, @zooba, @wwqgtxx, @miss-islington
PRs
  • bpo-35596: Fix vcruntime140.dll being added to embeddable distro multiple times. #11329
  • [3.7] bpo-35596: Fix vcruntime140.dll being added to embeddable distro multiple times. (GH-11329) #11331
  • bpo-35596: Use unchecked PYCs for the embeddable distro to avoid zipimport restrictions #11465
  • [3.7] bpo-35596: Use unchecked PYCs for the embeddable distro to avoid zipimport restrictions (GH-11465) #11467
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/zooba'
    closed_at = <Date 2019-01-09.04:30:37.167>
    created_at = <Date 2018-12-27.16:18:27.198>
    labels = ['3.7', '3.8', 'OS-windows', 'type-crash', 'release-blocker']
    title = "Fatal Python error: initfsencoding: unable to load the file system codec zipimport.ZipImportError: can't find module 'encodings'"
    updated_at = <Date 2019-01-09.06:47:02.772>
    user = 'https://bugs.python.org/hyu'

    bugs.python.org fields:

    activity = <Date 2019-01-09.06:47:02.772>
    actor = 'serhiy.storchaka'
    assignee = 'steve.dower'
    closed = True
    closed_date = <Date 2019-01-09.04:30:37.167>
    closer = 'ned.deily'
    components = ['Windows']
    creation = <Date 2018-12-27.16:18:27.198>
    creator = 'hyu'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 35596
    keywords = ['patch', 'patch', 'patch', '3.7regression']
    message_count = 32.0
    messages = ['332595', '332610', '332612', '332615', '332616', '332618', '332630', '332654', '332660', '332664', '332674', '332696', '332800', '332816', '332840', '333203', '333204', '333219', '333221', '333250', '333252', '333253', '333254', '333255', '333257', '333259', '333260', '333264', '333266', '333276', '333278', '333290']
    nosy_count = 14.0
    nosy_names = ['brett.cannon', 'paul.moore', 'ncoghlan', 'vstinner', 'tim.golden', 'ned.deily', 'schlamar', 'eric.snow', 'zach.ware', 'serhiy.storchaka', 'steve.dower', 'wwqgtxx', 'miss-islington', 'hyu']
    pr_nums = ['11329', '11329', '11331', '11331', '11331', '11465', '11465', '11467', '11467', '11467']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue35596'
    versions = ['Python 3.7', 'Python 3.8']

    @hyu

    hyu mannequin commented Dec 27, 2018

    python
    Fatal Python error: initfsencoding: unable to load the file system codec
    zipimport.ZipImportError: can't find module 'encodings'

    The archive contains two copies of vcruntime140.dll, with no binary difference between them:
    Date Time Attr Size Compressed Name
    ------------------- ----- -------- ------------ ----------------
    2018-12-10 22:06:34 ..... 80128 45532 vcruntime140.dll
    ...
    2018-12-10 22:06:34 ..... 80128 45532 vcruntime140.dll

    Repeated downloads. Checked both versions:
    https://www.python.org/ftp/python/3.7.2/python-3.7.2-embed-amd64.zip
    https://www.python.org/ftp/python/3.7.2/python-3.7.2-embed-win32.zip

    Searched and read the release notes and documentation. Checked the bug tracker since yesterday.
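To check an archive for repeated members like the doubled vcruntime140.dll above, a minimal sketch with the standard-library `zipfile` module can group the central-directory records by name and compare their stored CRC-32 values. Matching CRCs strongly suggest, but do not prove, byte-identical copies. The helper name `duplicate_entries` and the in-memory demo archive are illustrative assumptions, not part of the release tooling.

```python
import io
import warnings
import zipfile
from collections import defaultdict


def duplicate_entries(archive):
    """Map each member name stored more than once to the CRC-32s of its copies."""
    crcs = defaultdict(list)
    with zipfile.ZipFile(archive) as zf:
        # infolist() keeps every central-directory record, including repeats.
        for info in zf.infolist():
            crcs[info.filename].append(info.CRC)
    return {name: values for name, values in crcs.items() if len(values) > 1}


# Demo on an in-memory archive that repeats one member, mimicking the report.
buf = io.BytesIO()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # zipfile warns about duplicate names
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("vcruntime140.dll", b"same bytes")
        zf.writestr("python.exe", b"stub")
        zf.writestr("vcruntime140.dll", b"same bytes")

for name, values in duplicate_entries(buf).items():
    identical = len(set(values)) == 1
    print(f"{name}: {len(values)} copies, identical CRCs: {identical}")
    # prints: vcruntime140.dll: 2 copies, identical CRCs: True
```

Pointing `duplicate_entries` at a downloaded file path instead of `buf` works the same way, since `zipfile.ZipFile` accepts either a path or a binary file object.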

    @hyu hyu mannequin added the 3.7 (EOL) end of life label Dec 27, 2018
    @zooba

    zooba commented Dec 27, 2018

    Have you tried a proper install as well? Could you do that to rule out any problem on your machine?

    Are you repackaging anything as part of your app, or are you just testing the package first and getting this error?

    It looks like you're running from the directory you extracted to. Is there anything else in that directory or just the Python files?

    When you say there are two vcruntime140.dll, you mean one in each package and they're the same? That might be a problem, but it wouldn't show up like this, so I don't think it's yours. I'm not in a position to check the files right now, but I'll get to it later.

    @zooba

    zooba commented Dec 27, 2018

    Okay, this looks like a zipimport issue. When I extract the "python37.zip" file containing the stdlib and reference the directory it works fine. But no matter what I do to the ZIP I can't get it to run.

    It seems that zipimport either can't import .pyc files without a matching .py, or it can't import packages marked with __init__.pyc (I haven't gone deep enough, but adding encodings/__init__.py got me further).

    This is a regression from 3.7.1.

    Things to do:

    • fix the regression (Serhiy?)
    • add a regression test
    • add a ".pyc-only stdlib in ZIP" test (I'll do this)
    • remove the double vcruntime in ZIP issue (unrelated, so I'll just fix it)
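The ".pyc-only stdlib in ZIP" test from the list above can be approximated with a short sketch. This is an illustration under stated assumptions, not the test that was added to CPython: the helper `import_pyc_only_package` and the package names are hypothetical. It compiles a one-line package with `py_compile` in a chosen PEP 552 invalidation mode, stores only `<name>/__init__.pyc` in a ZIP (no `.py` source), and imports it through zipimport by putting the archive on `sys.path`.

```python
import importlib
import py_compile
import sys
import tempfile
import zipfile
from pathlib import Path


def import_pyc_only_package(workdir, name, mode):
    """Import a package whose ZIP holds only name/__init__.pyc (no source)."""
    workdir = Path(workdir)
    src = workdir / f"{name}.py"
    src.write_text("VALUE = 42\n")
    pyc = workdir / f"{name}.pyc"
    # Compile with the requested PEP 552 invalidation mode.
    py_compile.compile(str(src), cfile=str(pyc), doraise=True,
                       invalidation_mode=mode)
    archive = workdir / f"{name}.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        # Only the compiled file goes in; zipimport looks for pkg/__init__.pyc.
        zf.write(pyc, arcname=f"{name}/__init__.pyc")
    sys.path.insert(0, str(archive))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(name).VALUE
    finally:
        # Undo the global side effects so repeated runs stay independent.
        sys.path.remove(str(archive))
        sys.modules.pop(name, None)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for mode in py_compile.PycInvalidationMode:
            name = f"pyconly_{mode.name.lower()}"
            try:
                value = import_pyc_only_package(tmp, name, mode)
                print(mode.name, "imported:", value)
            except ImportError as exc:
                print(mode.name, "rejected:", exc)
```

On interpreters with the rewritten zipimport (3.8 and later), all three modes are expected to import; according to later comments in this thread, the 3.7 zipimport refused the CHECKED_HASH case.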

    @zooba zooba added type-crash A hard crash of the interpreter, possibly with a core dump release-blocker labels Dec 27, 2018
    @hyu

    hyu mannequin commented Dec 27, 2018

    Reproduced on two clean-install Windows hosts.
    No (re)packaging: just download and run python.
    Repeated with versions 3.7.2, 3.7.1, and 3.6.8:

    https://www.python.org/ftp/python/3.7.2/python-3.7.2-embed-amd64.zip
    https://www.python.org/ftp/python/3.7.1/python-3.7.1-embed-amd64.zip
    https://www.python.org/ftp/python/3.6.8/python-3.6.8-embed-amd64.zip

    Windows Explorer properly extracted: \tmp\py372, \tmp\py371, \tmp\py368.
    Python 3.6.8 and 3.7.1 properly started, executed import sys; sys.exit()
    Python 3.7.2 failed to start. Please suggest proper commands if you claim these are not proper Windows commands.

    I did extra work to show both the 3.6 and 3.7 regressions. If you consider copying the 3.6.8 vcruntime140.dll into 3.7.1 to be (re)packaging, then ignore the v3.7.1:260ec2c36a run below.

    Windows Explorer shows, and 7-Zip lists, two vcruntime140.dll entries in 3.7.2. Please ignore 7-Zip if you consider it an improper or (re)packaging tool, and I will attach a Windows Explorer screenshot.

    Microsoft Windows [Version 10.0.17763.195]
    (c) 2018 Microsoft Corporation. All rights reserved.

    C:\>\tmp\py368\python
    Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
    >>> import sys; sys.exit()

    C:\>\tmp\py372\python
    Fatal Python error: initfsencoding: unable to load the file system codec
    zipimport.ZipImportError: can't find module 'encodings'

    Current thread 0x00002614 (most recent call first):

    C:\>copy \tmp\py368\vcruntime140.dll \tmp\py371\
    1 file(s) copied.

    C:\>\tmp\py371\python
    Python 3.7.1 (v3.7.1:260ec2c36a, Oct 20 2018, 14:57:15) [MSC v.1915 64 bit (AMD64)] on win32
    >>> import sys; sys.exit()

    C:\>

    @hyu hyu mannequin removed the type-crash A hard crash of the interpreter, possibly with a core dump label Dec 27, 2018
    @zooba

    zooba commented Dec 27, 2018

    Thanks for the extra info, and for confirming that 3.6.8 isn't affected (I hadn't tried that yet, so you saved me some work :) )

    This is definitely a new zipimport regression in 3.7.2. Thanks for the report.

    @zooba zooba added the 3.8 only security fixes label Dec 27, 2018
    @miss-islington

    New changeset 59c2aa2 by Miss Islington (bot) (Steve Dower) in branch 'master':
    bpo-35596: Fix vcruntime140.dll being added to embeddable distro multiple times. (GH-11329)
    59c2aa2

    @eryksun eryksun added the type-crash A hard crash of the interpreter, possibly with a core dump label Dec 27, 2018
    @miss-islington

    New changeset bbf6954 by Miss Islington (bot) in branch '3.7':
    bpo-35596: Fix vcruntime140.dll being added to embeddable distro multiple times. (GH-11329)
    bbf6954

    @serhiy-storchaka

    There were no changes in zipimport between 3.7.1 and 3.7.2, and only a few seemingly unrelated changes in the import machinery. Maybe this is caused by some change in the interpreter initialization code?

    @ncoghlan

    Reviewing the diff at v3.7.1...v3.7.2 the only item I've spotted that seems like it could even plausibly be related is the tweak at v3.7.1...v3.7.2#diff-baf5eab51059d96fb8837152dab0d1a4

    (Click the Files tab to get your browser to jump to the anchor in the second link)

    That's a change to the function that emits the "Fatal Python error: initfsencoding: unable to load the file system codec" message.

    That change means that embedding applications could potentially be hitting the codec name resolution at https://github.com/python/cpython/blob/3.7/Python/pylifecycle.c#L1643 with the filesystem encoding set as "ascii", rather than handling that case through the "get_locale_encoding()" branch, which does the initial codec name lookup with the filesystem encoding still set to NULL (and hence falling back to the locale encoding as the default).

    However, the only way that new branch could trigger is if check_force_ascii() (at https://github.com/python/cpython/blob/v3.7.2/Python/fileutils.c#L100 ) is returning 1 for some reason, which we only expect it to do on some misbehaving BSD OSes, not on Windows: #10233

    @zooba

    zooba commented Dec 28, 2018

    None of the code you linked is defined on Windows at all, so it can't be that.

    Are any stat checks done when there's only a .pyc to import? Could it be deciding that the .pyc is out of date and then failing to find source?

    @zooba

    zooba commented Dec 29, 2018

    I took a closer look at the diff since 3.7.1, and I'm not seeing anything either. I suspect we need to step through zipimport/importlib and figure out exactly where it rejects the .pyc files in the zip.

    @ncoghlan

    Ah, you're right - I missed that the ForceASCII stuff was on the non-Windows side of an ifdef so it's literally impossible for that change to affect Windows, not just highly unlikely.

    It would be interesting to compare the output of python -vv between the working case and the non-working case, as the second level of verbosity will print out all the different candidates the two versions are considering, and which ones they're accepting. For example, here's my Linux system Python up to the point where it finishes importing the UTF-8 codec:

    ========================

    $ python3 -vv
    import _frozen_importlib # frozen
    import _imp # builtin
    import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
    import '_warnings' # <class '_frozen_importlib.BuiltinImporter'>
    import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
    # installing zipimport hook
    import 'zipimport' # <class '_frozen_importlib.BuiltinImporter'>
    # installed zipimport hook
    import '_frozen_importlib_external' # <class '_frozen_importlib.FrozenImporter'>
    import '_io' # <class '_frozen_importlib.BuiltinImporter'>
    import 'marshal' # <class '_frozen_importlib.BuiltinImporter'>
    import 'posix' # <class '_frozen_importlib.BuiltinImporter'>
    import _thread # previously loaded ('_thread')
    import '_thread' # <class '_frozen_importlib.BuiltinImporter'>
    import _weakref # previously loaded ('_weakref')
    import '_weakref' # <class '_frozen_importlib.BuiltinImporter'>
    # /usr/lib64/python3.7/encodings/__pycache__/__init__.cpython-37.pyc matches /usr/lib64/python3.7/encodings/__init__.py
    # code object from '/usr/lib64/python3.7/encodings/__pycache__/__init__.cpython-37.pyc'
    # trying /usr/lib64/python3.7/codecs.cpython-37m-x86_64-linux-gnu.so
    # trying /usr/lib64/python3.7/codecs.abi3.so
    # trying /usr/lib64/python3.7/codecs.so
    # trying /usr/lib64/python3.7/codecs.py
    # /usr/lib64/python3.7/__pycache__/codecs.cpython-37.pyc matches /usr/lib64/python3.7/codecs.py
    # code object from '/usr/lib64/python3.7/__pycache__/codecs.cpython-37.pyc'
    import '_codecs' # <class '_frozen_importlib.BuiltinImporter'>
    import 'codecs' # <_frozen_importlib_external.SourceFileLoader object at 0x7f0ea616eb70>
    # trying /usr/lib64/python3.7/encodings/aliases.cpython-37m-x86_64-linux-gnu.so
    # trying /usr/lib64/python3.7/encodings/aliases.abi3.so
    # trying /usr/lib64/python3.7/encodings/aliases.so
    # trying /usr/lib64/python3.7/encodings/aliases.py
    # /usr/lib64/python3.7/encodings/__pycache__/aliases.cpython-37.pyc matches /usr/lib64/python3.7/encodings/aliases.py
    # code object from '/usr/lib64/python3.7/encodings/__pycache__/aliases.cpython-37.pyc'
    import 'encodings.aliases' # <_frozen_importlib_external.SourceFileLoader object at 0x7f0ea6183550>
    import 'encodings' # <_frozen_importlib_external.SourceFileLoader object at 0x7f0ea616e5c0>
    # trying /usr/lib64/python3.7/encodings/utf_8.cpython-37m-x86_64-linux-gnu.so
    # trying /usr/lib64/python3.7/encodings/utf_8.abi3.so
    # trying /usr/lib64/python3.7/encodings/utf_8.so
    # trying /usr/lib64/python3.7/encodings/utf_8.py
    # /usr/lib64/python3.7/encodings/__pycache__/utf_8.cpython-37.pyc matches /usr/lib64/python3.7/encodings/utf_8.py
    # code object from '/usr/lib64/python3.7/encodings/__pycache__/utf_8.cpython-37.pyc'
    import 'encodings.utf_8' # <_frozen_importlib_external.SourceFileLoader object at 0x7f0ea6191278>

    ========================
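The `-vv` comparison suggested above can be scripted. This sketch assumes you run it from a working interpreter and pass the path of the interpreter under test as `python` (for example, a `python.exe` extracted from an embeddable ZIP; any such path is hypothetical here). The helper name `verbose_import_trace` is illustrative. Verbose import messages and fatal startup errors both go to stderr, so the sketch captures stderr and keeps only the lines that mention a keyword.

```python
from __future__ import annotations

import subprocess
import sys


def verbose_import_trace(python: str = sys.executable,
                         keyword: str = "encodings") -> tuple[int, list[str]]:
    """Run `python -vv -c pass`; return (exit code, trace lines with keyword)."""
    result = subprocess.run(
        [python, "-vv", "-c", "pass"],
        capture_output=True,
        text=True,
        errors="replace",
    )
    # -v/-vv messages go to stderr; a fatal startup error also lands there.
    lines = [line for line in result.stderr.splitlines() if keyword in line]
    return result.returncode, lines


if __name__ == "__main__":
    code, lines = verbose_import_trace()
    print("exit code:", code)
    print("\n".join(lines))
```

Running it once with the default interpreter and once with another interpreter path gives two filtered traces that can be diffed to see where the candidate lists diverge.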

    @wwqgtxx

    wwqgtxx mannequin commented Dec 31, 2018

    I tried zipping the stdlib myself from the normal version's "Python37\Lib", with all files ending in ".py" (without "site-packages", of course), and everything worked fine. Maybe the loader only rejects ".pyc" files loaded from a ZIP?

    @zooba

    zooba commented Dec 31, 2018

    Yes, we've established that zipimport is rejecting .pyc files now, but we need to dig through it to figure out why. I haven't had time yet, but if someone else can then don't wait for me.

    @zooba

    zooba commented Jan 8, 2019

    Looks like zipimport in 3.7 always rejected CHECKED_HASH pycs, while in 3.8 it always accepts them (or runs it through a validation process that passes them when the source file doesn't exist - I only confirmed by testing a build, not by walking through the new sources).

    Rather than changing the old zipimport now, it's more correct to fix the embeddable ZIP build to specify UNCHECKED_HASH.
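PEP 552 records the invalidation mode in the second 32-bit word of the .pyc header: bit 0 marks a hash-based pyc, and bit 1 is the `check_source` flag that distinguishes CHECKED_HASH from UNCHECKED_HASH. A minimal sketch (the helper name `pyc_invalidation_mode` is hypothetical) compiles a throwaway module in each `py_compile.PycInvalidationMode` and decodes that word, which shows which files carry the "check source" bit that the 3.7 zipimport refused.

```python
import py_compile
import struct
import tempfile
from pathlib import Path


def pyc_invalidation_mode(pyc_path):
    """Decode the PEP 552 flags word (header bytes 4-7) of a .pyc file."""
    with open(pyc_path, "rb") as f:
        header = f.read(8)  # 4-byte magic number, then the 32-bit flags field
    (flags,) = struct.unpack("<I", header[4:8])
    if not flags & 0b01:  # bit 0 clear: classic timestamp-based pyc
        return "timestamp"
    if flags & 0b10:  # bit 1 set: check_source requested
        return "checked-hash"
    return "unchecked-hash"


results = {}
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "example.py"
    src.write_text("x = 1\n")
    for mode in py_compile.PycInvalidationMode:
        cfile = Path(tmp) / f"example_{mode.name.lower()}.pyc"
        py_compile.compile(str(src), cfile=str(cfile), doraise=True,
                           invalidation_mode=mode)
        results[mode.name] = pyc_invalidation_mode(cfile)

print(results)
# {'TIMESTAMP': 'timestamp', 'CHECKED_HASH': 'checked-hash',
#  'UNCHECKED_HASH': 'unchecked-hash'}
```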

    @zooba zooba self-assigned this Jan 8, 2019
    @zooba

    zooba commented Jan 8, 2019

    And I assume now that the reason it broke in 3.7.2 is because the pyc mode for the embeddable distro changed. Which means the right place for tests is in a separate build that uses properly laid out Python rather than testing in the source tree (like what I have in the windows-appx-tests.yml file and Tools/msi/testrelease.bat script, but apparently also for the embeddable distro).

    @zooba

    zooba commented Jan 8, 2019

    New changeset 872bd2b by Steve Dower in branch 'master':
    bpo-35596: Use unchecked PYCs for the embeddable distro to avoid zipimport restrictions (GH-11465)
    872bd2b

    @miss-islington

    New changeset 69f64b6 by Miss Islington (bot) in branch '3.7':
    bpo-35596: Use unchecked PYCs for the embeddable distro to avoid zipimport restrictions (GH-11465)
    69f64b6

    @zooba

    zooba commented Jan 8, 2019

    This is now resolved, and only through modifying the build scripts. Which means I can take the existing build and republish a fixed embeddable package without needing a new release.

    Unless Ned would prefer a complete release?

    @ned-deily

    It seems like this need not trigger a complete new release and, ATM, I'm not aware of any other showstopper problems that would otherwise trigger an early 3.7.3. One question would be how and where to document this change in the build artifact. Suggestions, Steve?

    @vstinner

    vstinner commented Jan 8, 2019

    This is now resolved, and only through modifying the build scripts. Which means I can take the existing build and republish a fixed embeddable package without needing a new release.

    Since Python itself doesn't change, I'm ok with not changing the Python release. But for practical reasons, would it be possible to use a different *filename*? For example, the Python website relies heavily on CDN caching. It can be surprising to have two files with the same name but different content.

    @ned-deily

    CDN caching on python.org is not a problem; we know how to clear out the cache. But I also strongly dislike silent updates of released files so I agree that names should be changed if we do end up agreeing to replace one or more files.

    @zooba

    zooba commented Jan 8, 2019

    I know how to purge the CDN cache, so that's not an issue. And there's no good reason to leave the old one up.

    Perhaps we can just add a note to the download page and I'll post on a couple of lists? This is basically a product recall, and those are usually advertised at the point of sale.

    @zooba

    zooba commented Jan 8, 2019

    I can add ".post1" to the version number in the file name, but I'd still want to take down the broken one. And anyone who's generating the download URL will get a broken link, which IMO is just as bad as a broken download when we could fix it.

    @ned-deily

    I think we should change the name (post1 is fine), delete the original file, update the file name link in the release page (https://www.python.org/downloads/release/python-372/) to use the new name, and add a sentence or two to the release page describing the change. If you could write up something for the page, I can add it and change the file name when ready.

    @serhiy-storchaka

    It would be weird if building from source did not give the same distribution as the one downloaded from the official site. It would not be fair to alternative distributors.

    I think it is time to release 3.7.2.1. This would not be the first time the fourth version number has been used.

    @ned-deily

    It would be weird if building from source did not give the same distribution as the one downloaded from the official site. It would not be fair to alternative distributors.

    Yes, I agree with that in general but, as I understand it, the change here affects only how the Windows embeddable distribution is packaged. I don't think we expect alternate distributors to produce such distributions - or do we know of such cases? And, even if so, it's not a big deal for a third-party to pick up the change. There are parts of the PC and Mac source tree that really are intended only for building of python.org binary releases. If the changes affected the python executables or standard library files, that would be a very different matter. It is a trade-off; I just don't think that this is the type of change that needs to trigger a new release cycle and I don't want to go down the path of creating a new level of release. When was the last time we had a 3.x.y.z? I don't recall one.

    @zooba

    zooba commented Jan 8, 2019

    Agreed. My plan is to just replace the precompiled ZIP file of the standard library in the embeddable package with one with PYCs missing the "check source" bit that the old zipimport rejects. It's as simple as a 1 line change in a supporting script in PC/layout (though the actual change I made is more significant to support other use cases).

    The binaries and library sources are identical, so this doesn't even require a rebuild. And anyone building their own distro from source using this script will hit the issue and find this bug. The only reason I missed it was that I tested against master, not realising that the new zipimport had changed behaviour here. Nobody else will be blindly releasing these packages with tests only against an incompatible version the way we do (and now I have tests).
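The fix described here, rebuilding the stdlib ZIP from PYCs without the "check source" bit, can be sketched with `py_compile` and `zipfile`. This is not the actual PC/layout script: `build_unchecked_pyc_zip` and its arguments are hypothetical, and the sketch ignores exclusions (tests, site-packages) that a real stdlib layout would need. It stores each module in the legacy `name.pyc` position (next to where `name.py` would be), which is where zipimport searches.

```python
import py_compile
import tempfile
import zipfile
from pathlib import Path


def build_unchecked_pyc_zip(src_root, archive):
    """Compile every .py under src_root and store only the .pyc files in archive.

    Returns the archive member names in the order they were written.
    """
    src_root = Path(src_root)
    names = []
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for src in sorted(src_root.rglob("*.py")):
                rel = src.relative_to(src_root).with_suffix(".pyc")
                cfile = Path(tmp) / rel  # py_compile creates parent dirs
                py_compile.compile(
                    str(src), cfile=str(cfile), doraise=True,
                    # No "check source" bit: the 3.7 zipimport accepts these.
                    invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
                )
                zf.write(cfile, arcname=rel.as_posix())
                names.append(rel.as_posix())
    return names
```

A call such as `build_unchecked_pyc_zip("Lib", "python37.zip")` (paths hypothetical) would produce a source-free archive whose members all carry flags `0b01`, that is, hash-based without `check_source`.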

    @zooba

    zooba commented Jan 9, 2019

    I've updated the files and sent Ned the info needed to confirm and update the download page.

    @ned-deily

    Thanks, Steve. The download page for 3.7.2 has now been updated with the URLs for the modified embeddable files, the CDN caches updated, and the original embeddable download files and their GPG signature files are no longer accessible so references to the original URLs will result in hard failures. I considered updating the 3.7.2 blog announcement but decided against it as likely adding more confusion than it was worth. There just aren't that many users yet of the embeddable files.

    @serhiy-storchaka

    When was the last time we had a 3.x.y.z? I don't recall one.

    My apologies, it seems my memory has tricked me. I thought there was one in the 3.3 branch, but it was just that the third number was bumped again just a month ago after a bugfix release.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    7 participants