classification
Title: Recent importlib change breaks most recent certifi == 2020.4.5.2
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: jaraco Nosy List: dstufft, jaraco, lukasz.langa, ned.deily, steve.dower
Priority: release blocker Keywords: patch

Created on 2020-06-09 05:59 by ned.deily, last changed 2020-06-29 21:10 by lukasz.langa. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 20760 merged lukasz.langa, 2020-06-09 12:53
PR 20820 merged jaraco, 2020-06-11 23:39
PR 20857 merged jaraco, 2020-06-13 16:03
Messages (13)
msg371073 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2020-06-09 05:59
The very recent latest commits for Issue39791, "New `files()` api from importlib_resources", have broken the popular certifi package, a package which provides a basic set of Root Certificates for TLS secure network connection verification. Among other users of it, the python.org macOS installers encourage its users to run a provided script to install certifi.  Alas, we discovered just after v3.9.2b2 was tagged that that script is broken because certifi.where() is returning a bogus path toe the pem file, what appears to be a path in a deleted temp directory.

The culprit commits are 843c27765652e2322011fb3e5d88f4837de38c06 (master) and 9cf1be46e3692d565461afd3afa326d124d743dd (3.9). This is now a critical problem because in its most recent release, certifi 2020.4.5.2, certifi was changed to try to use importlib.resources.path(), if available, to find the path to the installed cacert.pem file. (The previous release, 2020.4.5.1, appears not to use path() and so is not affected by this bug.)

https://github.com/certifi/python-certifi/commit/3fc8fec0466b0f12f10ad3e429b8d915bc5c26fb

https://pypi.org/project/certifi/2020.4.5.2/

Without trying to debug the bug, I was able to bisect the branch and then reduce the problem seen in macOS installer testing to a fairly simple reproducible test case. The problem was reproduced on both Linux and macOS systems.

The test case:

# in a current cpython git repo, checkout the commit before the failing one
git checkout 843c27765652e2322011fb3e5d88f4837de38c06^
git log HEAD^..HEAD
git clean -fdxq
./configure --prefix=$PWD/root -q 
make -j4
./python -m ensurepip
./python -m pip install --upgrade pip # not necessary to reproduce
./python -m pip install --force --no-binary certifi certifi==2020.4.5.2
./python -c 'import certifi;print(certifi.where())'

The output tail should be something like:
[...]
Successfully installed certifi-2020.4.5.2
/home/nad/cpython/root/lib/python3.10/site-packages/certifi/cacert.pem

Now checkout the failing commit and repeat all the other steps:

git checkout 843c27765652e2322011fb3e5d88f4837de38c06
git log HEAD^..HEAD
[...]

The output tail is now incorrect:
[...]
Successfully installed certifi-2020.4.5.2
/tmp/tmpqfjnbj5bcacert.pem

The cacert.pem is installed to the expected (same) location in either case; its just the output from importlib.resources.path that is incorrect:

./python
Python 3.10.0a0 (remotes/upstream/master:0a40849eb9, Jun  9 2020, 00:35:07)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib.resources, os, os.path, certifi
>>> certifi.__file__
'/home/nad/cpython/root/lib/python3.10/site-packages/certifi/__init__.py'
>>> os.listdir(os.path.dirname(certifi.__file__))
['__pycache__', 'cacert.pem', '__init__.py', 'core.py', '__main__.py']
>>> with importlib.resources.path('certifi', 'cacert.pem') as f: print(f)
...
/tmp/tmpxsrjxb8lcacert.pem
>>> with importlib.resources.path('certifi', 'core.py') as f: print(f)
...
/tmp/tmpjq8h3si5core.py

No test suite failures were noted. Perhaps there should be a test case for this?

Presumably any other downstream users of importlib.resources.path() are affected.

Łukasz as 3.9 release manager is aware there is an issue but was awaiting the tracking down of the problem before making a decision about what to do for 3.9.0b2.

cc: Donald as the author of the certifi change.
msg371084 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2020-06-09 08:25
That was a *big* patch to importlib.resources... And newly defined and standardised protocols? That seems like a lot to bring in without having gone through a PEP anyway, despite the minor bug that has drawn attention to it.

I don't think backports should be treated as incubators for standard modules, getting new features that then flow directly into the stdlib. It's fine to have a fork for incubation, but people need to rely on consistency in stdlib and backports.

All that to say, I think we should revert the importlib changes *anyway*, and also because there's something wrong with them right now.
msg371094 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-06-09 12:09
I understand the issue here and can supply more details soon (no later than this weekend).
msg371095 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-06-09 12:14
In this case we need to revert the 3.9 backport to release a hotfix tonight. Please respect the beta feature freeze.
msg371132 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-06-09 17:50
New changeset ce5e6f098f8a270e50b989baa75765584573706b by Łukasz Langa in branch '3.9':
[3.9] bpo-40924: Revert "bpo-39791 native hooks for importlib.resources.files (GH-20576)" (#20760)
https://github.com/python/cpython/commit/ce5e6f098f8a270e50b989baa75765584573706b
msg371138 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-06-09 20:29
This issue stems from improper reliance on implementation details of `importlib.resources.path`, which returns a context manager that is designed to clean up the file after the context closes. certifi would have encountered the same problem on older Pythons if certifi had been installed as a zip egg or other non-filesystem-based package.

That said, it is also undesirable for path no longer to return a reference to an existing path when one exists. That behavior has come about as the importlib.resources API moves from the legacy implementation to the new one based on TraversableResources (files function).

I encountered [a similar issue](https://github.com/python/cpython/pull/20576#issuecomment-637881341) during the original submission, which I addressed by removing the same assumption from another library (importlib.metadata).

I believe the best fix here is to restore that assumption while retaining other important changes in this patch. Should this assumed behavior also be tested (guaranteed)? That I'm less sure about.

> Please respect the beta feature freeze.

I do respect the beta feature freeze. The relevant feature was added prior to b1. The reverted change is an incremental fix addressing underlying implementation details such as how resources are resolved and removing duplicate code paths.

More importantly, the change also addresses a [key interface problem](https://github.com/python/cpython/pull/20576#issuecomment-639954228) that I identified in b1 - that the previously advertised interface of `loaders` supplying `files()` methods is inadequate.

I've put a lot of effort into pulling this all together for 3.9 (three full days just this past weekend and hundreds of hours leading up to that). It would be a real shame for it to be released in a broken state due to a minor (though admittedly impactful) hiccup.

I suspect there's a small change that to the submitted patch that will restore the prior expectation and leave the codebase in a healthier state.

I'll prepare that soon.
msg371143 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-06-09 22:34
> I do respect the beta feature freeze. The relevant feature was added prior to b1. The reverted change is an incremental fix addressing underlying implementation details such as how resources are resolved and removing duplicate code paths.

Using the revert hammer is never an easy call and I've only done that a handful times so far. In this particular case the following helped me make the decision:
- the change was over 2,000 lines big;
- as you say, a non-trivial amount of it was refactoring;
- the change was committed without review;
- the backport was done three weeks after the beta freeze without consulting the RM or anybody else in that matter.

Sure, it's worth fixing the problems you identified after Beta 1, maybe even bring the fixes back to 3.9, but not how this was done this time. We'd like to keep our release cadence stable and avoid hotfixes in the future.

We are all grateful for your work, Jason, and we're happy to have somebody around who is invested in making Python better. To reiterate, I'm less worried about the introduced bug itself, and more about the change management attitude. Yes, it would indeed be a shame if your feature had to wait for another Python release. But risking widespread breakage isn't a good trade off.

It sucks for me to be this boring beaurocrat, believe me, but the purest solution to a deep design issue identified after Beta 1 in a new feature is to... revert that feature altogether and go back to the drawing board for the next release. It's not a theoretical situation. I went through this in the Python 3.8 cycle (see BPO-38242). It sucked but it was the right thing to do.
msg371144 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-06-09 23:35
I feel terrible and really regret that this was able to break things so badly.

What options are available at this point? I'd at the very least like to remove the `loader.files()` behavior to avoid encouraging other loaders to implement it. Should we also revert the `resources.files()` feature altogether, as you suggested?
msg371252 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-06-10 23:03
I thought about this all day. Given the scope of the change you made I think no option looks ideal. If we revert the entire feature, it's a big bummer for everybody involved and it might be an overreaction. If we keep the feature at the state of Beta 1, it might become a maintenance nightmare for third-party users. If we agree to a the big change you now see necessary, we are reintroducing the same risk that I highlighted in previous comments.

Please note that this is in no way intended to make you feel terrible. Don't beat yourself up about it. Now, think as the core developer you are: if you are 100% sure the change is worth pushing to 3.9, the best course of action at this point seems to be: find a champion within core who will work with you to quickly reassess the situation and give us an action plan as a team. 

Say, Barry seems like the perfect candidate. If such champion shares your opinion about it being important for 3.9, and is willing to work with you on finishing what you need changing (incl. review), *and* to ensure the existing users of importlib.resources don't see any breakage not documented as of Beta 1, then we can talk about an exceptional late inclusion in 3.9.
msg371256 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2020-06-11 00:46
Thanks for the thorough and considerate response.

I do think your original recommendation of reverting the broken feature is the best approach at this point. That is, keep resources.files with the fallback shim and eliminate support for loaders supplying that behavior. That will avoid users relying on that protocol but enable the files feature for Source and Zip importers.

That will buy time for the remaining functionality, mainly the provider interface, to mature and possibly evolve further, for eventual adoption in a future Python release.

Some planning will need to be dialed back, but I don’t have confidence in the implementation to say that it’s the best one. Better to defer that effort.

I’ll put together a patch for 3.9 to remove the loader support (backward incompatible with b1/2 but compatible with 3.8 but only for custom loaders), and put together a hot fix for master so it’s no longer broken.
msg371502 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-06-14 12:12
New changeset 8a3469047c3c7b68f434ed244ef3ae4292dd8cbc by Jason R. Coombs in branch '3.9':
[3.9] bpo-40924: Remove protocol for supplying Traversable objects from loaders (GH-20820)
https://github.com/python/cpython/commit/8a3469047c3c7b68f434ed244ef3ae4292dd8cbc
msg372631 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-06-29 20:59
New changeset 2fb5f038f2a2e91a7293d62dfd5601e6eb500c55 by Jason R. Coombs in branch 'master':
bpo-40924: Ensure importlib.resources.path returns an extant path (GH-20857)
https://github.com/python/cpython/commit/2fb5f038f2a2e91a7293d62dfd5601e6eb500c55
msg372634 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-06-29 21:10
This can now be closed. Thank you for all hard work here.
History
Date User Action Args
2020-06-29 21:10:55lukasz.langasetstatus: open -> closed
resolution: fixed
messages: + msg372634

stage: patch review -> resolved
2020-06-29 20:59:29lukasz.langasetmessages: + msg372631
2020-06-14 12:12:24lukasz.langasetmessages: + msg371502
2020-06-13 16:03:38jaracosetpull_requests: + pull_request20049
2020-06-11 23:39:38jaracosetpull_requests: + pull_request20017
2020-06-11 00:46:06jaracosetmessages: + msg371256
2020-06-10 23:03:07lukasz.langasetmessages: + msg371252
2020-06-09 23:35:22jaracosetmessages: + msg371144
stage: needs patch -> patch review
2020-06-09 22:34:32lukasz.langasetmessages: + msg371143
stage: patch review -> needs patch
2020-06-09 20:29:35jaracosetmessages: + msg371138
2020-06-09 17:50:09lukasz.langasetmessages: + msg371132
2020-06-09 12:53:47lukasz.langasetkeywords: + patch
stage: needs patch -> patch review
pull_requests: + pull_request19959
2020-06-09 12:14:11lukasz.langasetmessages: + msg371095
2020-06-09 12:09:05jaracosetmessages: + msg371094
2020-06-09 08:25:48steve.dowersetnosy: + steve.dower
messages: + msg371084
2020-06-09 05:59:48ned.deilycreate