This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: importlib: _find_and_load() race condition on sys.modules[name] check
Type: behavior Stage: resolved
Components: Tests, Windows Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: barry, brett.cannon, eric.snow, gvanrossum, lukasz.langa, ncoghlan, paul.moore, serhiy.storchaka, steve.dower, tim.golden, twouters, vstinner, zach.ware
Priority: normal Keywords: patch

Created on 2017-07-10 09:44 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
importlib_sleep.patch vstinner, 2017-07-10 13:59
Pull Requests
URL Status Linked Edit
PR 2645 closed serhiy.storchaka, 2017-07-10 15:13
PR 2646 merged vstinner, 2017-07-10 16:12
PR 2651 merged vstinner, 2017-07-10 20:56
PR 2665 merged vstinner, 2017-07-11 08:03
PR 2801 merged vstinner, 2017-07-21 11:05
Messages (35)
msg298044 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 09:44
1:36:32 [224/405/1] test_import failed -- running: test_mmap (2236 sec), test_io (2529 sec), test_venv (1437 sec)
beginning 6 repetitions
123456
.....test test_import failed -- Traceback (most recent call last):
  File "D:\buildarea\3.6.ware-win81-release.refleak\build\lib\test\test_import\__init__.py", line 377, in test_concurrency
    raise exc
  File "D:\buildarea\3.6.ware-win81-release.refleak\build\lib\test\test_import\__init__.py", line 362, in run
    import package
  File "D:\buildarea\3.6.ware-win81-release.refleak\build\lib\test\test_import\data\package\__init__.py", line 2, in <module>
    package.submodule
AttributeError: module 'package' has no attribute 'submodule'
msg298046 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 09:56
I was able to *easily* (100% failure rate) reproduce the failure with Python 3.6 and 3.7 (master) on Windows using this command:

./python -m test -R 3:100 -m test_concurrency test_import
msg298047 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 09:59
I don't understand if a test_concurrency() failure is a bug in importlib, or a bug in the test :-)
msg298048 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 10:02
Note: "./python -m test -R 3:100 -m test_concurrency test_import" is just a convenient way to run test_concurrency() "in a loop", in the same process, 100 times.

I'm unable to reproduce the bug on Linux, even when I write 3 into /proc/sys/vm/drop_caches while the test is running to drop filesystem caches.

I also tried to use a temporary directory in my home (TMPDIR=$HOME/tmp) to use temporary files in ext4 rather than tmpfs. But I don't know if importlib uses temporary files or not :-)
msg298049 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 10:04
Oh, the unit test was added 4 days ago:

commit b4baacee1adc06edbe30ac7574d17a8cd168e2e0
Author: Serhiy Storchaka <storchaka@gmail.com>
Date:   Thu Jul 6 08:09:03 2017 +0300

    bpo-30814: Fixed a race condition when import a submodule from a package. (#2580)
msg298053 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-07-10 10:40
Indeed, Serhiy's been stress-testing some bits of the import system, and finding a few holes made possible by the shift away from a single global import lock to per-module locks. (Issue 30876 is another example of that)

Nothing immediately comes to mind as a platform-dependent code path that could explain why that error is showing up on Windows and not on Linux, though.
msg298058 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-10 12:02
Some parts of import machinery are platform depended (see issue30873).

I don't have idea what is wrong on Windows.

I'm puzzled. test_concurrency itself runs a test in a loop. The only difference between one and several consequent runs of test_concurrency is running setUp() and tearDown() between tests. Victor, try to add importlib.invalidate_caches() inside a loop. Does it cause a failure in a single test_concurrency?
msg298063 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 13:09
> Victor, try to add importlib.invalidate_caches() inside a loop. Does it cause a failure in a single test_concurrency?

I added a "importlib.invalidate_caches()" call in the "for i in range(10):" loop. With this change, a single run doesn't fail. It doesn't fix the test neither: with -R 3:100, the command still fails.

> Some parts of import machinery are platform depended (see issue30873).

I don't think that this issue30873 or issue30626 is platform dependent.


> I don't have idea what is wrong on Windows.

IMHO it's just that my Windows VM is slow. It's a race condition. If I run "./python -m test -m test_concurrency test_import test_import": the command pass, but when again, it fails. It's pure random :-)
msg298064 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 13:59
I added print(id(package)) in Lib/test/test_import/data/package/__init__.py: when the bug occurs on my Windows VM, I see two print with two different identifiers. Moreover, when the bug occurs, first id(sys.modules['package']) is the first identifier, and then package is replaced with the new 'package' package in sys.modules (new identifier).

I don't understand something: sys.modules isn't supposed to be protected by a lock? I checked if _installed_safely.__enter__() if the _imp is held: it isn't held when _bootstrap._setup() is called, nor when _imp._install_external_importers() is called. But even if I patch these two functions to acquire the lock, Python startup fails on importing a codec module, because the _imp lock is not held in _installed_safely.__enter__(). So maybe sys.modules dict is supposed to be protected by a different lock, maybe indirectly?

I hack _bootstrap._find_and_load_unlocked() to sleep 10 ms if name == 'package'. With this change, test_concurrency() fails immediatly on my Linux box.

* Apply attached importlib_sleep.patch (written for the current master branch)
* Recompile: make regen-importlib && make
* Run test_concurrency() just once: ./python -m test -m test_concurrency -v test_import
* You get the bug

By the way, importlib_sleep.patch changes test_concurrency() to run the test exactly once (it changes to loop to a single iteration, instead of 10).

Is it normal that a sleep makes import failing? Imports are not supposed to be atomic?

--

I tried to similar hack before the commit b4baacee1adc06edbe30ac7574d17a8cd168e2e0:

* add a sleep in importlib
* copy test_concurrency() from master
* Run test_concurrency()
* The test pass

While I'm not sure of my hack, it *seems* like sys.modules was somehow atomic before this commit?
msg298066 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 14:25
FYI test_concurrency() added by commit b4baacee1adc06edbe30ac7574d17a8cd168e2e0 doesn't really test bpo-30814. I backported the unit test on 1ccbad9c95cf5782a1329eeaecba6e3eb0c37cb8 (commit before b4baacee1adc06edbe30ac7574d17a8cd168e2e0), and the test pass, whereas I expected it to fail before the fix.
msg298067 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-07-10 14:44
No, there's no global lock on the import system these days.

Instead, imports are supposed to be atomic on a per-module basis, but issue 30814 showed that some of the accesses to the shared state management (which *is* still supposed to be protected by the import lock) could get into trouble given concurrent imports from different threads.

However, the new test exercises an interesting sequence of per-module lock acquisition, since each thread effectively runs through this sequence:

- import package # Test case
- import package.submodule # Package init
- import package # Parent package import

That moves back-and-forth between C & Python in a few different places, and hence may have been relying on the GIL to keep the parts executed in C atomic.

So now I'm wondering whether or not this new problem might be due to moving the _imp.acquire_lock() calls into Python code as part of issue 30814, as that creates a window near the start of _find_and_load() where the thread may end up holding neither the GIL *nor* the import lock. By contrast, when the import lock was acquired in C, then by the time the Python code starts running (potentially releasing the GIL), the import lock has already been acquired.
msg298068 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 14:47
> FYI test_concurrency() added by commit b4baacee1adc06edbe30ac7574d17a8cd168e2e0 doesn't really test bpo-30814. I backported the unit test on 1ccbad9c95cf5782a1329eeaecba6e3eb0c37cb8 (commit before b4baacee1adc06edbe30ac7574d17a8cd168e2e0), and the test pass, whereas I expected it to fail before the fix.

Oops, please ignore this comment, it was a mistake. After clearing my repository (removing __pycache__ directories), the test failed. I forgot to copy the Lib/test/test_import/data/package/ directory.
msg298069 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-10 14:52
test_concurrency is failed before b4baacee1adc06edbe30ac7574d17a8cd168e2e0 on my computer. I think it tests the issue30814 bug even if it doesn't fail on all computers. At the end, this is a race condition, and the failure is probabilistic.
msg298070 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 15:09
Using attached importlib_sleep.patch, you can *see* the race condition with:

diff --git a/Lib/test/test_import/data/package/__init__.py b/Lib/test/test_import/data/package/__init__.py
index a4f2bc3..f07eb11 100644
--- a/Lib/test/test_import/data/package/__init__.py
+++ b/Lib/test/test_import/data/package/__init__.py
@@ -1,2 +1,3 @@
 import package.submodule
+print("package: id=%#x" % id(package))
 package.submodule


Example of output:
---
haypo@selma$ ./python -m test -m test_concurrency -v test_import
package: id=0x7fb497400d58
package: id=0x7fb49744d458
test test_import failed -- (...)
---

Lib/test/test_import/data/package/__init__.py is run twice instead of being run once. A lock is missing somewhere on the package module.
msg298071 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-10 15:17
Victor tested test_concurrency on 3.5, and it is failed on slow Windows VM. Therefore the bug exists on 3.5 and is not related to recent changes.
msg298072 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-07-10 15:28
I'm about to head to bed so I'll get back to this discussion tomorrow, but in the meantime: Victor, maybe it would make sense to reformulate your reproducer as a monkeypatch for _find_and_load_unlocked() that injects the 10 ms sleep from the test case if it finds _frozen_importlib in sys.modules?

That is, temporarily replacing it with something like:

    orig_find_and_load_unlocked = _frozen_importlib._find_and_load_unlocked
    def _slow_find_and_load_unlocked(name, import_):
        time.sleep(0.010)
        return orig_find_and_load_unlocked(name, import_)

That way, we'd hopefully be able to reproduce the failure independently of the slower buildbot.
msg298081 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 20:52
New changeset 4f9a446f3fb42f800e73cd9414dd1eccb3ca4fa7 by Victor Stinner in branch 'master':
bpo-30891: Fix importlib _find_and_load() race condition (#2646)
https://github.com/python/cpython/commit/4f9a446f3fb42f800e73cd9414dd1eccb3ca4fa7
msg298087 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 21:16
New changeset fe6e686c27ce3a3ecdf03803cff7e230dee5530d by Victor Stinner in branch '3.6':
bpo-30891: Fix importlib _find_and_load() race condition (#2646) (#2651)
https://github.com/python/cpython/commit/fe6e686c27ce3a3ecdf03803cff7e230dee5530d
msg298088 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-10 21:23
Backporting the fix ot Python 3.5 doesn't seem trivial, 3.5 lacks many changes.
msg298798 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-21 11:00
New changeset e72b1359f81d1dd42bd8a5c5cc2b3928b74f8023 by Victor Stinner in branch 'master':
bpo-30891: Fix again importlib _find_and_load() (#2665)
https://github.com/python/cpython/commit/e72b1359f81d1dd42bd8a5c5cc2b3928b74f8023
msg298801 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-21 11:28
New changeset bb323b261d0f41ad6998d3a16c41b29b3bb8e982 by Victor Stinner in branch '3.6':
bpo-30891: Fix again importlib _find_and_load() (#2665) (#2801)
https://github.com/python/cpython/commit/bb323b261d0f41ad6998d3a16c41b29b3bb8e982
msg299184 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-26 02:32
With Python 3.5.4rc1 released (3.5.4 will be the last 3.5 release accepting bug fixes), I'm not sure that it's still worth it to try to backport such changes to Python 3.5. The importlib is very important, any regression can be very important.

I'm not sure that backporting the unit test is interesting. If the test fails, we will be unable to fix it.
msg299305 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-27 10:25
Victor:
> I'm not sure that backporting the unit test is interesting. If the test fails, we will be unable to fix it.

Serhiy closed his PR:
https://github.com/python/cpython/pull/2645#event-1181495893

So now I close this issue.
msg302106 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2017-09-13 18:49
If you're looking at this issue because it still happens to you, you are probably using zipimport. Stop using zipimport.
msg302124 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-09-13 22:03
Łukasz Langa: "If you're looking at this issue because it still happens to you, you are probably using zipimport. Stop using zipimport."

Hum, do you mean that zipimport also has an issue? Maybe open a new issue?
msg314594 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-03-28 14:27
Yeah, what's wrong with zipimport?
msg314615 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-03-28 17:27
There's a bunch of things wrong with zipimport, just ask Thomas. ;)
msg314818 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2018-04-02 17:46
In the context of this issue: zipimport doesn't properly lock on imports happening in multiple threads at the same time. This causes one thread to receive a partially imported module while the other thread is still performing the import.

zipimport also doesn't properly check __pycache__ directories but *does* look for old-style .pyc files next to the corresponding .py file.

IIRC Brett chooses not to deal with zipimport's warts and has no time to rewrite it in Python. Since the rest of the core team is less qualified, they steer clear, too.

So, don't use zipimport.
msg314819 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-04-02 17:57
That's rather disturbing, and none of that is reflected in the zipimport docs. Is there at least a bpo issue about it? I may be the last person in the world to still recommend putting a zipfile on sys.path -- I used to believe it's faster than directory searches, is that no longer the case?
msg314879 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-04-03 13:32
I recommend and use it without trouble, but then I rarely defer imports to worker threads.

Barry may have some ideas, but I think this really shouldn't be difficult to do. importlib is pretty extensible, and it's not difficult to read from a zip file.
msg314887 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-04-03 16:56
The other problem is zipimport is written in C and no one wants to deal with that headache if possible. ;) We actually tweaked the importlib.resources API specifically to avoid having to muck with zipimport's C code as it would have required major surgery otherwise. Thomas has also been saying he was planning to upstream the improvements he has made to zipimport internally at Google which also demotivates touching the code knowing that changes are planned.

As for Steve's comment, the tricky bit is zipfile has a bunch of dependencies, so it's not so simple to just freeze that one module  and wrap it with importlib to be able to load the stdlib from a zip file (which zipimport supports). Basically zip-reading code needs to be written that can be bootstrapped (C or Python), and then use that in importlib to be able to import the stdlib from a zip file.

I have also been toying with the idea of a zippath module that would take pathlib's API and apply it to zip files acting like a self-contained file system, but I have never bothered as zipfile's API is messy. The "all the time in the world" project would be making all of this come together in a release of Python by building off of each other: new code to read zip files written in freezable Python, use importlib to replace zipimport, and provide a pathlib API to work with zip files (which would probably be partially built for use by importlib).

If I drop my idea of putting a package on PyPI to do lazy importing I could look at tackling this for 3.8.
msg314889 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2018-04-03 20:14
Take a look at https://bugs.python.org/issue25711 which wants to rewrite zipimport from scratch.  I took Serhiy's C->Python patch and turned it into a PR, but it would need a lot of love to get it into shape.

(Let's not even talk about https://bugs.python.org/issue32866 which caused me some pain during importlib.resources work.)

Pretty sure Thomas's rumored rewrite is still in C, and it solves problems like support for zip64 and comments, which we've run into also.

zipimport is crucial for single-file executable solutions like pex and our own soon-to-be-open-sourced tool, so it really can't go away.  Both of those still unpack wheels/eggs/sdists to a file system cache, not so much for speed but because shared libraries generally can't be imported from anything but a physical path (thanks to dlopen()).  I think Google has a patch for glibc that would solve that but there's some hesitation from the glibc maintainers to merge it (and then what about other platforms?).
msg314894 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-04-03 20:45
OK, this sounds much better: zipimport has problems and we should fix them,
presumably by getting Thomas's fixes (sans Google-isms, which may require
some effort). I don't believe Łukasz's recommendation "don't use it" is the
right attitude (though what he puts in Facebook's internal style guide is
his problem :-), and I don't believe that we're better off deprecating it.
msg314948 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-04-04 18:55
And mostly for completeness, I know Thomas has maligned the fact that zipimport has tests for some crazy things in its semantics which made it difficult for him to improve the module.
msg314949 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-04-04 19:19
If there are overspecified tests we can debate them of course.

On Wed, Apr 4, 2018, 11:55 Brett Cannon <report@bugs.python.org> wrote:

>
> Brett Cannon <brett@python.org> added the comment:
>
> And mostly for completeness, I know Thomas has maligned the fact that
> zipimport has tests for some crazy things in its semantics which made it
> difficult for him to improve the module.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue30891>
> _______________________________________
>
History
Date User Action Args
2022-04-11 14:58:48adminsetgithub: 75074
2018-04-04 19:19:31gvanrossumsetmessages: + msg314949
2018-04-04 18:55:15brett.cannonsetmessages: + msg314948
2018-04-03 20:45:44gvanrossumsetmessages: + msg314894
2018-04-03 20:14:18barrysetmessages: + msg314889
2018-04-03 16:56:19brett.cannonsetmessages: + msg314887
2018-04-03 13:32:33steve.dowersetnosy: + barry
messages: + msg314879
2018-04-02 17:57:03gvanrossumsetmessages: + msg314819
2018-04-02 17:46:21lukasz.langasetmessages: + msg314818
2018-03-28 17:27:34brett.cannonsetnosy: + twouters
messages: + msg314615
2018-03-28 14:27:03gvanrossumsetnosy: + gvanrossum
messages: + msg314594
2017-09-13 22:03:22vstinnersetmessages: + msg302124
2017-09-13 18:49:44lukasz.langasetnosy: + lukasz.langa
messages: + msg302106
2017-07-27 10:25:47vstinnersetstatus: open -> closed
versions: - Python 3.5
messages: + msg299305

resolution: fixed
stage: needs patch -> resolved
2017-07-26 02:32:34vstinnersetmessages: + msg299184
2017-07-21 11:28:58vstinnersetmessages: + msg298801
2017-07-21 11:05:08vstinnersetpull_requests: + pull_request2852
2017-07-21 11:00:48vstinnersetmessages: + msg298798
2017-07-11 19:53:47vstinnersetpull_requests: - pull_request2734
2017-07-11 16:27:05cooperleessetpull_requests: + pull_request2734
2017-07-11 08:03:34vstinnersetpull_requests: + pull_request2730
2017-07-10 21:23:57vstinnersetmessages: + msg298088
2017-07-10 21:16:29vstinnersetmessages: + msg298087
2017-07-10 20:56:36vstinnersetpull_requests: + pull_request2717
2017-07-10 20:52:34vstinnersetmessages: + msg298081
2017-07-10 19:11:26brett.cannonlinkissue30893 dependencies
2017-07-10 16:24:05vstinnersettitle: Sometimes, test_concurrency() of test_import fails with AttributeError: module 'package' has no attribute 'submodule' on AMD64 Windows8.1 Refleaks 3.6 -> importlib: _find_and_load() race condition on sys.modules[name] check
versions: + Python 3.5, Python 3.6
2017-07-10 16:12:05vstinnersetpull_requests: + pull_request2712
2017-07-10 15:28:52ncoghlansetmessages: + msg298072
2017-07-10 15:17:44serhiy.storchakasetmessages: + msg298071
2017-07-10 15:13:34serhiy.storchakasetpull_requests: + pull_request2711
2017-07-10 15:09:27vstinnersetmessages: + msg298070
2017-07-10 14:52:56serhiy.storchakasetmessages: + msg298069
2017-07-10 14:47:33vstinnersetmessages: + msg298068
2017-07-10 14:44:11ncoghlansetmessages: + msg298067
2017-07-10 14:25:32vstinnersetmessages: + msg298066
2017-07-10 13:59:46vstinnersetfiles: + importlib_sleep.patch
keywords: + patch
messages: + msg298064
2017-07-10 13:09:31vstinnersetmessages: + msg298063
2017-07-10 12:02:13serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg298058
2017-07-10 10:40:24ncoghlansettype: behavior
messages: + msg298053
stage: needs patch
2017-07-10 10:05:00vstinnersetmessages: + msg298049
2017-07-10 10:02:10vstinnersetmessages: + msg298048
2017-07-10 09:59:54vstinnersetnosy: + brett.cannon, ncoghlan, eric.snow
messages: + msg298047
2017-07-10 09:56:36vstinnersetmessages: + msg298046
2017-07-10 09:44:24vstinnercreate