msg298044 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 09:44 |
1:36:32 [224/405/1] test_import failed -- running: test_mmap (2236 sec), test_io (2529 sec), test_venv (1437 sec)
beginning 6 repetitions
123456
.....test test_import failed -- Traceback (most recent call last):
File "D:\buildarea\3.6.ware-win81-release.refleak\build\lib\test\test_import\__init__.py", line 377, in test_concurrency
raise exc
File "D:\buildarea\3.6.ware-win81-release.refleak\build\lib\test\test_import\__init__.py", line 362, in run
import package
File "D:\buildarea\3.6.ware-win81-release.refleak\build\lib\test\test_import\data\package\__init__.py", line 2, in <module>
package.submodule
AttributeError: module 'package' has no attribute 'submodule'
|
msg298046 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 09:56 |
I was able to *easily* (100% failure rate) reproduce the failure with Python 3.6 and 3.7 (master) on Windows using this command:
./python -m test -R 3:100 -m test_concurrency test_import
|
msg298047 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 09:59 |
I don't understand if a test_concurrency() failure is a bug in importlib, or a bug in the test :-)
|
msg298048 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 10:02 |
Note: "./python -m test -R 3:100 -m test_concurrency test_import" is just a convenient way to run test_concurrency() "in a loop", in the same process, 100 times.
I'm unable to reproduce the bug on Linux, even when I write 3 into /proc/sys/vm/drop_caches while the test is running to drop filesystem caches.
I also tried to use a temporary directory in my home (TMPDIR=$HOME/tmp) to use temporary files in ext4 rather than tmpfs. But I don't know if importlib uses temporary files or not :-)
|
msg298049 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 10:04 |
Oh, the unit test was added 4 days ago:
commit b4baacee1adc06edbe30ac7574d17a8cd168e2e0
Author: Serhiy Storchaka <storchaka@gmail.com>
Date: Thu Jul 6 08:09:03 2017 +0300
bpo-30814: Fixed a race condition when import a submodule from a package. (#2580)
|
msg298053 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2017-07-10 10:40 |
Indeed, Serhiy's been stress-testing some bits of the import system, and finding a few holes made possible by the shift away from a single global import lock to per-module locks. (Issue 30876 is another example of that)
Nothing immediately comes to mind as a platform-dependent code path that could explain why that error is showing up on Windows and not on Linux, though.
|
msg298058 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-07-10 12:02 |
Some parts of import machinery are platform depended (see issue30873).
I don't have idea what is wrong on Windows.
I'm puzzled. test_concurrency itself runs a test in a loop. The only difference between one and several consequent runs of test_concurrency is running setUp() and tearDown() between tests. Victor, try to add importlib.invalidate_caches() inside a loop. Does it cause a failure in a single test_concurrency?
|
msg298063 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 13:09 |
> Victor, try to add importlib.invalidate_caches() inside a loop. Does it cause a failure in a single test_concurrency?
I added a "importlib.invalidate_caches()" call in the "for i in range(10):" loop. With this change, a single run doesn't fail. It doesn't fix the test neither: with -R 3:100, the command still fails.
> Some parts of import machinery are platform depended (see issue30873).
I don't think that this issue30873 or issue30626 is platform dependent.
> I don't have idea what is wrong on Windows.
IMHO it's just that my Windows VM is slow. It's a race condition. If I run "./python -m test -m test_concurrency test_import test_import": the command pass, but when again, it fails. It's pure random :-)
|
msg298064 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 13:59 |
I added print(id(package)) in Lib/test/test_import/data/package/__init__.py: when the bug occurs on my Windows VM, I see two print with two different identifiers. Moreover, when the bug occurs, first id(sys.modules['package']) is the first identifier, and then package is replaced with the new 'package' package in sys.modules (new identifier).
I don't understand something: sys.modules isn't supposed to be protected by a lock? I checked if _installed_safely.__enter__() if the _imp is held: it isn't held when _bootstrap._setup() is called, nor when _imp._install_external_importers() is called. But even if I patch these two functions to acquire the lock, Python startup fails on importing a codec module, because the _imp lock is not held in _installed_safely.__enter__(). So maybe sys.modules dict is supposed to be protected by a different lock, maybe indirectly?
I hack _bootstrap._find_and_load_unlocked() to sleep 10 ms if name == 'package'. With this change, test_concurrency() fails immediatly on my Linux box.
* Apply attached importlib_sleep.patch (written for the current master branch)
* Recompile: make regen-importlib && make
* Run test_concurrency() just once: ./python -m test -m test_concurrency -v test_import
* You get the bug
By the way, importlib_sleep.patch changes test_concurrency() to run the test exactly once (it changes to loop to a single iteration, instead of 10).
Is it normal that a sleep makes import failing? Imports are not supposed to be atomic?
--
I tried to similar hack before the commit b4baacee1adc06edbe30ac7574d17a8cd168e2e0:
* add a sleep in importlib
* copy test_concurrency() from master
* Run test_concurrency()
* The test pass
While I'm not sure of my hack, it *seems* like sys.modules was somehow atomic before this commit?
|
msg298066 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 14:25 |
FYI test_concurrency() added by commit b4baacee1adc06edbe30ac7574d17a8cd168e2e0 doesn't really test bpo-30814. I backported the unit test on 1ccbad9c95cf5782a1329eeaecba6e3eb0c37cb8 (commit before b4baacee1adc06edbe30ac7574d17a8cd168e2e0), and the test pass, whereas I expected it to fail before the fix.
|
msg298067 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2017-07-10 14:44 |
No, there's no global lock on the import system these days.
Instead, imports are supposed to be atomic on a per-module basis, but issue 30814 showed that some of the accesses to the shared state management (which *is* still supposed to be protected by the import lock) could get into trouble given concurrent imports from different threads.
However, the new test exercises an interesting sequence of per-module lock acquisition, since each thread effectively runs through this sequence:
- import package # Test case
- import package.submodule # Package init
- import package # Parent package import
That moves back-and-forth between C & Python in a few different places, and hence may have been relying on the GIL to keep the parts executed in C atomic.
So now I'm wondering whether or not this new problem might be due to moving the _imp.acquire_lock() calls into Python code as part of issue 30814, as that creates a window near the start of _find_and_load() where the thread may end up holding neither the GIL *nor* the import lock. By contrast, when the import lock was acquired in C, then by the time the Python code starts running (potentially releasing the GIL), the import lock has already been acquired.
|
msg298068 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 14:47 |
> FYI test_concurrency() added by commit b4baacee1adc06edbe30ac7574d17a8cd168e2e0 doesn't really test bpo-30814. I backported the unit test on 1ccbad9c95cf5782a1329eeaecba6e3eb0c37cb8 (commit before b4baacee1adc06edbe30ac7574d17a8cd168e2e0), and the test pass, whereas I expected it to fail before the fix.
Oops, please ignore this comment, it was a mistake. After clearing my repository (removing __pycache__ directories), the test failed. I forgot to copy the Lib/test/test_import/data/package/ directory.
|
msg298069 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-07-10 14:52 |
test_concurrency is failed before b4baacee1adc06edbe30ac7574d17a8cd168e2e0 on my computer. I think it tests the issue30814 bug even if it doesn't fail on all computers. At the end, this is a race condition, and the failure is probabilistic.
|
msg298070 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 15:09 |
Using attached importlib_sleep.patch, you can *see* the race condition with:
diff --git a/Lib/test/test_import/data/package/__init__.py b/Lib/test/test_import/data/package/__init__.py
index a4f2bc3..f07eb11 100644
--- a/Lib/test/test_import/data/package/__init__.py
+++ b/Lib/test/test_import/data/package/__init__.py
@@ -1,2 +1,3 @@
import package.submodule
+print("package: id=%#x" % id(package))
package.submodule
Example of output:
---
haypo@selma$ ./python -m test -m test_concurrency -v test_import
package: id=0x7fb497400d58
package: id=0x7fb49744d458
test test_import failed -- (...)
---
Lib/test/test_import/data/package/__init__.py is run twice instead of being run once. A lock is missing somewhere on the package module.
|
msg298071 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) * |
Date: 2017-07-10 15:17 |
Victor tested test_concurrency on 3.5, and it is failed on slow Windows VM. Therefore the bug exists on 3.5 and is not related to recent changes.
|
msg298072 - (view) |
Author: Nick Coghlan (ncoghlan) * |
Date: 2017-07-10 15:28 |
I'm about to head to bed so I'll get back to this discussion tomorrow, but in the meantime: Victor, maybe it would make sense to reformulate your reproducer as a monkeypatch for _find_and_load_unlocked() that injects the 10 ms sleep from the test case if it finds _frozen_importlib in sys.modules?
That is, temporarily replacing it with something like:
orig_find_and_load_unlocked = _frozen_importlib._find_and_load_unlocked
def _slow_find_and_load_unlocked(name, import_):
time.sleep(0.010)
return orig_find_and_load_unlocked(name, import_)
That way, we'd hopefully be able to reproduce the failure independently of the slower buildbot.
|
msg298081 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 20:52 |
New changeset 4f9a446f3fb42f800e73cd9414dd1eccb3ca4fa7 by Victor Stinner in branch 'master':
bpo-30891: Fix importlib _find_and_load() race condition (#2646)
https://github.com/python/cpython/commit/4f9a446f3fb42f800e73cd9414dd1eccb3ca4fa7
|
msg298087 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 21:16 |
New changeset fe6e686c27ce3a3ecdf03803cff7e230dee5530d by Victor Stinner in branch '3.6':
bpo-30891: Fix importlib _find_and_load() race condition (#2646) (#2651)
https://github.com/python/cpython/commit/fe6e686c27ce3a3ecdf03803cff7e230dee5530d
|
msg298088 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-10 21:23 |
Backporting the fix ot Python 3.5 doesn't seem trivial, 3.5 lacks many changes.
|
msg298798 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-21 11:00 |
New changeset e72b1359f81d1dd42bd8a5c5cc2b3928b74f8023 by Victor Stinner in branch 'master':
bpo-30891: Fix again importlib _find_and_load() (#2665)
https://github.com/python/cpython/commit/e72b1359f81d1dd42bd8a5c5cc2b3928b74f8023
|
msg298801 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-21 11:28 |
New changeset bb323b261d0f41ad6998d3a16c41b29b3bb8e982 by Victor Stinner in branch '3.6':
bpo-30891: Fix again importlib _find_and_load() (#2665) (#2801)
https://github.com/python/cpython/commit/bb323b261d0f41ad6998d3a16c41b29b3bb8e982
|
msg299184 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-26 02:32 |
With Python 3.5.4rc1 released (3.5.4 will be the last 3.5 release accepting bug fixes), I'm not sure that it's still worth it to try to backport such changes to Python 3.5. The importlib is very important, any regression can be very important.
I'm not sure that backporting the unit test is interesting. If the test fails, we will be unable to fix it.
|
msg299305 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-07-27 10:25 |
Victor:
> I'm not sure that backporting the unit test is interesting. If the test fails, we will be unable to fix it.
Serhiy closed his PR:
https://github.com/python/cpython/pull/2645#event-1181495893
So now I close this issue.
|
msg302106 - (view) |
Author: Łukasz Langa (lukasz.langa) * |
Date: 2017-09-13 18:49 |
If you're looking at this issue because it still happens to you, you are probably using zipimport. Stop using zipimport.
|
msg302124 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2017-09-13 22:03 |
Łukasz Langa: "If you're looking at this issue because it still happens to you, you are probably using zipimport. Stop using zipimport."
Hum, do you mean that zipimport also has an issue? Maybe open a new issue?
|
msg314594 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2018-03-28 14:27 |
Yeah, what's wrong with zipimport?
|
msg314615 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2018-03-28 17:27 |
There's a bunch of things wrong with zipimport, just ask Thomas. ;)
|
msg314818 - (view) |
Author: Łukasz Langa (lukasz.langa) * |
Date: 2018-04-02 17:46 |
In the context of this issue: zipimport doesn't properly lock on imports happening in multiple threads at the same time. This causes one thread to receive a partially imported module while the other thread is still performing the import.
zipimport also doesn't properly check __pycache__ directories but *does* look for old-style .pyc files next to the corresponding .py file.
IIRC Brett chooses not to deal with zipimport's warts and has no time to rewrite it in Python. Since the rest of the core team is less qualified, they steer clear, too.
So, don't use zipimport.
|
msg314819 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2018-04-02 17:57 |
That's rather disturbing, and none of that is reflected in the zipimport docs. Is there at least a bpo issue about it? I may be the last person in the world to still recommend putting a zipfile on sys.path -- I used to believe it's faster than directory searches, is that no longer the case?
|
msg314879 - (view) |
Author: Steve Dower (steve.dower) * |
Date: 2018-04-03 13:32 |
I recommend and use it without trouble, but then I rarely defer imports to worker threads.
Barry may have some ideas, but I think this really shouldn't be difficult to do. importlib is pretty extensible, and it's not difficult to read from a zip file.
|
msg314887 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2018-04-03 16:56 |
The other problem is zipimport is written in C and no one wants to deal with that headache if possible. ;) We actually tweaked the importlib.resources API specifically to avoid having to muck with zipimport's C code as it would have required major surgery otherwise. Thomas has also been saying he was planning to upstream the improvements he has made to zipimport internally at Google which also demotivates touching the code knowing that changes are planned.
As for Steve's comment, the tricky bit is zipfile has a bunch of dependencies, so it's not so simple to just freeze that one module and wrap it with importlib to be able to load the stdlib from a zip file (which zipimport supports). Basically zip-reading code needs to be written that can be bootstrapped (C or Python), and then use that in importlib to be able to import the stdlib from a zip file.
I have also been toying with the idea of a zippath module that would take pathlib's API and apply it to zip files acting like a self-contained file system, but I have never bothered as zipfile's API is messy. The "all the time in the world" project would be making all of this come together in a release of Python by building off of each other: new code to read zip files written in freezable Python, use importlib to replace zipimport, and provide a pathlib API to work with zip files (which would probably be partially built for use by importlib).
If I drop my idea of putting a package on PyPI to do lazy importing I could look at tackling this for 3.8.
|
msg314889 - (view) |
Author: Barry A. Warsaw (barry) * |
Date: 2018-04-03 20:14 |
Take a look at https://bugs.python.org/issue25711 which wants to rewrite zipimport from scratch. I took Serhiy's C->Python patch and turned it into a PR, but it would need a lot of love to get it into shape.
(Let's not even talk about https://bugs.python.org/issue32866 which caused me some pain during importlib.resources work.)
Pretty sure Thomas's rumored rewrite is still in C, and it solves problems like support for zip64 and comments, which we've run into also.
zipimport is crucial for single-file executable solutions like pex and our own soon-to-be-open-sourced tool, so it really can't go away. Both of those still unpack wheels/eggs/sdists to a file system cache, not so much for speed but because shared libraries generally can't be imported from anything but a physical path (thanks to dlopen()). I think Google has a patch for glibc that would solve that but there's some hesitation from the glibc maintainers to merge it (and then what about other platforms?).
|
msg314894 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2018-04-03 20:45 |
OK, this sounds much better: zipimport has problems and we should fix them,
presumably by getting Thomas's fixes (sans Google-isms, which may require
some effort). I don't believe Łukasz's recommendation "don't use it" is the
right attitude (though what he puts in Facebook's internal style guide is
his problem :-), and I don't believe that we're better off deprecating it.
|
msg314948 - (view) |
Author: Brett Cannon (brett.cannon) * |
Date: 2018-04-04 18:55 |
And mostly for completeness, I know Thomas has maligned the fact that zipimport has tests for some crazy things in its semantics which made it difficult for him to improve the module.
|
msg314949 - (view) |
Author: Guido van Rossum (gvanrossum) * |
Date: 2018-04-04 19:19 |
If there are overspecified tests we can debate them of course.
On Wed, Apr 4, 2018, 11:55 Brett Cannon <report@bugs.python.org> wrote:
>
> Brett Cannon <brett@python.org> added the comment:
>
> And mostly for completeness, I know Thomas has maligned the fact that
> zipimport has tests for some crazy things in its semantics which made it
> difficult for him to improve the module.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue30891>
> _______________________________________
>
|
|
Date |
User |
Action |
Args |
2022-04-11 14:58:48 | admin | set | github: 75074 |
2018-04-04 19:19:31 | gvanrossum | set | messages:
+ msg314949 |
2018-04-04 18:55:15 | brett.cannon | set | messages:
+ msg314948 |
2018-04-03 20:45:44 | gvanrossum | set | messages:
+ msg314894 |
2018-04-03 20:14:18 | barry | set | messages:
+ msg314889 |
2018-04-03 16:56:19 | brett.cannon | set | messages:
+ msg314887 |
2018-04-03 13:32:33 | steve.dower | set | nosy:
+ barry messages:
+ msg314879
|
2018-04-02 17:57:03 | gvanrossum | set | messages:
+ msg314819 |
2018-04-02 17:46:21 | lukasz.langa | set | messages:
+ msg314818 |
2018-03-28 17:27:34 | brett.cannon | set | nosy:
+ twouters messages:
+ msg314615
|
2018-03-28 14:27:03 | gvanrossum | set | nosy:
+ gvanrossum messages:
+ msg314594
|
2017-09-13 22:03:22 | vstinner | set | messages:
+ msg302124 |
2017-09-13 18:49:44 | lukasz.langa | set | nosy:
+ lukasz.langa messages:
+ msg302106
|
2017-07-27 10:25:47 | vstinner | set | status: open -> closed versions:
- Python 3.5 messages:
+ msg299305
resolution: fixed stage: needs patch -> resolved |
2017-07-26 02:32:34 | vstinner | set | messages:
+ msg299184 |
2017-07-21 11:28:58 | vstinner | set | messages:
+ msg298801 |
2017-07-21 11:05:08 | vstinner | set | pull_requests:
+ pull_request2852 |
2017-07-21 11:00:48 | vstinner | set | messages:
+ msg298798 |
2017-07-11 19:53:47 | vstinner | set | pull_requests:
- pull_request2734 |
2017-07-11 16:27:05 | cooperlees | set | pull_requests:
+ pull_request2734 |
2017-07-11 08:03:34 | vstinner | set | pull_requests:
+ pull_request2730 |
2017-07-10 21:23:57 | vstinner | set | messages:
+ msg298088 |
2017-07-10 21:16:29 | vstinner | set | messages:
+ msg298087 |
2017-07-10 20:56:36 | vstinner | set | pull_requests:
+ pull_request2717 |
2017-07-10 20:52:34 | vstinner | set | messages:
+ msg298081 |
2017-07-10 19:11:26 | brett.cannon | link | issue30893 dependencies |
2017-07-10 16:24:05 | vstinner | set | title: Sometimes, test_concurrency() of test_import fails with AttributeError: module 'package' has no attribute 'submodule' on AMD64 Windows8.1 Refleaks 3.6 -> importlib: _find_and_load() race condition on sys.modules[name] check versions:
+ Python 3.5, Python 3.6 |
2017-07-10 16:12:05 | vstinner | set | pull_requests:
+ pull_request2712 |
2017-07-10 15:28:52 | ncoghlan | set | messages:
+ msg298072 |
2017-07-10 15:17:44 | serhiy.storchaka | set | messages:
+ msg298071 |
2017-07-10 15:13:34 | serhiy.storchaka | set | pull_requests:
+ pull_request2711 |
2017-07-10 15:09:27 | vstinner | set | messages:
+ msg298070 |
2017-07-10 14:52:56 | serhiy.storchaka | set | messages:
+ msg298069 |
2017-07-10 14:47:33 | vstinner | set | messages:
+ msg298068 |
2017-07-10 14:44:11 | ncoghlan | set | messages:
+ msg298067 |
2017-07-10 14:25:32 | vstinner | set | messages:
+ msg298066 |
2017-07-10 13:59:46 | vstinner | set | files:
+ importlib_sleep.patch keywords:
+ patch messages:
+ msg298064
|
2017-07-10 13:09:31 | vstinner | set | messages:
+ msg298063 |
2017-07-10 12:02:13 | serhiy.storchaka | set | nosy:
+ serhiy.storchaka messages:
+ msg298058
|
2017-07-10 10:40:24 | ncoghlan | set | type: behavior messages:
+ msg298053 stage: needs patch |
2017-07-10 10:05:00 | vstinner | set | messages:
+ msg298049 |
2017-07-10 10:02:10 | vstinner | set | messages:
+ msg298048 |
2017-07-10 09:59:54 | vstinner | set | nosy:
+ brett.cannon, ncoghlan, eric.snow messages:
+ msg298047
|
2017-07-10 09:56:36 | vstinner | set | messages:
+ msg298046 |
2017-07-10 09:44:24 | vstinner | create | |