classification
Title: test_threaded_import: KeyError ignored in _get_module_lock..cb
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, ncoghlan, serhiy.storchaka, vstinner
Priority: normal Keywords:

Created on 2017-07-28 15:03 by vstinner, last changed 2017-10-28 04:46 by ncoghlan. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 3029 closed vstinner, 2017-08-08 23:42
PR 3033 merged serhiy.storchaka, 2017-08-09 06:42
PR 3038 merged serhiy.storchaka, 2017-08-09 11:37
Messages (13)
msg299393 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-07-28 15:03
See bpo-bpo-30891 and bpo-30876 for recent changes in importlib.

http://buildbot.python.org/all/builders/AMD64%20Windows8.1%20Refleaks%203.x/builds/57/steps/test/logs/stdio

1:33:30 [312/406/2] test_threaded_import passed (30 sec) -- running: test_decimal (826 sec), test_set (84 sec), test_mmap (1151 sec)
beginning 6 repetitions
123456
...Exception ignored in: <function _get_module_lock.<locals>.cb at 0x000000813CA18EB8>
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 176, in cb
KeyError: ('random',)
...
msg299951 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-08 22:59
Oh, it's easy to reproduce the bug on the master branch. Example on Linux:

haypo@selma$ ./python -m test test_threaded_import
Run tests sequentially
0:00:00 load avg: 0.16 [1/1] test_threaded_import
Exception ignored in: <function _get_module_lock.<locals>.cb at 0x7f53201e2f70>
Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 176, in cb
KeyError: ('random',)
1 test OK.

Total duration: 2 sec
Tests result: SUCCESS
msg299952 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-08 23:40
The problem is the weakref callback of _module_locks. It seems like the callback is called after a second thread replaced _module_locks[name]

* Thread 1: _get_module_lock('random') sets _module_locks['random'] and registers a weakref callback on the newly created lock

* Thread 1: the lock is destroyed, the weakref callback will be called as soon as possible

* Thread 2: _get_module_lock('random') sees that the weakref is dead and so replaces _module_locks['random'] with a new lock, and registers a new weakref callback on the second lock

* Thread 1: Call the weakref callback removing _module_locks['random'] -- BUG: the callback is called too late, after Thread 2 already sets a new lock

* Thread 2: The lock is destroyed, the second weakref callback will be called as soon as possible

* Thread 2: The second callback is called and fails with KeyError since _module_locks['random'] was already called in the Thread 1 in the meanwhile
msg299953 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-08 23:43
I wrote https://github.com/python/cpython/pull/3029 which seems to fix the issue.

I don't know importlib well enough to understand why we need a weak reference to a lock.

My PR adds a second lock per module lock (!) to be able to wait until _module_locks[name] is deleted when a thread detects that _module_locks[name] exists and the associated lock was destroyed.
msg299971 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-08-09 06:45
I think the solution can be simpler. PR 3033 uses the global import lock for guarding modification of the _module_locks dictionary.
msg299980 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-09 09:23
Serhiy: "I think the solution can be simpler. PR 3033 uses the global import lock for guarding modification of the _module_locks dictionary."

I tried exactly that, but it wasn't enough. But your PR is correct because it also checks the current value of _module_locks[name] using the callback parameter.

I disliked my own PR, so I'm happy that someone else wrote a simpler fix! (I already abandonned my PR.)
msg299989 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-08-09 11:29
New changeset 9b0d1d647e3d2ec9d299e5c9f49b02fbbb810a5a by Serhiy Storchaka in branch 'master':
bpo-31070: Fix a race condition in importlib _get_module_lock(). (#3033)
https://github.com/python/cpython/commit/9b0d1d647e3d2ec9d299e5c9f49b02fbbb810a5a
msg299992 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-08-09 11:56
New changeset f3b891718e104b6e7018b58bbcd86421a2837fb8 by Serhiy Storchaka in branch '3.6':
[3.6] bpo-31070: Fix a race condition in importlib _get_module_lock(). (GH-3033). (#3038)
https://github.com/python/cpython/commit/f3b891718e104b6e7018b58bbcd86421a2837fb8
msg299993 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-08-09 11:57
Thank you for review Victor and Nick.
msg299999 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-09 13:12
I'm curious to know if this issue is a regression caused by recent changes: bpo-30891 and bpo-30876. But well, it doesn't matter as soon as it's fixed :-)
msg300007 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-08-09 13:46
I don't think the changes created the bug, but I do think the new tests to provoke the other race conditions made it easier for the test suite to hit this race as well.
msg300009 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-09 14:01
>  I don't think the changes created the bug, but I do think the new tests to provoke the other race conditions made it easier for the test suite to hit this race as well.

Oh ok, that's a good news!
msg305147 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2017-10-28 04:46
A recent numpy threaded import bug report that appears in 3.6.1 but is absent in 3.6.3: https://github.com/numpy/numpy/issues/9931

So that's additional evidence that this really was a pre-existing race condition that the new tests for the SystemError cases revealed.
History
Date User Action Args
2017-10-28 04:46:37ncoghlansetmessages: + msg305147
2017-08-09 14:01:24vstinnersetmessages: + msg300009
2017-08-09 13:46:14ncoghlansetmessages: + msg300007
2017-08-09 13:12:19vstinnersetmessages: + msg299999
2017-08-09 11:57:47serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg299993

stage: patch review -> resolved
2017-08-09 11:56:16serhiy.storchakasetmessages: + msg299992
2017-08-09 11:37:26serhiy.storchakasetpull_requests: + pull_request3071
2017-08-09 11:29:15serhiy.storchakasetmessages: + msg299989
2017-08-09 09:23:04vstinnersetmessages: + msg299980
2017-08-09 08:47:24vstinnersettitle: test_threaded_import: KeyError ignored in _get_module_lock.<locals>.cb on AMD64 Windows8.1 Refleaks 3.x -> test_threaded_import: KeyError ignored in _get_module_lock.<locals>.cb
2017-08-09 06:45:12serhiy.storchakasetversions: + Python 3.6
messages: + msg299971

components: + Interpreter Core, - Tests
type: behavior
stage: patch review
2017-08-09 06:42:41serhiy.storchakasetpull_requests: + pull_request3068
2017-08-08 23:43:57vstinnersetmessages: + msg299953
2017-08-08 23:42:16vstinnersetpull_requests: + pull_request3064
2017-08-08 23:40:56vstinnersetmessages: + msg299952
2017-08-08 22:59:16vstinnersetnosy: + brett.cannon, ncoghlan
messages: + msg299951
2017-07-28 15:03:41vstinnercreate