This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: "Impossible" KeyError from importlib._bootstrap acquire line 110
Type: crash Stage: resolved
Components: Versions: Python 3.9
process
Status: closed Resolution: third party
Dependencies: Superseder:
Assigned To: Nosy List: anentropic, brett.cannon, eric.snow, ncoghlan, pitrou, serhiy.storchaka
Priority: normal Keywords:

Created on 2021-03-18 18:58 by anentropic, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (7)
msg389034 - (view) Author: Anentropic (anentropic) * Date: 2021-03-18 18:58
We have a Django 2.2.19 project on Python 3.9.2 on Debian (slim-buster) in Docker.

A bizarre problem started happening to us this week.

First I'll show the symptom, we started getting the following error:

...
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/migrations/autodetector.py", line 10, in <module>
    from django.db.migrations.optimizer import MigrationOptimizer
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load
  File "<frozen importlib._bootstrap>", line 158, in __enter__
  File "<frozen importlib._bootstrap>", line 110, in acquire
KeyError: 140426340123264

If I look at the source for _bootstrap.py, that error should be impossible:
https://github.com/python/cpython/blob/v3.9.2/Lib/importlib/_bootstrap.py#L110

At the top of the acquire method it does:

    tid = _thread.get_ident()
    _blocking_on[tid] = self

and then on line 110 where we get the KeyError:

    del _blocking_on[tid]

both `tid` and `_blocking_on` are local vars and none of the other lines in the method touch them

So how do we get a KeyError?

I can only think that something mutates the underlying value of `tid`, but it's supposed to be an int so that's very weird.

I started with the symptom because our context for this is complicated to explain. I did find a fix that prevents the error but I do not understand the link between cause and effect.

Our context:
- we have a large unit test suite for the project which we run in Jenkins
- we split the tests across several Jenkins nodes to run in parallel in isolated docker environments
- we use some bash to like this to split the test cases:
  find project/ -iname "test*.py" -print0 | \
    xargs --null grep -E '(def test)|(def step_)' -l | \
    split -n "r/$NODE_ID/$NODES" | \
    xargs ci/bin/run-tests
- ci/bin/run-tests is just a wrapper which calls Django's manage.py test command
  so it receives a list of filenames like "project/metrics/tests/test_client.py" as args
- using "nose" test runner via django-nose FWIW

We currently split tests across 3 nodes, and it was always node 2 which would fail.
I found that commenting out a test case in any of the files being passed to node 2 would prevent the error from occurring.
Note that in this case we are still passing *exactly the same filenames* as cli args to the test runner.

Splitting the tests across 4 nodes instead of 3 also seems to prevent the error.
So it seems like, in some way I don't understand, we just have too many test cases.
Perhaps nose is doing something wrong or inefficient when given lots of filenames.

But I'm reporting here because the error we get from importlib._bootstrap looks like it should be impossible.
msg389035 - (view) Author: Anentropic (anentropic) * Date: 2021-03-18 19:09
FWIW...

django-nose==1.4.6
nose==1.3.7

The only use of threading I can see in Nose is here:
https://github.com/nose-devs/nose/blob/release_1.3.7/nose/plugins/logcapture.py

But we are passing --nologcapture flag and nosetests echoes that flag back to us at the top of the test run, so I believe the feature is disabled
msg389036 - (view) Author: Anentropic (anentropic) * Date: 2021-03-18 19:14
https://docs.python.org/3/library/_thread.html#_thread.get_ident

> Thread identifiers may be recycled when a thread exits and another thread is created.

I still don't see how we get that KeyError but it feels like it might be something to do with this
msg389037 - (view) Author: Anentropic (anentropic) * Date: 2021-03-18 19:32
It turns out splitting the tests across 2 nodes rather than 4 also prevents the error.

So it's not a case of "too many" tests.

So then I think maybe it needs to be a specific number to trigger the error?

And yes, if I duplicate one of our test cases (I mean test method in a TestCase class) so that there is one more than before... that also prevents the error!
msg389038 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-03-18 19:43
It looks like method _ModuleLock.acquire() was re-entered in the same thread.

* Enter acquire() first time, set _blocking_on[tid].
* Trigger some callback (profiler, debugger, tracemalloc) which uses import and calls acquire() again from the same thread.
* Enter acquire() second time, set _blocking_on[tid].
* Delete _blocking_on[tid] and leave the second acquire().
* Try to leave the first acquire() and delete _blocking_on[tid] again.
msg389067 - (view) Author: Anentropic (anentropic) * Date: 2021-03-19 10:43
Thank you for your explanation.

I am baffled why this has never happened to us before, and why a specific number of test cases should trigger it (comment one out or add an extra one, it goes away).

I've pasted our full stack trace below. It always fails in this place.

My original thought when getting this error was to check for code that executes at import time. The module being imported when it breaks seems totally innocuous though:
https://github.com/django/django/blob/2.2.19/django/db/migrations/optimizer.py

There are two instances of:

File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked
    return exec_and_call_hooks(module_name, wrapped, args, kwargs)

...in the traceback, so https://github.com/DataDog/dd-trace-py starts to look suspicious.


Traceback (most recent call last):
  File "/work/manage.py", line 26, in <module>
    execute_from_command_line(sys.argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/test.py", line 23, in run_from_argv
    super().run_from_argv(argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/test.py", line 53, in handle
    failures = test_runner.run_tests(test_labels)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 308, in run_tests
    result = self.run_suite(nose_argv)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 244, in run_suite
    nose.core.TestProgram(argv=nose_argv, exit=False,
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 118, in __init__
    unittest.TestProgram.__init__(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/unittest/main.py", line 101, in __init__
    self.runTests()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 207, in runTests
    result = self.testRunner.run(self.test)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 50, in run
    wrapper = self.config.plugins.prepareTest(test)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/plugins/manager.py", line 99, in __call__
    return self.call(*arg, **kw)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/plugins/manager.py", line 167, in simple
    result = meth(*arg, **kw)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/plugin.py", line 82, in prepareTest
    self.old_names = self.runner.setup_databases()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 495, in setup_databases
    return super(NoseTestSuiteRunner, self).setup_databases()
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/test/runner.py", line 552, in setup_databases
    return _setup_databases(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/test/utils.py", line 170, in setup_databases
    connection.creation.create_test_db(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/backends/base/creation.py", line 67, in create_test_db
    call_command(
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 110, in call_command
    command = load_command_class(app_name, command_name)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 36, in load_command_class
    module = import_module('%s.management.commands.%s' % (app_name, name))
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked
    return exec_and_call_hooks(module_name, wrapped, args, kwargs)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 171, in exec_and_call_hooks
    return wrapped(*args, **kwargs)
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 790, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/migrate.py", line 14, in <module>
    from django.db.migrations.autodetector import MigrationAutodetector
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked
    return exec_and_call_hooks(module_name, wrapped, args, kwargs)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 171, in exec_and_call_hooks
    return wrapped(*args, **kwargs)
  File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/migrations/autodetector.py", line 10, in <module>
    from django.db.migrations.optimizer import MigrationOptimizer
  File "<frozen importlib._bootstrap>", line 1004, in _find_and_load
  File "<frozen importlib._bootstrap>", line 158, in __enter__
  File "<frozen importlib._bootstrap>", line 110, in acquire
KeyError: 140581625389696
msg389085 - (view) Author: Anentropic (anentropic) * Date: 2021-03-19 13:21
Upgrading ddtrace lib to latest version did not help.

Disabling ddtrace patching of django module does make the error go away.

Thanks again for your help, I will move my bug report over to ddtrace.
History
Date User Action Args
2022-04-11 14:59:43adminsetgithub: 87712
2021-09-11 10:54:03iritkatrielsetstatus: open -> closed
resolution: third party
stage: resolved
2021-03-19 13:21:51anentropicsetmessages: + msg389085
2021-03-19 10:43:15anentropicsetmessages: + msg389067
2021-03-18 20:18:41vstinnersetnosy: - vstinner
2021-03-18 19:43:00serhiy.storchakasetnosy: + pitrou, eric.snow, serhiy.storchaka, vstinner, brett.cannon, ncoghlan
messages: + msg389038
2021-03-18 19:32:54anentropicsetmessages: + msg389037
2021-03-18 19:14:07anentropicsetmessages: + msg389036
2021-03-18 19:09:34anentropicsetmessages: + msg389035
2021-03-18 18:58:26anentropiccreate