Issue43546
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2021-03-18 18:58 by anentropic, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Messages (7) | |||
---|---|---|---|
msg389034 - (view) | Author: Anentropic (anentropic) * | Date: 2021-03-18 18:58 | |
We have a Django 2.2.19 project on Python 3.9.2 on Debian (slim-buster) in Docker. A bizarre problem started happening to us this week. First I'll show the symptom, we started getting the following error: ... File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/migrations/autodetector.py", line 10, in <module> from django.db.migrations.optimizer import MigrationOptimizer File "<frozen importlib._bootstrap>", line 1004, in _find_and_load File "<frozen importlib._bootstrap>", line 158, in __enter__ File "<frozen importlib._bootstrap>", line 110, in acquire KeyError: 140426340123264 If I look at the source for _bootstrap.py, that error should be impossible: https://github.com/python/cpython/blob/v3.9.2/Lib/importlib/_bootstrap.py#L110 At the top of the acquire method it does: tid = _thread.get_ident() _blocking_on[tid] = self and then on line 110 where we get the KeyError: del _blocking_on[tid] both `tid` and `_blocking_on` are local vars and none of the other lines in the method touch them So how do we get a KeyError? I can only think that something mutates the underlying value of `tid`, but it's supposed to be an int so that's very weird. I started with the symptom because our context for this is complicated to explain. I did find a fix that prevents the error but I do not understand the link between cause and effect. Our context: - we have a large unit test suite for the project which we run in Jenkins - we split the tests across several Jenkins nodes to run in parallel in isolated docker environments - we use some bash to like this to split the test cases: find project/ -iname "test*.py" -print0 | \ xargs --null grep -E '(def test)|(def step_)' -l | \ split -n "r/$NODE_ID/$NODES" | \ xargs ci/bin/run-tests - ci/bin/run-tests is just a wrapper which calls Django's manage.py test command so it receives a list of filenames like "project/metrics/tests/test_client.py" as args - using "nose" test runner via django-nose FWIW We currently split tests across 3 nodes, and it was always node 2 which would fail. I found that commenting out a test case in any of the files being passed to node 2 would prevent the error from occurring. Note that in this case we are still passing *exactly the same filenames* as cli args to the test runner. Splitting the tests across 4 nodes instead of 3 also seems to prevent the error. So it seems like, in some way I don't understand, we just have too many test cases. Perhaps nose is doing something wrong or inefficient when given lots of filenames. But I'm reporting here because the error we get from importlib._bootstrap looks like it should be impossible. |
|||
msg389035 - (view) | Author: Anentropic (anentropic) * | Date: 2021-03-18 19:09 | |
FWIW... django-nose==1.4.6 nose==1.3.7 The only use of threading I can see in Nose is here: https://github.com/nose-devs/nose/blob/release_1.3.7/nose/plugins/logcapture.py But we are passing --nologcapture flag and nosetests echoes that flag back to us at the top of the test run, so I believe the feature is disabled |
|||
msg389036 - (view) | Author: Anentropic (anentropic) * | Date: 2021-03-18 19:14 | |
https://docs.python.org/3/library/_thread.html#_thread.get_ident > Thread identifiers may be recycled when a thread exits and another thread is created. I still don't see how we get that KeyError but it feels like it might be something to do with this |
|||
msg389037 - (view) | Author: Anentropic (anentropic) * | Date: 2021-03-18 19:32 | |
It turns out splitting the tests across 2 nodes rather than 4 also prevents the error. So it's not a case of "too many" tests. So then I think maybe it needs to be a specific number to trigger the error? And yes, if I duplicate one of our test cases (I mean test method in a TestCase class) so that there is one more than before... that also prevents the error! |
|||
msg389038 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2021-03-18 19:43 | |
It looks like method _ModuleLock.acquire() was re-entered in the same thread. * Enter acquire() first time, set _blocking_on[tid]. * Trigger some callback (profiler, debugger, tracemalloc) which uses import and calls acquire() again from the same thread. * Enter acquire() second time, set _blocking_on[tid]. * Delete _blocking_on[tid] and leave the second acquire(). * Try to leave the first acquire() and delete _blocking_on[tid] again. |
|||
msg389067 - (view) | Author: Anentropic (anentropic) * | Date: 2021-03-19 10:43 | |
Thank you for your explanation. I am baffled why this has never happened to us before, and why a specific number of test cases should trigger it (comment one out or add an extra one, it goes away). I've pasted our full stack trace below. It always fails in this place. My original thought when getting this error was to check for code that executes at import time. The module being imported when it breaks seems totally innocuous though: https://github.com/django/django/blob/2.2.19/django/db/migrations/optimizer.py There are two instances of: File "<frozen importlib._bootstrap>", line 1007, in _find_and_load File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked return exec_and_call_hooks(module_name, wrapped, args, kwargs) ...in the traceback, so https://github.com/DataDog/dd-trace-py starts to look suspicious. Traceback (most recent call last): File "/work/manage.py", line 26, in <module> execute_from_command_line(sys.argv) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line utility.execute() File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 375, in execute self.fetch_command(subcommand).run_from_argv(self.argv) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/test.py", line 23, in run_from_argv super().run_from_argv(argv) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/base.py", line 323, in run_from_argv self.execute(*args, **cmd_options) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/base.py", line 364, in execute output = self.handle(*args, **options) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/test.py", line 53, in handle failures = test_runner.run_tests(test_labels) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 308, in run_tests result = self.run_suite(nose_argv) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 244, in run_suite nose.core.TestProgram(argv=nose_argv, exit=False, File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 118, in __init__ unittest.TestProgram.__init__( File "/root/.pyenv/versions/3.9.2/lib/python3.9/unittest/main.py", line 101, in __init__ self.runTests() File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 207, in runTests result = self.testRunner.run(self.test) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/core.py", line 50, in run wrapper = self.config.plugins.prepareTest(test) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/plugins/manager.py", line 99, in __call__ return self.call(*arg, **kw) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/nose/plugins/manager.py", line 167, in simple result = meth(*arg, **kw) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/plugin.py", line 82, in prepareTest self.old_names = self.runner.setup_databases() File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django_nose/runner.py", line 495, in setup_databases return super(NoseTestSuiteRunner, self).setup_databases() File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/test/runner.py", line 552, in setup_databases return _setup_databases( File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/test/utils.py", line 170, in setup_databases connection.creation.create_test_db( File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/backends/base/creation.py", line 67, in create_test_db call_command( File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 110, in call_command command = load_command_class(app_name, command_name) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/__init__.py", line 36, in load_command_class module = import_module('%s.management.commands.%s' % (app_name, name)) File "/root/.pyenv/versions/3.9.2/lib/python3.9/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "<frozen importlib._bootstrap>", line 1030, in _gcd_import File "<frozen importlib._bootstrap>", line 1007, in _find_and_load File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked return exec_and_call_hooks(module_name, wrapped, args, kwargs) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 171, in exec_and_call_hooks return wrapped(*args, **kwargs) File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 680, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 790, in exec_module File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/core/management/commands/migrate.py", line 14, in <module> from django.db.migrations.autodetector import MigrationAutodetector File "<frozen importlib._bootstrap>", line 1007, in _find_and_load File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 215, in wrapped_find_and_load_unlocked return exec_and_call_hooks(module_name, wrapped, args, kwargs) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/ddtrace/internal/import_hooks.py", line 171, in exec_and_call_hooks return wrapped(*args, **kwargs) File "/root/.pyenv/versions/3.9.2/lib/python3.9/site-packages/django/db/migrations/autodetector.py", line 10, in <module> from django.db.migrations.optimizer import MigrationOptimizer File "<frozen importlib._bootstrap>", line 1004, in _find_and_load File "<frozen importlib._bootstrap>", line 158, in __enter__ File "<frozen importlib._bootstrap>", line 110, in acquire KeyError: 140581625389696 |
|||
msg389085 - (view) | Author: Anentropic (anentropic) * | Date: 2021-03-19 13:21 | |
Upgrading ddtrace lib to latest version did not help. Disabling ddtrace patching of django module does make the error go away. Thanks again for your help, I will move my bug report over to ddtrace. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:59:43 | admin | set | github: 87712 |
2021-09-11 10:54:03 | iritkatriel | set | status: open -> closed resolution: third party stage: resolved |
2021-03-19 13:21:51 | anentropic | set | messages: + msg389085 |
2021-03-19 10:43:15 | anentropic | set | messages: + msg389067 |
2021-03-18 20:18:41 | vstinner | set | nosy:
- vstinner |
2021-03-18 19:43:00 | serhiy.storchaka | set | nosy:
+ pitrou, eric.snow, serhiy.storchaka, vstinner, brett.cannon, ncoghlan messages: + msg389038 |
2021-03-18 19:32:54 | anentropic | set | messages: + msg389037 |
2021-03-18 19:14:07 | anentropic | set | messages: + msg389036 |
2021-03-18 19:09:34 | anentropic | set | messages: + msg389035 |
2021-03-18 18:58:26 | anentropic | create |