classification
Title: "dictionary changed size during iteration" error in _ExecutorManagerThread
Type: crash Stage: resolved
Components: asyncio, Installation Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: asvetlov Nosy List: Dennis Sweeney, asvetlov, bquinlan, colesbury, kartiksubbarao, kulikjak, markao, miss-islington, pitrou, thomas-petazzoni, whitslack, yselivanov
Priority: normal Keywords: patch

Created on 2021-03-15 08:23 by kulikjak, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 24868 merged kulikjak, 2021-03-15 08:26
PR 29836 merged miss-islington, 2021-11-29 12:03
PR 29837 merged miss-islington, 2021-11-29 12:03
Messages (13)
msg388712 - (view) Author: Jakub Kulik (kulikjak) * Date: 2021-03-15 08:23
Recently several of our Python 3.9 builds froze during `make install` with the following trace in logs:

Listing .../components/python/python39/build/prototype/sparc/usr/lib/python3.9/lib2to3/tests/data/fixers/myfixes...
Exception in thread Thread-1:
Traceback (most recent call last):
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/concurrent/futures/process.py", line 317, in run
    result_item, is_broken, cause = self.wait_result_broken_or_wakeup()
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/concurrent/futures/process.py", line 376, in wait_result_broken_or_wakeup
    worker_sentinels = [p.sentinel for p in self.processes.values()]
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/concurrent/futures/process.py", line 376, in <listcomp>
    worker_sentinels = [p.sentinel for p in self.processes.values()]
RuntimeError: dictionary changed size during iteration

After this, the build freezes and never ends (most likely waiting for the broken thread).

We see this only in Python 3.9 (3.7 doesn't seem to be affected, and we don't deliver other versions) and only when doing full builds of the entire Userland, so it might be related to heavy load on the build machine. That said, it has only happened three or four times, so this might be just a coincidence.

A simple fix seems to be the following (PR to follow shortly):

--- Python-3.9.1/Lib/concurrent/futures/process.py
+++ Python-3.9.1/Lib/concurrent/futures/process.py
@@ -373,7 +373,7 @@ class _ExecutorManagerThread(threading.T
         assert not self.thread_wakeup._closed
         wakeup_reader = self.thread_wakeup._reader
         readers = [result_reader, wakeup_reader]
-        worker_sentinels = [p.sentinel for p in self.processes.values()]
+        worker_sentinels = [p.sentinel for p in self.processes.copy().values()]
         ready = mp.connection.wait(readers + worker_sentinels)
 
         cause = None


This is on Oracle Solaris and on both SPARC and Intel machines.
msg389755 - (view) Author: Kartik Subbarao (kartiksubbarao) Date: 2021-03-29 21:16
I'm seeing the same error with Python 3.9.2 on Fedora 33, with a script that uses ProcessPoolExecutor.
msg391120 - (view) Author: Jakub Kulik (kulikjak) * Date: 2021-04-15 08:25
I investigated a little more and found that this happens when `ProcessPoolExecutor._adjust_process_count()` adds a new process to `self.processes` while the manager thread is iterating over it.

With the following change, I can reproduce this reliably every time:

--- Python-3.9.1/Lib/concurrent/futures/process.py
+++ Python-3.9.1/Lib/concurrent/futures/process.py
@@ -373,7 +373,14 @@ class _ExecutorManagerThread(threading.T
         assert not self.thread_wakeup._closed
         wakeup_reader = self.thread_wakeup._reader
         readers = [result_reader, wakeup_reader]
-        worker_sentinels = [p.sentinel for p in self.processes.values()]
+        worker_sentinels = []
+        for p in self.processes.values():
+            time.sleep(1)
+            worker_sentinels.append(p.sentinel)
         ready = mp.connection.wait(readers + worker_sentinels)
 
         cause = None

Since `wait_result_broken_or_wakeup()` is called periodically, and there is no issue if processes added during the iteration are omitted (if they were added just after that, they would be omitted anyway), the attached PR shouldn't break anything.
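
For readers who want to see the failure without patching the standard library, here is a minimal single-threaded sketch. It is not the executor code: the `processes` dict, the helper names, and the "late-worker" entry are hypothetical stand-ins, and the insertion inside the loop only simulates what `_adjust_process_count()` does from another thread. It shows that iterating the live view raises the error, while iterating `copy().values()` (as in the proposed patch) simply omits the entry added mid-iteration.

```python
# Hypothetical stand-in for the executor's pid -> process mapping.
processes = {1: "worker-1", 2: "worker-2"}


def iterate_live_view(mapping):
    """Iterate the live values() view while the dict grows."""
    seen = []
    for name in mapping.values():
        seen.append(name)
        # Simulates another thread adding a process mid-iteration.
        mapping[len(mapping) + 1] = "late-worker"
    return seen


def iterate_copy(mapping):
    """Iterate a shallow copy, as in the proposed patch."""
    seen = []
    for name in mapping.copy().values():
        seen.append(name)
        # Same simulated mutation; the copy being iterated is unaffected.
        mapping[len(mapping) + 1] = "late-worker"
    return seen


try:
    iterate_live_view(dict(processes))
    error = None
except RuntimeError as exc:
    error = str(exc)

copy_result = iterate_copy(dict(processes))

print(error)
print(copy_result)
```

Entries added during the loop are not in `copy_result`; in the executor they would be picked up on the next call to `wait_result_broken_or_wakeup()`.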
msg395544 - (view) Author: Matt Whitlock (whitslack) Date: 2021-06-10 15:48
Observed this same failure mode on a Raspberry Pi 1 while running 'make install' on Python 3.9.5 with 9 concurrent workers.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/var/tmp/portage/dev-lang/python-3.9.5_p2/image/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/var/tmp/portage/dev-lang/python-3.9.5_p2/image/usr/lib/python3.9/concurrent/futures/process.py", line 317, in run
    result_item, is_broken, cause = self.wait_result_broken_or_wakeup()
  File "/var/tmp/portage/dev-lang/python-3.9.5_p2/image/usr/lib/python3.9/concurrent/futures/process.py", line 376, in wait_result_broken_or_wakeup
    worker_sentinels = [p.sentinel for p in self.processes.values()]
  File "/var/tmp/portage/dev-lang/python-3.9.5_p2/image/usr/lib/python3.9/concurrent/futures/process.py", line 376, in <listcomp>
    worker_sentinels = [p.sentinel for p in self.processes.values()]
RuntimeError: dictionary changed size during iteration
msg398805 - (view) Author: Thomas Petazzoni (thomas-petazzoni) Date: 2021-08-02 20:57
I can confirm we are seeing the same issue when building Python 3.9 in the context of Buildroot. See http://autobuild.buildroot.net/results/ae6/ae6c4ab292589a4e4442dfb0a1286349a9bf4d29/build-end.log for an example build result. This has been happening since we added 48-core (96-thread) build machines to our build farm, which dramatically increased the build parallelism.
msg398806 - (view) Author: Thomas Petazzoni (thomas-petazzoni) Date: 2021-08-02 21:00
For the record: we're seeing this issue ~50 times a day on our build infrastructure.
msg398811 - (view) Author: Dennis Sweeney (Dennis Sweeney) * (Python committer) Date: 2021-08-03 01:30
It was mentioned in bpo-40327 that although copy() makes the situation much better, it doesn't solve the problem entirely, since the memory allocation of the copy() call can release the GIL. I don't know enough to know whether it would be worth it to add locking.
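
To make the locking option concrete, here is a rough sketch of what it could look like. This is only an illustration and not what any of the linked PRs do: `ProcessRegistry`, `add`, `snapshot`, and `spawn_many` are hypothetical names, and the strings stand in for process objects. The idea is that every mutation and every snapshot takes the same `threading.Lock`, so the dict can never change size while it is being copied.

```python
import threading


class ProcessRegistry:
    """Hypothetical wrapper: all access to the dict goes through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._processes = {}

    def add(self, pid, process):
        # Writers hold the lock while mutating the dict.
        with self._lock:
            self._processes[pid] = process

    def snapshot(self):
        # Readers hold the same lock while copying, so no writer can
        # resize the dict in the middle of the copy.
        with self._lock:
            return list(self._processes.values())


registry = ProcessRegistry()


def spawn_many(start, count):
    """Stand-in for code that registers new worker processes."""
    for pid in range(start, start + count):
        registry.add(pid, f"worker-{pid}")


writers = [
    threading.Thread(target=spawn_many, args=(i * 1000, 1000))
    for i in range(4)
]
for t in writers:
    t.start()

# Take snapshots while the writers are (possibly) still running.
snapshot_sizes = [len(registry.snapshot()) for _ in range(100)]

for t in writers:
    t.join()

final = registry.snapshot()
print(len(final))
```

The cost is that every writer has to go through the lock as well, which is a larger change than copying at the one read site.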
msg398870 - (view) Author: Jakub Kulik (kulikjak) * Date: 2021-08-04 09:02
I think that even if copy() doesn't fix it entirely, it's still much better than nothing. I never encountered the issue mentioned in bpo-40327, but I saw this issue several times a week (before applying the proposed patch).
msg407251 - (view) Author: Mark Ao (markao) Date: 2021-11-29 08:32
I'm experiencing the same issue on Python 3.10.0 when I execute the code that uses concurrent.futures.ProcessPoolExecutor.

========
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 317, in run
    result_item, is_broken, cause = self.wait_result_broken_or_wakeup()
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 376, in wait_result_broken_or_wakeup
    worker_sentinels = [p.sentinel for p in self.processes.values()]
  File "/usr/local/lib/python3.10/concurrent/futures/process.py", line 376, in <listcomp>
PROCESSING DATAFRAME: AKAM
    worker_sentinels = [p.sentinel for p in self.processes.values()]
RuntimeError: dictionary changed size during iteration
========

I also tried to track down the part that causes this exception, but the difficulty is that it does not happen every time I run my code that uses concurrent.futures.ProcessPoolExecutor. (Much like what Jakub mentioned earlier, it looks like a coincidence.)

At the same time, I am also testing whether the same thing happens on other versions such as Python 3.8.8 (on Rocky Linux 8.5). We would appreciate it if someone could tell us whether this is a bug, or whether there is anything I should improve in my own code. (If needed I can share sample code, but I do not think something is wrong with my code: since the exception does not happen on every run, I suspect this might be a bug in Python 3.10.0.)

(Since Jakub already reported that it happens on Python 3.9, I am not testing on 3.9.)

I would appreciate it if there is any update or info that can be shared.

Thank you!
msg407260 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2021-11-29 10:47
Thanks for the report.

An atomic copy (`list(self.processes.values())`) should fix the bug, sure.

I doubt that writing a reliable test for this situation is possible; multithreading is hard.

I think we can accept a patch without a test but with an inline comment that describes why the copy is crucial.
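
As a rough illustration of the shape being suggested (a snapshot via `list(...)` plus an explanatory comment), here is a small self-contained sketch. `ManagerSketch` and `collect_sentinels` are hypothetical names, not the real `_ExecutorManagerThread` API, and the `SimpleNamespace` objects with integer `sentinel` values stand in for worker processes.

```python
from types import SimpleNamespace


class ManagerSketch:
    """Hypothetical stand-in; not the real _ExecutorManagerThread."""

    def __init__(self, processes):
        self.processes = processes

    def collect_sentinels(self):
        # Snapshot the values before walking them: another thread may add
        # or remove entries in self.processes at any time, and iterating
        # the live view would then raise
        # "RuntimeError: dictionary changed size during iteration".
        return [p.sentinel for p in list(self.processes.values())]


manager = ManagerSketch({
    101: SimpleNamespace(sentinel=7),
    102: SimpleNamespace(sentinel=9),
})
sentinels = manager.collect_sentinels()
print(sentinels)
```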
msg407268 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2021-11-29 12:03
New changeset 7431448b817d3bf87f71661cf8f3d537807ab2e2 by Jakub Kulík in branch 'main':
bpo-43498: Fix dictionary iteration error in _ExecutorManagerThread (GH-24868)
https://github.com/python/cpython/commit/7431448b817d3bf87f71661cf8f3d537807ab2e2
msg407269 - (view) Author: miss-islington (miss-islington) Date: 2021-11-29 12:24
New changeset 4b11d7118561a12322d3cfa76c5941690b241149 by Miss Islington (bot) in branch '3.10':
bpo-43498: Fix dictionary iteration error in _ExecutorManagerThread (GH-24868)
https://github.com/python/cpython/commit/4b11d7118561a12322d3cfa76c5941690b241149
msg407270 - (view) Author: miss-islington (miss-islington) Date: 2021-11-29 12:28
New changeset 3b9d886567c4fc6279c2198b6711f0590dbf3336 by Miss Islington (bot) in branch '3.9':
bpo-43498: Fix dictionary iteration error in _ExecutorManagerThread (GH-24868)
https://github.com/python/cpython/commit/3b9d886567c4fc6279c2198b6711f0590dbf3336
History
Date User Action Args
2022-04-11 14:59:42 admin set github: 87664
2022-01-25 13:51:40 iritkatriel link issue45945 superseder
2021-11-29 13:24:23 asvetlov set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2021-11-29 12:28:54 miss-islington set messages: + msg407270
2021-11-29 12:24:43 miss-islington set messages: + msg407269
2021-11-29 12:03:09 miss-islington set pull_requests: + pull_request28069
2021-11-29 12:03:05 asvetlov set messages: + msg407268
2021-11-29 12:03:05 miss-islington set nosy: + miss-islington
pull_requests: + pull_request28068
2021-11-29 11:16:06 AlexWaygood set nosy: + colesbury
2021-11-29 10:47:36 asvetlov set assignee: asvetlov
2021-11-29 10:47:26 asvetlov set messages: + msg407260
versions: + Python 3.11
2021-11-29 08:32:39 markao set nosy: + markao
messages: + msg407251
2021-08-04 09:02:55 kulikjak set messages: + msg398870
2021-08-03 01:30:49 Dennis Sweeney set nosy: + Dennis Sweeney
messages: + msg398811
2021-08-02 21:00:24 thomas-petazzoni set messages: + msg398806
2021-08-02 20:57:26 thomas-petazzoni set nosy: + thomas-petazzoni
messages: + msg398805
2021-06-10 15:48:23 whitslack set nosy: + whitslack
messages: + msg395544
2021-04-15 08:25:30 kulikjak set messages: + msg391120
2021-03-29 21:16:44 kartiksubbarao set nosy: + kartiksubbarao
messages: + msg389755
2021-03-15 18:52:27 xtreak set nosy: + bquinlan, pitrou
2021-03-15 08:26:34 kulikjak set keywords: + patch
stage: patch review
pull_requests: + pull_request23630
2021-03-15 08:23:36 kulikjak create