classification
Title: "dictionary changed size during iteration" error in _ExecutorManagerThread
Type: crash
Stage: patch review
Components: asyncio, Installation
Versions: Python 3.10, Python 3.9
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: asvetlov, bquinlan, kartiksubbarao, kulikjak, pitrou, whitslack, yselivanov
Priority: normal
Keywords: patch

Created on 2021-03-15 08:23 by kulikjak, last changed 2021-06-10 15:48 by whitslack.

Pull Requests
URL Status Linked
PR 24868 open kulikjak, 2021-03-15 08:26
Messages (4)
msg388712 - Author: Jakub Kulik (kulikjak) * Date: 2021-03-15 08:23
Recently, several of our Python 3.9 builds froze during `make install` with the following trace in the logs:

Listing .../components/python/python39/build/prototype/sparc/usr/lib/python3.9/lib2to3/tests/data/fixers/myfixes...
Exception in thread Thread-1:
Traceback (most recent call last):
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/concurrent/futures/process.py", line 317, in run
    result_item, is_broken, cause = self.wait_result_broken_or_wakeup()
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/concurrent/futures/process.py", line 376, in wait_result_broken_or_wakeup
    worker_sentinels = [p.sentinel for p in self.processes.values()]
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/concurrent/futures/process.py", line 376, in <listcomp>
    worker_sentinels = [p.sentinel for p in self.processes.values()]
RuntimeError: dictionary changed size during iteration

After this, the build freezes and never ends (most likely waiting for the broken thread).

We see this only in Python 3.9 (3.7 doesn't seem to be affected, and we don't deliver other versions) and only when doing full builds of the entire Userland, which suggests it may be related to heavy load on the build machine. That said, it has happened only three or four times, so this might be just a coincidence.

A simple fix seems to be this (PR to follow shortly):

--- Python-3.9.1/Lib/concurrent/futures/process.py
+++ Python-3.9.1/Lib/concurrent/futures/process.py
@@ -373,7 +373,7 @@ class _ExecutorManagerThread(threading.T
         assert not self.thread_wakeup._closed
         wakeup_reader = self.thread_wakeup._reader
         readers = [result_reader, wakeup_reader]
-        worker_sentinels = [p.sentinel for p in self.processes.values()]
+        worker_sentinels = [p.sentinel for p in self.processes.copy().values()]
         ready = mp.connection.wait(readers + worker_sentinels)
 
         cause = None


This is on Oracle Solaris and on both SPARC and Intel machines.
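
The error in the traceback, and the effect of iterating over a copy as in the patch above, can be shown in isolation. This is a minimal single-threaded sketch, not the executor code: the `processes` dict, its string values, and both helper functions are hypothetical stand-ins, and the insertion that another thread would perform is simulated inside the loop.

```python
# Hypothetical stand-in for the executor's pid -> process mapping.
processes = {1: "sentinel-1", 2: "sentinel-2"}


def collect_live(mapping):
    """Iterate the live dict while it grows, as in the traceback."""
    collected = []
    for index, sentinel in enumerate(mapping.values()):
        collected.append(sentinel)
        if index == 0:
            # Simulates a worker being registered mid-iteration.
            mapping[99] = "sentinel-99"
    return collected


def collect_snapshot(mapping):
    """Iterate a shallow copy, so later insertions cannot affect the loop."""
    collected = []
    for index, sentinel in enumerate(mapping.copy().values()):
        collected.append(sentinel)
        if index == 0:
            mapping[99] = "sentinel-99"
    return collected


try:
    collect_live(dict(processes))
    live_error = None
except RuntimeError as exc:
    live_error = str(exc)

snapshot_result = collect_snapshot(dict(processes))

print(live_error)
print(snapshot_result)
```

The first call fails with the same RuntimeError as in the logs; the second returns only the two entries that existed when the copy was taken.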
msg389755 - Author: Kartik Subbarao (kartiksubbarao) Date: 2021-03-29 21:16
I'm seeing the same error with Python 3.9.2 on Fedora 33, with a script that uses ProcessPoolExecutor.
msg391120 - Author: Jakub Kulik (kulikjak) * Date: 2021-04-15 08:25
I investigated a little more and found out that this happens when `ProcessPoolExecutor._adjust_process_count()` adds a new process while the manager thread is iterating over `self.processes`.

With the following change, I can reproduce this reliably every time:

--- Python-3.9.1/Lib/concurrent/futures/process.py
+++ Python-3.9.1/Lib/concurrent/futures/process.py
@@ -373,7 +373,10 @@ class _ExecutorManagerThread(threading.T
         assert not self.thread_wakeup._closed
         wakeup_reader = self.thread_wakeup._reader
         readers = [result_reader, wakeup_reader]
-        worker_sentinels = [p.sentinel for p in self.processes.values()]
+        worker_sentinels = []
+        for p in self.processes.values():
+            time.sleep(1)
+            worker_sentinels.append(p.sentinel)
         ready = mp.connection.wait(readers + worker_sentinels)
 
         cause = None

Since `wait_result_broken_or_wakeup()` is called periodically, and there is no issue if processes added during the iteration are omitted (if they were added just after that, they would be omitted anyway), the attached PR shouldn't break anything.
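
The same race can be sketched outside the executor without relying on `time.sleep()`. This is a hedged illustration only: `processes`, `manager()`, `run()`, and the two `threading.Event` objects are hypothetical names, and the handshake forces the main thread to add an entry while the other thread is mid-iteration, which is what the sleep in the diff above makes likely.

```python
import threading

# Hypothetical stand-in for self.processes in _ExecutorManagerThread.
processes = {}
iterating = threading.Event()  # set once the manager thread is mid-iteration
added = threading.Event()      # set once the main thread has added an entry


def manager(snapshot, result):
    """Collect sentinels, pausing after each item until the dict has grown."""
    try:
        source = processes.copy() if snapshot else processes
        sentinels = []
        for sentinel in source.values():
            sentinels.append(sentinel)
            iterating.set()
            added.wait(timeout=5)
        result["sentinels"] = sentinels
    except RuntimeError as exc:
        result["error"] = str(exc)


def run(snapshot):
    """Run one collection pass while the main thread adds a third entry."""
    processes.clear()
    processes.update({1: "sentinel-1", 2: "sentinel-2"})
    iterating.clear()
    added.clear()
    result = {}
    thread = threading.Thread(target=manager, args=(snapshot, result))
    thread.start()
    iterating.wait(timeout=5)      # manager is now inside its loop
    processes[3] = "sentinel-3"    # plays the role of _adjust_process_count()
    added.set()
    thread.join()
    return result


print(run(snapshot=False))
print(run(snapshot=True))
print(list(processes.values()))
```

Without the copy, the pass ends with the RuntimeError. With the copy, the pass completes with the two original sentinels, and the entry added meanwhile is still in `processes`, so a later pass would include it.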
msg395544 - Author: Matt Whitlock (whitslack) Date: 2021-06-10 15:48
Observed this same failure mode on a Raspberry Pi 1 while running 'make install' on Python 3.9.5 with 9 concurrent workers.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/var/tmp/portage/dev-lang/python-3.9.5_p2/image/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/var/tmp/portage/dev-lang/python-3.9.5_p2/image/usr/lib/python3.9/concurrent/futures/process.py", line 317, in run
    result_item, is_broken, cause = self.wait_result_broken_or_wakeup()
  File "/var/tmp/portage/dev-lang/python-3.9.5_p2/image/usr/lib/python3.9/concurrent/futures/process.py", line 376, in wait_result_broken_or_wakeup
    worker_sentinels = [p.sentinel for p in self.processes.values()]
  File "/var/tmp/portage/dev-lang/python-3.9.5_p2/image/usr/lib/python3.9/concurrent/futures/process.py", line 376, in <listcomp>
    worker_sentinels = [p.sentinel for p in self.processes.values()]
RuntimeError: dictionary changed size during iteration
History
Date User Action Args
2021-06-10 15:48:23 whitslack set nosy: + whitslack; messages: + msg395544
2021-04-15 08:25:30 kulikjak set messages: + msg391120
2021-03-29 21:16:44 kartiksubbarao set nosy: + kartiksubbarao; messages: + msg389755
2021-03-15 18:52:27 xtreak set nosy: + bquinlan, pitrou
2021-03-15 08:26:34 kulikjak set keywords: + patch; stage: patch review; pull_requests: + pull_request23630
2021-03-15 08:23:36 kulikjak create