Author kulikjak
Recipients asvetlov, kulikjak, yselivanov
Date 2021-03-15.08:23:34
Message-id <1615796616.26.0.51182574064.issue43498@roundup.psfhosted.org>
In-reply-to
Content
Recently, several of our Python 3.9 builds froze during `make install` with the following traceback in the logs:

Listing .../components/python/python39/build/prototype/sparc/usr/lib/python3.9/lib2to3/tests/data/fixers/myfixes...
Exception in thread Thread-1:
Traceback (most recent call last):
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/concurrent/futures/process.py", line 317, in run
    result_item, is_broken, cause = self.wait_result_broken_or_wakeup()
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/concurrent/futures/process.py", line 376, in wait_result_broken_or_wakeup
    worker_sentinels = [p.sentinel for p in self.processes.values()]
  File ".../components/python/python39/build/prototype/sparc/usr/lib/python3.9/concurrent/futures/process.py", line 376, in <listcomp>
    worker_sentinels = [p.sentinel for p in self.processes.values()]
RuntimeError: dictionary changed size during iteration

After this, the build freezes and never ends (most likely waiting for the broken thread).
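
For illustration, the error class in the traceback can be reproduced deterministically in a single thread. This is only a minimal sketch, not the actual race: in the real case the dict is presumably modified by another thread between iteration steps, while here the removal happens inside the loop itself. The function name `collect_sentinels`, the PIDs, and the sentinel strings are made up for the example.

```python
def collect_sentinels(processes):
    """Iterate over the live dict while an entry is removed mid-iteration."""
    sentinels = []
    for pid, sentinel in processes.items():
        sentinels.append(sentinel)
        # Stand-in for a worker being removed while the loop is still running.
        processes.pop(pid)
    return sentinels


# Hypothetical stand-in for the executor's pid -> process mapping.
processes = {101: "s1", 102: "s2", 103: "s3"}

try:
    collect_sentinels(processes)
    error = None
except RuntimeError as exc:
    error = str(exc)

print(error)  # dictionary changed size during iteration
```

The dict iterator notices the size change on the next step and raises, which matches the message seen in the build log.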

We see this only with Python 3.9 (3.7 doesn't seem to be affected, and we don't deliver other versions), and only during full builds of the entire Userland, so it might be related to heavy load on the build machine. That said, it has happened only three or four times, so this might be just a coincidence.

A simple fix seems to be the following (PR to follow shortly):

--- Python-3.9.1/Lib/concurrent/futures/process.py
+++ Python-3.9.1/Lib/concurrent/futures/process.py
@@ -373,7 +373,7 @@ class _ExecutorManagerThread(threading.T
         assert not self.thread_wakeup._closed
         wakeup_reader = self.thread_wakeup._reader
         readers = [result_reader, wakeup_reader]
-        worker_sentinels = [p.sentinel for p in self.processes.values()]
+        worker_sentinels = [p.sentinel for p in self.processes.copy().values()]
         ready = mp.connection.wait(readers + worker_sentinels)
 
         cause = None
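
The idea behind the patch, iterating over a snapshot rather than the live dict, can be sketched in isolation. This is a simplified single-threaded illustration under assumed names (`snapshot_values`, the PIDs, and the sentinel strings are hypothetical); it does not by itself prove that the copy is safe against every concurrent modification.

```python
def snapshot_values(mapping):
    """Return the values of a shallow copy taken at call time."""
    # Later changes to `mapping` do not affect the returned list.
    return list(mapping.copy().values())


# Hypothetical stand-in for the executor's pid -> process mapping.
processes = {101: "s1", 102: "s2", 103: "s3"}

seen = []
for sentinel in snapshot_values(processes):
    seen.append(sentinel)
    # Mutating the original during the loop no longer raises.
    processes.clear()

print(seen)       # ['s1', 's2', 's3']
print(processes)  # {}
```

The trade-off is that the snapshot may contain entries that have since been removed from the original dict, so the consumer has to tolerate stale items.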


This happens on Oracle Solaris, on both SPARC and Intel machines.
History
Date User Action Args
2021-03-15 08:23:36 kulikjak set recipients: + kulikjak, asvetlov, yselivanov
2021-03-15 08:23:36 kulikjak set messageid: <1615796616.26.0.51182574064.issue43498@roundup.psfhosted.org>
2021-03-15 08:23:36 kulikjak link issue43498 messages
2021-03-15 08:23:34 kulikjak create