This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: Add at least minimal support for thread groups
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Christian.Tismer, Claudiu.Popa, aleax, iritkatriel, kristjan.jonsson, lemburg, neologix, pitrou, r.david.murray, rhettinger, tim.peters, tinchester, tshepang
Priority: normal Keywords: patch

Created on 2014-07-20 06:56 by rhettinger, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name | Uploaded | Description
threadgroup.diff | rhettinger, 2014-07-20 07:20 | Very rough proof-of-concept patch
parallel_download_application.py | rhettinger, 2014-07-20 21:16 | Simple example application to parallel downloads
Messages (9)
msg223498 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-07-20 06:56
Currently, threading.Thread requires that group=None pending implementation of a ThreadGroup class such as that described in http://www.javaworld.com/article/2074481

This has been an open todo for a very long time, possibly because implementing all of the features, including subgroups, may be too big a task.

I think we can implement a small but useful set of features without too much difficulty:

path_explorers = threading.ThreadGroup('path_explorers')
for path in paths:
    threading.Thread(path_explorers, explore, (path,))
for thread in path_explorers: # enumerate unfinished explorers
    print(thread)
path_explorers.start()        # begin parallel search
path_explorers.join()         # wait for group to finish
print("Result:", best_path)
msg223510 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-07-20 14:29
I have a hard time understanding what it would bring in Python's context. Even the Java API doesn't look very useful.
msg223530 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-07-20 19:11
The example looks like something you could use concurrent.futures for and get more features besides?
msg223540 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-07-20 21:45
> The example looks like something you could use concurrent.futures for

The example was minimal to show how it works.  The concept of having groups of related threads is perfectly general.  It is a tool for organizing and reasoning about code more complex than this.  

The main virtue is in the aggregate join operation (wait for all workers doing a given kind of task to finish).  I believe this is reasonably common (I've seen aggregate joins at more than one client). 

The goal is to simplify common patterns of parallel work in phases (each phase must be complete before the next phase starts).

Old way:

    phase1_workers = []
    for data in pool:
        t = threading.Thread(target=phase1, args=(data,))
        t.start()
        phase1_workers.append(t)
    for t in phase1_workers:
        t.join()

    phase2_workers = []
    for data in phase1_pool:
        t = threading.Thread(target=phase2, args=(data,))
        t.start()
        phase2_workers.append(t)
    for t in phase2_workers:
        t.join()

    phase3_workers = []
    for data in phase2_pool:
        t = threading.Thread(target=phase3, args=(data,))
        t.start()
        phase3_workers.append(t)
    for t in phase3_workers:
        t.join()

    print('Done')

New way with cleaner code:

    phase1 = SimpleThreadGroup('phase1')
    phase2 = SimpleThreadGroup('phase2')
    phase3 = SimpleThreadGroup('phase3')

    for data in pool:
        threading.Thread(phase1, phase1_task, args=(data,)).start()
    phase1.join()

    for data in phase1_pool:
        threading.Thread(phase2, phase2_task, args=(data,)).start()
    phase2.join()

    for data in phase2_pool:
        threading.Thread(phase3, phase3_task, args=(data,)).start()
    phase3.join()

    print('Done')

The new code is easier to write, to reason about, and to maintain because the thread group builds the aggregate collection and applies the aggregate join, which ensures that each phase is complete before the next one starts (e.g. all sprites moved, all user inputs processed, all monsters generated, all conflicts resolved, all points accumulated, all bonuses applied, ...).

As discussed in http://journals.ecs.soton.ac.uk/java/tutorial/java/threads/threadgroup.html , the feature would be more useful if we had the ability to suspend, resume, or stop collections of threads, but our threads have more limited controls (check the name, get the identifier, check whether the thread is alive, and, most usefully, wait for the thread with a join).

For people who write complex multi-threaded code (e.g. my clients), this would offer a nice simplification.  I don't see any reason to leave this feature as a stub in perpetuity.
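
As a rough illustration of the aggregate-join idea, here is a minimal sketch that runs on today's threading module. It does not use the reserved group argument of threading.Thread. The SimpleThreadGroup class shown here, its spawn() method, and the square task are hypothetical names made up for this sketch; they are not the API from threadgroup.diff.

```python
import threading


class SimpleThreadGroup:
    """Hypothetical helper: tracks related threads and joins them together."""

    def __init__(self, name):
        self.name = name
        self._threads = []

    def spawn(self, target, *args):
        # Create and start a thread, remembering it for the aggregate join.
        t = threading.Thread(target=target, args=args,
                             name=f"{self.name}-{len(self._threads)}")
        self._threads.append(t)
        t.start()
        return t

    def __iter__(self):
        # Enumerate the threads in this group that are still running.
        return iter([t for t in self._threads if t.is_alive()])

    def join(self):
        # Wait for every thread started through this group.
        for t in self._threads:
            t.join()


results = []
results_lock = threading.Lock()


def square(n):
    # Stand-in task for one phase of work.
    with results_lock:
        results.append(n * n)


phase1 = SimpleThreadGroup('phase1')
for n in range(5):
    phase1.spawn(square, n)
phase1.join()   # aggregate join: phase 1 is complete past this line
print(sorted(results))
```

Starting threads through the group, rather than passing the group to the Thread constructor, is a choice made only to keep the sketch runnable without patching the standard library.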
msg223545 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-07-20 22:24
> Even the Java API doesn't look very useful.

This isn't a Java-only concept.  It is widely available.  Here's a sample:

c++
"thread_group provides for a collection of threads that are related in some fashion."
http://www.boost.org/doc/libs/1_35_0/doc/html/thread/thread_management.html#thread.thread_management.threadgroup

perl
"Thread::Pool - group of threads for performing similar jobs"
http://search.cpan.org/dist/Thread-Pool/lib/Thread/Pool.pm

haskell
"This module extends Control.Concurrent.Thread with the ability to wait for a group of threads to terminate."
http://hackage.haskell.org/package/threads-0.5.0.2/docs/Control-Concurrent-Thread-Group.html

ruby
"ThreadGroup provides a means of keeping track of a number of threads as a group."
http://www.ruby-doc.org/core-2.1.2/ThreadGroup.html
msg223546 - (view) Author: Tin Tvrtković (tinchester) * Date: 2014-07-20 22:48
For your examples, my first instinct would be to use a thread pool executor. It's a nice high level API and can already do the aggregate join.
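
A minimal sketch of how the phased example might look with a thread pool executor, under the assumption that each phase feeds its results to the next. The task bodies and the input list are placeholders invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def phase1_task(n):
    # Stand-in for real phase-one work.
    return n + 1


def phase2_task(n):
    # Stand-in for real phase-two work.
    return n * 10


pool = [1, 2, 3]

with ThreadPoolExecutor(max_workers=3) as executor:
    # list() consumes the map iterator, so each line blocks until every
    # task of that phase has finished. This acts as the aggregate join.
    phase1_pool = list(executor.map(phase1_task, pool))
    phase2_pool = list(executor.map(phase2_task, phase1_pool))

print(phase2_pool)
```

executor.map() returns results in input order, so the output of each phase lines up with its input.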
msg223547 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-07-20 22:54
This seems indeed like a weaker version of ThreadPoolExecutor. Here's how your example looks with it, not very different and still easy to understand and grasp:


from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlretrieve

with ThreadPoolExecutor(max_workers=3) as executor:
    url = 'http://www.{site}.org/'
    for site in ('perl', 'python', 'jython', 'pypy'):
        future = executor.submit(urlretrieve, url.format(site=site), site)



3 lines without imports and the initialisation of the pool.
msg223562 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-07-21 06:54
> Tin Tvrtković added the comment:
>
> For your examples, my first instinct would be to use a thread pool executor. It's a nice high level API and can already do the aggregate join.

Indeed, the examples posted don't make much sense: thread/process
pools are the way to go.
msg415728 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-03-21 23:58
And now we have TaskGroups in asyncio as well.

I'm closing this as it seems to have been abandoned. Feel free to reopen if this is still needed.
History
Date | User | Action | Args
2022-04-11 14:58:06 | admin | set | github: 66212
2022-03-21 23:58:55 | iritkatriel | set | status: open -> closed; nosy: + iritkatriel; messages: + msg415728; stage: needs patch -> resolved
2014-07-21 15:46:42 | tshepang | set | nosy: + tshepang
2014-07-21 06:54:20 | neologix | set | messages: + msg223562
2014-07-20 22:54:28 | Claudiu.Popa | set | nosy: + Claudiu.Popa; messages: + msg223547
2014-07-20 22:48:29 | tinchester | set | nosy: + tinchester; messages: + msg223546
2014-07-20 22:24:27 | rhettinger | set | messages: - msg223544
2014-07-20 22:24:16 | rhettinger | set | messages: + msg223545
2014-07-20 22:22:37 | rhettinger | set | messages: + msg223544
2014-07-20 21:52:21 | rhettinger | set | nosy: + lemburg, aleax, kristjan.jonsson, Christian.Tismer
2014-07-20 21:45:40 | rhettinger | set | messages: + msg223540
2014-07-20 21:16:37 | rhettinger | set | files: + parallel_download_application.py
2014-07-20 21:16:16 | rhettinger | set | files: - parallel_download_application.py
2014-07-20 19:11:11 | r.david.murray | set | nosy: + r.david.murray; messages: + msg223530
2014-07-20 14:29:56 | pitrou | set | nosy: + neologix, pitrou; messages: + msg223510
2014-07-20 14:26:40 | pitrou | set | nosy: + tim.peters
2014-07-20 07:20:27 | rhettinger | set | files: + threadgroup.diff; keywords: + patch
2014-07-20 07:19:46 | rhettinger | set | files: + parallel_download_application.py
2014-07-20 06:56:34 | rhettinger | create |