Document multiprocessing.pool.ThreadPool #61342

Closed
ncoghlan opened this issue Feb 6, 2013 · 16 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes docs Documentation in the Doc dir easy type-feature A feature request or enhancement

Comments

@ncoghlan
Contributor

ncoghlan commented Feb 6, 2013

BPO 17140
Nosy @ncoghlan, @ned-deily, @applio, @pablogsal, @miss-islington, @godlygeek
PRs
  • bpo-17140: Document multiprocessing's ThreadPool #23812
  • [3.9] bpo-17140: Document multiprocessing's ThreadPool (GH-23812) #23834
  • [3.8] bpo-17140: Document multiprocessing's ThreadPool (GH-23812) #23835
  • [3.7] bpo-17140: Document multiprocessing's ThreadPool (GH-23812) #23836
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-12-18.18:42:54.028>
    created_at = <Date 2013-02-06.02:08:05.572>
    labels = ['easy', '3.7', '3.8', '3.9', '3.10', 'type-feature', 'docs']
    title = 'Document multiprocessing.pool.ThreadPool'
    updated_at = <Date 2020-12-19.01:00:59.562>
    user = 'https://github.com/ncoghlan'

    bugs.python.org fields:

    activity = <Date 2020-12-19.01:00:59.562>
    actor = 'Tilka'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2020-12-18.18:42:54.028>
    closer = 'ned.deily'
    components = ['Documentation']
    creation = <Date 2013-02-06.02:08:05.572>
    creator = 'ncoghlan'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 17140
    keywords = ['patch', 'easy']
    message_count = 16.0
    messages = ['181495', '189742', '189744', '189745', '189746', '189751', '189755', '189759', '189798', '189800', '225513', '383297', '383298', '383315', '383316', '383317']
    nosy_count = 10.0
    nosy_names = ['ncoghlan', 'ned.deily', 'brandon-rhodes', 'docs@python', 'sbt', 'srodriguez', 'davin', 'pablogsal', 'miss-islington', 'godlygeek']
    pr_nums = ['23812', '23834', '23835', '23836']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue17140'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10']

    @ncoghlan
    Contributor Author

    ncoghlan commented Feb 6, 2013

    The multiprocessing module currently provides the "multiprocessing.dummy.ThreadPool" API, which exposes the same interface as the public multiprocessing.Pool but is implemented with threads rather than processes. (This is sort of documented: its existence is implied by the documentation of multiprocessing.dummy, but it doesn't spell out "hey, stdlib ThreadPool implementation!".)

    Given that this feature is likely useful to many people for parallelising IO bound tasks without migrating to the concurrent.futures API (or where that API doesn't quite fit the use case), it makes sense to make it a more clearly documented feature under a less surprising name.
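
For readers unfamiliar with the feature, here is a minimal sketch of the IO-bound use case described above. It assumes the `multiprocessing.pool.ThreadPool` import path identified later in this thread; the `fetch` function and its `time.sleep` call are hypothetical stand-ins for real IO.

```python
import time
from multiprocessing.pool import ThreadPool


def fetch(item):
    # Hypothetical stand-in for an IO-bound call (network request, disk read, ...).
    time.sleep(0.05)
    return item * 2


# ThreadPool accepts the same constructor arguments as multiprocessing.Pool,
# but runs tasks in worker threads of the current process.
with ThreadPool(processes=4) as pool:
    results = pool.map(fetch, range(8))  # blocks until every task has finished

print(results)
```

Because `map()` blocks until all results are available, leaving the `with` block afterwards is safe even though it terminates the pool.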

    I haven't looked at the implementation, so I'm not sure how easy it will be to migrate it to a different module, but threading seems like a logical choice given the multiprocessing.ThreadPool vs threading.ThreadPool parallel.

    (Honestly, I'd be happier if we moved queue.Queue to threading as well. Having a threading-specific data type as a top-level module in its own right just makes it harder for people to find, for no real reason other than a historical accident.)

    Alternatively, we could add a "concurrent.pool" module which was little more than:

    from multiprocessing import Pool as ProcessPool
    from multiprocessing.dummy import ThreadPool

    @ncoghlan ncoghlan added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Feb 6, 2013
    @sbt
    Mannequin

    sbt mannequin commented May 21, 2013

    Given that the change could only be made to 3.4, and we already have concurrent.futures.ThreadPoolExecutor, I am not sure there is much point to such a change now.

    @ncoghlan
    Contributor Author

    Thread Pools can be handy when you want to do explicit message passing, rather than the call-and-response model favoured by the futures module.

    @sbt
    Mannequin

    sbt mannequin commented May 21, 2013

    I don't understand what you mean by "explicit message passing" and
    "call-and-response model".

    @ncoghlan
    Contributor Author

    Futures are explicitly about kicking off a concurrent call and waiting for a reply. They're great for master/slave and client/server models, but not particularly good for actors and other forms of peer-to-peer message passing.

    For the latter, explicit pools and message queues are still the way to go, and that's why I think a concurrent.pool module may still be useful as a more obvious entry point for the thread pool implementation.

    @sbt
    Mannequin

    sbt mannequin commented May 21, 2013

    As far as I can see they are mostly equivalent. For instance, ApplyResult (the type returned by Pool.apply_async()) is virtually the same as a Future.

    When you say "explicit message passing", do you mean creating a queue and making the worker tasks put results on that queue? Why can't you do the same with ThreadPoolExecutor?
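
To make the comparison concrete, here is a small side-by-side sketch. The `square` function is a hypothetical task; the result object returned by `apply_async()` is the one the comment calls ApplyResult (documented as `multiprocessing.pool.AsyncResult`).

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool


def square(x):
    # Hypothetical task used only for illustration.
    return x * x


# Pool API: apply_async() returns a result handle; get() blocks for the value.
with ThreadPool(2) as pool:
    async_result = pool.apply_async(square, (7,))
    pool_value = async_result.get(timeout=5)

# Executor API: submit() returns a Future; result() blocks for the value.
with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(square, 7)
    executor_value = future.result(timeout=5)
```

In both cases the caller gets a handle immediately and blocks only when it asks for the value.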

    @ncoghlan
    Contributor Author

    No, I mean implementing communicating sequential processes with independent state machines passing messages to each other. There aren't necessarily any fixed request/reply pairs. Each actor has a "mailbox", which is a queue that you dump its messages into. If you want a reply, you'll include some kind of addressing info to get the answer back rather than receiving it back on the channel you used to send the message.

    For threads, the addressing info can just be a queue.Queue reference for your own mailbox; for multiple processes, it can be either a multiprocessing queue or any other form of IPC.

    It's a very different architecture from that assumed by futures, so you need to drop down to the pool layer rather than using the executor model.
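
A minimal sketch of the mailbox idea for the threaded case, assuming a single actor hosted on a `ThreadPool` worker. The `doubler` actor, the `STOP` sentinel, and the `(value, reply_to)` message shape are all hypothetical choices made for illustration, not part of any library API.

```python
import queue
from multiprocessing.pool import ThreadPool

STOP = object()  # hypothetical sentinel telling the actor to shut down


def doubler(mailbox):
    # Actor loop: take messages from its own mailbox until told to stop.
    while True:
        message = mailbox.get()
        if message is STOP:
            break
        value, reply_to = message  # reply_to is the sender's mailbox
        reply_to.put(value * 2)


doubler_mailbox = queue.Queue()
my_mailbox = queue.Queue()

with ThreadPool(1) as pool:
    # The pool only supplies the worker thread; all communication
    # goes through the queues, not through the pool's result objects.
    done = pool.apply_async(doubler, (doubler_mailbox,))
    for n in (1, 2, 3):
        doubler_mailbox.put((n, my_mailbox))
    replies = [my_mailbox.get(timeout=5) for _ in range(3)]
    doubler_mailbox.put(STOP)
    done.get(timeout=5)  # wait for the actor to exit cleanly
```

The reply address travels inside each message, so the sender does not need to wait on the channel it used to send.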

    @sbt
    Mannequin

    sbt mannequin commented May 21, 2013

    > It's a very different architecture from that assumed by futures,
    > so you need to drop down to the pool layer rather than using the
    > executor model.

    AIUI a ThreadPoolExecutor object (which must be explicitly created)
    represents a thread/process pool, and it is used to send tasks to the
    workers in the pool. And if you want to ignore the future object
    returned by submit(), then you can. How is that any different from a
    ThreadPool object?
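
A short sketch of the approach described here: tasks are submitted to a `ThreadPoolExecutor`, the returned futures are ignored, and results travel over an explicit queue instead. The `work` function and the `results` queue are hypothetical names used for illustration.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

results = queue.Queue()


def work(n, out):
    # Report the result on an explicit queue rather than via the Future.
    out.put(n * n)


with ThreadPoolExecutor(max_workers=3) as executor:
    for n in range(5):
        executor.submit(work, n, results)  # returned Future deliberately ignored

# Leaving the with block waits for all submitted tasks to finish,
# so every result is already on the queue at this point.
collected = sorted(results.get_nowait() for _ in range(5))
```

One caveat with this pattern: an exception raised inside a task is stored on its Future, so ignoring the Future also hides the error.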

    And if you are implementing actors on top of a thread pool then isn't
    there a limit on the number of "active" actors there can be at any one
    time, potentially creating deadlocks because all workers are waiting for
    messages from an actor which cannot run yet? (I am probably
    misunderstanding what you mean.)

    To me, the obvious way to implement actors would be to create one
    thread/process for each actor. In Python 3.4 one could use the tulip
    equivalents instead for better scalability.

    @ncoghlan
    Contributor Author

    Actors are just as vulnerable to the "new threads/processes are expensive" issue as anything else, and by using a dynamic pool appropriately you can amortise those costs across multiple instances.

    The point is to expose a less opinionated threading model in a more readily accessible way. Executors and futures are *very* opinionated about the communication channels you're expected to use (the ones the executor provides), while pools are just a resource management tool.

    @sbt
    Mannequin

    sbt mannequin commented May 22, 2013

    I understand that a thread pool (in the general sense) might be used to amortise the cost. But I think you would probably have to write this from scratch rather than use the ThreadPool API.

    The ThreadPool API does not really expose anything that the ThreadPoolExecutor API does not -- the differences are just a matter of taste.

    @ncoghlan
    Contributor Author

    After a question from Brandon Rhodes, I noticed that ThreadPool is actually listed in multiprocessing.pool.__all__.

    So rather than doing anything more dramatic, we should just document the existing multiprocessing feature.
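
The observation above can be checked directly. This is a small sketch; the exact contents of `__all__` may vary between Python versions, so only the membership test is shown.

```python
import multiprocessing.pool

exported = multiprocessing.pool.__all__
has_thread_pool = "ThreadPool" in exported
print(has_thread_pool)

# Being listed in __all__ means the name is part of the module's public API
# and can be imported directly:
from multiprocessing.pool import ThreadPool
```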

    As Richard says, the concurrent.futures Executor already provides a general purpose thread and process pooling model, and when that isn't appropriate, something like asyncio or gevent may actually be a better fit anyway.

    @ncoghlan ncoghlan added the easy label Aug 19, 2014
    @ncoghlan ncoghlan changed the title Provide a more obvious public ThreadPool API Document multiprocessing.pool.ThreadPool Aug 19, 2014
    @pitrou pitrou added docs Documentation in the Doc dir 3.7 (EOL) end of life and removed stdlib Python modules in the Lib dir labels Jul 22, 2017
    @pablogsal
    Member

    New changeset 84ebcf2 by Matt Wozniski in branch 'master':
    bpo-17140: Document multiprocessing's ThreadPool (GH-23812)
    84ebcf2

    @miss-islington
    Contributor

    New changeset 1461992 by Miss Islington (bot) in branch '3.9':
    bpo-17140: Document multiprocessing's ThreadPool (GH-23812)
    1461992

    @miss-islington
    Contributor

    New changeset d21d29a by Miss Islington (bot) in branch '3.8':
    [3.8] bpo-17140: Document multiprocessing's ThreadPool (GH-23812) (GH-23835)
    d21d29a

    @ned-deily
    Member

    New changeset 00278d4 by Miss Islington (bot) in branch '3.7':
    bpo-17140: Document multiprocessing's ThreadPool (GH-23812) (GH-23836)
    00278d4

    @ned-deily
    Member

    Thanks, Matt, for the documentation PR.

    @ned-deily ned-deily added 3.8 only security fixes 3.9 only security fixes 3.10 only security fixes labels Dec 18, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022