classification
Title: doc 17.2.1: basic Pool example is too basic
Type: enhancement Stage: resolved
Components: Documentation Versions: Python 3.7
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Winterflower, berker.peksag, davin, docs@python, jwuttke, methane
Priority: normal Keywords:

Created on 2017-02-15 22:59 by jwuttke, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
smime.p7s jwuttke, 2017-02-21 09:11
Messages (7)
msg287895 - (view) Author: Joachim (jwuttke) * Date: 2017-02-15 22:59
The »basic example of data parallelism using Pool« is too basic. It demonstrates the syntax but otherwise makes no sense, and is therefore potentially confusing. It is blatant nonsense to run 5 processes when there are only 3 data items to process. Let me suggest the following:


from multiprocessing import Pool
import time

def f(x):
    time.sleep(1)
    return x*x

if __name__ == '__main__':
    start_time = time.time()
    with Pool(4) as p:
        print(p.map(f, range(20)))
    print("elapsed wall time: ", time.time()-start_time)

The sleep call makes f representative of a function that takes significant time to execute. Printing the elapsed time shows the user that the 20 calls of f have indeed taken place in parallel.
msg288045 - (view) Author: Camilla Montonen (Winterflower) Date: 2017-02-17 23:18
Would you like to open a PR with a patch on GH? I think the docs could certainly do with another example for Pool.
msg288253 - (view) Author: Davin Potts (davin) * (Python committer) Date: 2017-02-21 01:42
When passing judgement on what is "too basic", the initial example should be so basic as to be immediately digestible by as many people as possible.

Some background:
All too many examples mislead newcomers into believing that the number of processes should (a) match the number of processor cores, or (b) match the number of inputs to be processed.  This example currently attempts to dispel both notions.  In practice, and this depends upon what specific code is to be performed in parallel, it is not uncommon to find that slightly over-scheduling the number of processes versus the number of available cores can achieve superior throughput and performance.  In other cases, slightly under-scheduling may provide a win.  To help subtly encourage the newcomer, this example uses 5 processes as opposed to something which might be mistaken for a common number of cores available on current multi-core processors.  Likewise, the number of distinct inputs to be processed deliberately does not match the number of processes nor a multiple of the number of processes.  This hopefully encourages the newcomer to not feel obligated to only accept inputs of a particular size or multiple.  Granted, optimizing for performance motivates tuning such things but this is the first example / first glance at what functionality is available.
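
For reference, the example under discussion can be sketched roughly as follows. This assumes it has the 5-process, 3-input shape described above; the comments are added here for illustration and are not part of the documentation.

```python
from multiprocessing import Pool

def f(x):
    # Trivial stand-in for real per-item work.
    return x * x

if __name__ == '__main__':
    # 5 worker processes and 3 inputs: the process count deliberately
    # matches neither the number of inputs nor a typical core count.
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))  # prints [1, 4, 9]
```

Pool.map returns results in input order regardless of which worker handled which item, so the output does not depend on the number of processes chosen.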

Considering the suggested change:
* range(20) will likely produce more output than can be comfortably accommodated and easily read in the available browser window where most will see this
* the addition of execution time measurement is an interesting choice here given how computationally trivial the f(x) function is, which is perhaps what motivated the introduction of a time.sleep(1) inside that function; a ThreadPool would be more appropriate for a sleepy function such as this
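
A minimal sketch of the ThreadPool alternative mentioned in the second point, assuming the work is dominated by waiting rather than computing. The names sleepy_square and timed_map are hypothetical helpers introduced only for this illustration, and the 0.1 s sleep is an arbitrary choice.

```python
import time
from multiprocessing.pool import ThreadPool

def sleepy_square(x):
    # Simulates a task that mostly waits (I/O, network) instead of computing.
    time.sleep(0.1)
    return x * x

def timed_map(n_threads, inputs):
    # Hypothetical helper: map over the inputs with a thread pool and
    # return the results together with the elapsed wall time in seconds.
    start = time.perf_counter()
    with ThreadPool(n_threads) as pool:
        results = pool.map(sleepy_square, inputs)
        elapsed = time.perf_counter() - start
    return results, elapsed

if __name__ == '__main__':
    results, elapsed = timed_map(4, range(8))
    print(results)
    print(f"elapsed wall time: {elapsed:.2f} s")
```

Because a sleeping thread releases the GIL, threads overlap the waits without the cost of starting extra processes or pickling arguments between them.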

Ultimately these changes complicate the example while potentially undermining its value.  An interesting improvement to this example might be to introduce a computationally taxing function which more clearly demonstrates the benefit of using a process Pool but still achieving the ideal of being immediately digestible and understood by the largest reading audience.  Some of the topics/variations in the proposed change might be better introduced and addressed later in the documentation rather than unnecessarily complicating the first example.
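
One possible shape for such a computationally taxing example is sketched below. The function count_primes and the input limits are hypothetical choices made for this illustration; nothing like them was proposed in the thread.

```python
from multiprocessing import Pool

def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately slow)."""
    count = 0
    for n in range(2, limit):
        # n is prime if no d in 2..sqrt(n) divides it evenly.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == '__main__':
    # Each input is an independent, CPU-bound job, so separate processes
    # can work on them at the same time.
    limits = [50_000, 60_000, 70_000]
    with Pool(5) as p:
        print(p.map(count_primes, limits))
```

Unlike a sleep call, this keeps each worker busy computing, which is the case where a process Pool is expected to help more than a ThreadPool.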
msg288274 - (view) Author: Joachim (jwuttke) * Date: 2017-02-21 07:18
Dear Davin,

since I am new to the Python bug tracker, I have to ask
a stupid question: are you the benevolent dictator of the
Python Standard Library?

Otherwise, how can it be that one and the same person in
one and the same intervention adds new arguments to the
debate, concludes that these arguments are winning,
declares that the current state »works for me«, and closes
the issue, thereby cutting short any further debate?

I have several other issues with the current Python
documentation, but if defenders of the status quo are
acting that boldly, I would only waste my time raising
them.

With best regards, Joachim
msg288276 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2017-02-21 08:14
I like the current, minimal example for describing the API.

There is no need to make it complex just to check that it really executes in parallel.

Adding more and more "may be useful for someone" code to the docs makes the documentation long, hard, and tedious to read for everyone.
msg288278 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2017-02-21 09:04
I agree with Davin and Inada. Note that the multiprocessing documentation is already too long, and we'd like to be careful before adding another example.

If you could create a "multiprocessing cookbook" on wiki.python.org I'm pretty sure we'd be happy to consider adding a link to it in the documentation.
msg288279 - (view) Author: Joachim (jwuttke) * Date: 2017-02-21 09:11
I never proposed to add a second example, but to make the one example more meaningful.

As a minimal solution, could we replace the numbers 3 (input data) and 5 (processes)
with a slightly more plausible choice? Davin explained why the numbers should be
incommensurate. So what about 10 data items and 3 processes?
History
Date | User | Action | Args
2022-04-11 14:58:43 | admin | set | github: 73761
2017-02-21 09:11:39 | jwuttke | set | files: + smime.p7s; messages: + msg288279
2017-02-21 09:04:53 | berker.peksag | set | nosy: + berker.peksag; messages: + msg288278; title: your closing of issue29575 -> doc 17.2.1: basic Pool example is too basic
2017-02-21 08:14:14 | methane | set | nosy: + methane; messages: + msg288276
2017-02-21 07:18:36 | jwuttke | set | messages: + msg288274; title: doc 17.2.1: basic Pool example is too basic -> your closing of issue29575
2017-02-21 01:42:28 | davin | set | status: open -> closed; type: enhancement; messages: + msg288253; resolution: works for me; stage: resolved
2017-02-18 08:26:42 | rhettinger | set | nosy: + davin
2017-02-17 23:18:47 | Winterflower | set | nosy: + Winterflower; messages: + msg288045
2017-02-15 22:59:20 | jwuttke | create |