Title: Compileall script: add option to use multiple cores
msg171744 - (view) Author: Daniel Holth (dholth) * Date: 2012-10-01 20:46
compileall would benefit approximately linearly from additional CPU cores.  There should be an option.

The noisy output would have to change. Right now it prints "compiling" and then "done" synchronously with doing the actual work.
msg171758 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-10-01 23:16
This should probably use concurrent.futures instead of multiprocessing directly, but yes it would be useful.

Then again, the whole module should probably be rewritten to use importlib as well.
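
For illustration, a minimal sketch of how the concurrent.futures suggestion could look, including one way to handle the "noisy output" concern by reporting from the parent process only. This is not the patch attached to this issue: `compile_one`, `compile_files`, and the `executor_cls` parameter are hypothetical names introduced here, and only `py_compile.compile` and `ProcessPoolExecutor` are real standard-library APIs.

```python
import py_compile
from concurrent.futures import ProcessPoolExecutor


def compile_one(path):
    """Compile one source file; return (path, error message or None)."""
    try:
        py_compile.compile(path, doraise=True)
    except (py_compile.PyCompileError, OSError) as exc:
        return path, str(exc)
    return path, None


def compile_files(paths, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Compile paths in a pool and return the list of paths that failed.

    executor_cls is an assumption of this sketch; it lets callers swap in
    another concurrent.futures executor.
    """
    failures = []
    with executor_cls(max_workers=max_workers) as executor:
        # map() yields results in input order, so the parent prints one
        # line per file instead of workers interleaving their output.
        for path, error in executor.map(compile_one, paths):
            if error is None:
                print(f"Compiled {path}")
            else:
                print(f"Failed {path}: {error}")
                failures.append(path)
    return failures
```

When this is run as a script with the default process pool, the call to `compile_files` should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.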
msg205805 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2013-12-10 12:30

Here's a draft patch. It adds a new *processes* parameter to *compile_dir* and a new command line parameter as well.
msg213200 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-03-12 05:50
Patch looks good.  Some comments on Rietveld.
msg213209 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-03-12 07:26
Thank you for the review, Éric! Here's the updated patch.
msg213298 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-03-12 21:05
FTR, py_compile and compileall use importlib in 3.4.
msg213301 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-03-12 21:50
This looks ready to me.

One thing: “make -j0” is the spelling for “run using all available cores”, whereas “compileall -j0” will use one process.  I don’t know if this should be documented, changed or ignored.
msg213303 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-03-12 21:52
I vote for changing it so that -j0 uses all available cores, as reported by os.cpu_count().
msg213304 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-03-12 21:53
I agree. I'll modify the patch.
msg213307 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2014-03-12 22:11
+        if args.processes <= 0:
Is that correct?  For make, I think I’ve always seen “-j0”, not negative values.

Could you add a test for -j0? (i.e. check that “compileall -j0” calls the function with “processes=os.cpu_count()”)
msg213308 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-03-12 22:12
regrtest does that, checking for j <= 0.
msg213317 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-03-12 22:42
Here's a test for j0 == os.cpu_count.
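
As a rough sketch of the behavior discussed at this point in the thread (a -j value of 0 or less meaning "use all cores"), the mapping and its check could look like the following. `parse_processes` is a hypothetical helper written for this example, not the code from the patch, and the fallback to 1 when `os.cpu_count()` returns None is an added assumption.

```python
import argparse
import os


def parse_processes(argv):
    """Parse a -j option and map values <= 0 to the number of CPUs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-j", "--processes", type=int, default=1)
    args = parser.parse_args(argv)
    if args.processes <= 0:
        # os.cpu_count() may return None; fall back to one process then.
        args.processes = os.cpu_count() or 1
    return args.processes
```

A test along the lines Éric asked for would then assert that `parse_processes(["-j0"])` equals `os.cpu_count()` on a machine where the count is known.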
msg213340 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-13 00:28
Importing ProcessPoolExecutor at the top-level means compileall will crash on systems which don't have multiprocessing support.
msg213417 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-03-13 16:53
Here's a new patch which addresses Éric's last comments.
Antoine, I don't have at my disposal a system without multiprocessing support. How does it crash?
msg213419 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-13 16:59
> Here's a new patch which addresses Éric's last comments.
> Antoine, I don't have at my disposal a system without multiprocessing support. How does it crash?

Neither do I, but you will probably get an ImportError of some sort.
msg213422 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-03-13 17:48
Here's a new version which catches ImportError for concurrent.futures and raises ValueError in `compile_dir` if `processes` was specified and concurrent.futures is unavailable. The only issue is that I don't know if this should be a ValueError or not. For instance, zipfile uses RuntimeError if `lzma` is unavailable.
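
A minimal sketch of the guarded import described above, assuming the check happens before any work is scheduled. `check_parallel_request` and its `executor` parameter are hypothetical names used only for this example, and ValueError is used simply because it is the exception this message proposes.

```python
try:
    from concurrent.futures import ProcessPoolExecutor
except ImportError:
    # Platforms without multiprocessing support end up here.
    ProcessPoolExecutor = None


def check_parallel_request(processes, executor=ProcessPoolExecutor):
    """Raise ValueError if processes was requested but no executor exists."""
    if processes is not None and executor is None:
        raise ValueError("multiprocessing support is not available")
```

Passing `executor=None` explicitly makes it possible to exercise the failure path on a machine that does have multiprocessing support.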
msg214450 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-03-22 08:31
What can I do to move this forward? I believe all concerns have been addressed and it seems ready to me.
msg217118 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-04-24 06:20
Added a new version of the patch which incorporates suggestions made by Jim. Thanks for the review!
msg217173 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-04-25 21:43
ProcessPoolExecutor already defaults to using cpu_count if max_workers is None.  Consistency with that might be useful too.  (and a default of 1 to mean nothing in parallel is sensible...)
msg217261 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-04-27 13:00
Added a new patch with improvements suggested by Jim. Thanks!

I removed the handling of processes=1, because a single worker can still be useful: a background worker processes the files received from _walk_dir. compile_dir now also checks that it receives a positive *processes* value and raises a ValueError otherwise. As a side note, I just found that ProcessPoolExecutor / ThreadPoolExecutor don't validate the number of workers they are given, which leads to certain types of errors (see issue21362 for more details).
Jim, the default for processes is still None, meaning "do not use multiple processes". ProcessPoolExecutor exists to run work in processes, so it is natural for it to treat None as os.cpu_count(). Here we want the user to be explicit about wanting multiple processes or not.
msg217264 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-04-27 14:06
Added a new patch with the fixes proposed by Berker Peksag. Thanks for the review. Hopefully, this is the last iteration of this patch.
msg217399 - (view) Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2014-04-28 19:20
Trying to put bounds on the disagreements.  Does anyone disagree with any of the following:

(1)  compileall currently runs single-threaded in a single process.

(2)  This enhancement intends to allow parallelization by process.

(3)  Users MAY need to express whether they (require/forbid/are expressly apathetic concerning) parallelization.

(3A)  There is some doubt that this even needs to be user-controlled.

(3B)  If it is user-controlled, the patch proposes adding a "processes" parameter to do this.

(3C)  There have been suggestions of other names (notably "workers"), but *if* it is user-controlled, the idea of a new parameter is not controversial.

(4)  Users MAY need to control the degree of parallelization.

(4A)  If so, setting the value of the new parameter to a positive integer > 1 is an acceptable solution.

(4B)  There is not yet consensus on how to represent "Use multi-processing, with the default degree for this system.", "Do NOT use multiprocessing.", or "I don't care."

(4C)  Suggested values have included 1, 0, -1, any negative number, None, and specific strings.  The precise mapping between some of these and the three cases of 4B is not agreed.

(5)  If multiprocessing is explicitly requested, what should happen when it is not available?

(5A)  Fall back to the current way, without multi-processing.

(5B)  Fall back to the current way, without multi-processing, but issue a Warning.

(5C)  Raise an Exception.  (ValueError, ImportError, NotImplementedError?)

(6)  Portions of the documentation unrelated to this should be fixed.  But ideally, that would be done separately, and it will NOT be a pre-requisite to this patch.


Another potential value set

None (the default) ==> let the system parallelize as best it can -- as it does in multiprocessing.  If the system picks "not in parallel at all", that is also OK, and no warning is raised.

0 ==> Do not parallelize.

positive integers ==> Use that many processes.

negative ==> ValueError

Would these uses of 0 and negative be too surprising for someone?
msg217586 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-04-30 09:16
Updated patch according to the python-dev thread:

- processes renamed to workers
- `workers` defaults to 1
- When `workers` is equal to 0, then `os.cpu_count` will be used
- When `workers` > 1, multiple processes will be used
- When `workers` == 1, run normally (no multiple processes)
- Negative values raise a ValueError
- Will raise NotImplementedError if multiprocessing can't be used
(when `workers` equals 0 or is > 1)
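
The rules listed above can be sketched as a small function. This is only an illustration of the described semantics, not the committed code: `resolve_workers` and its `pool_available` flag are hypothetical, and the fallback to 1 when `os.cpu_count()` returns None is an added assumption.

```python
import os


def resolve_workers(workers=1, pool_available=True):
    """Return the number of processes to use; 1 means compile serially."""
    if workers < 0:
        raise ValueError("workers must be greater than or equal to 0")
    if workers == 1:
        # Run normally; multiprocessing is never touched in this case.
        return 1
    if not pool_available:
        raise NotImplementedError("multiprocessing is not available")
    # 0 means "use all cores"; any other value is taken as given.
    return workers or os.cpu_count() or 1
```

Note that `workers=1` succeeds even when no process pool is available, which keeps the default behavior working on platforms without multiprocessing support.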
msg226684 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-09-10 06:58
If there is nothing left to do for this patch, can it be committed?
msg226822 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-09-12 14:40
New changeset 9efefcab817e by Brett Cannon in branch 'default':
Issue #16104: Allow compileall to do parallel bytecode compilation.
msg226823 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-09-12 14:40
Thanks for the patch, Claudiu!
msg226824 - (view) Author: PCManticore (Claudiu.Popa) * (Python triager) Date: 2014-09-12 14:41
Thank you for committing it. :-)
msg362786 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2020-02-27 08:02
This caused a regression in behavior.  compileall.compile_dir()'s ddir= parameter no longer does the right thing for any subdirectories.
