classification
Title: Compileall script: add option to use multiple cores
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: brett.cannon Nosy List: Claudiu.Popa, Jim.Jewett, brett.cannon, dholth, merwok, python-dev
Priority: low Keywords: patch

Created on 2012-10-01 20:46 by dholth, last changed 2014-09-12 14:41 by Claudiu.Popa. This issue is now closed.

Files
File name Uploaded Description Edit
compileall_v1.patch Claudiu.Popa, 2013-12-10 12:30 review
issue16104.patch Claudiu.Popa, 2014-03-10 19:39 Remove trailing whitespaces review
issue16104_1.patch Claudiu.Popa, 2014-03-12 07:26 review
issue16104_2.patch Claudiu.Popa, 2014-03-12 21:26 Minor doc fixes. review
issue16104_3.patch Claudiu.Popa, 2014-03-12 22:08 review
issue16104_4.patch Claudiu.Popa, 2014-03-12 22:42 review
issue16104_5.patch Claudiu.Popa, 2014-03-13 16:53 review
issue16104_6.patch Claudiu.Popa, 2014-03-13 17:48 review
issue16104_7.patch Claudiu.Popa, 2014-03-13 19:18 Skip a couple of tests if concurrent.futures is unavailable review
issue16104_8.patch Claudiu.Popa, 2014-04-24 06:20 review
issue16104_9.patch Claudiu.Popa, 2014-04-27 13:00 review
issue16104_10.patch Claudiu.Popa, 2014-04-27 14:06 review
issue16104_11.patch Claudiu.Popa, 2014-04-27 16:13 Minor doc update review
issue16104_12.patch Claudiu.Popa, 2014-04-30 09:16 review
16104.patch Claudiu.Popa, 2014-09-10 06:58 Update the patch for the tip. review
Messages (27)
msg171744 - (view) Author: Daniel Holth (dholth) Date: 2012-10-01 20:46
compileall would benefit approximately linearly from additional CPU cores.  There should be an option.

The noisy output would have to change. Right now it prints "compiling" and then "done" synchronously with doing the actual work.
msg171758 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2012-10-01 23:16
This should probably use concurrent.futures instead of multiprocessing directly, but yes it would be useful.

Then again, the whole module should probably be rewritten to use importlib as well.
msg205805 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2013-12-10 12:30
Hello!

Here's a draft patch. It adds a new *processes* parameter to *compile_dir* and a new command line parameter as well.
msg213200 - (view) Author: Éric Araujo (merwok) * (Python committer) Date: 2014-03-12 05:50
Patch looks good.  Some comments on Rietveld.
msg213209 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-03-12 07:26
Thank you for the review, Éric! Here's the updated patch.
msg213298 - (view) Author: Éric Araujo (merwok) * (Python committer) Date: 2014-03-12 21:05
FTR, py_compile and compileall use importlib in 3.4.
msg213301 - (view) Author: Éric Araujo (merwok) * (Python committer) Date: 2014-03-12 21:50
This looks ready to me.

One thing: “make -j0” is the spelling for “run using all available cores”, whereas “compileall -j0” will use one process.  I don’t know if this should be documented, changed or ignored.
msg213303 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-03-12 21:52
I vote for changed so that -j0 uses all available cores as os.cpu_count() states.
msg213304 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-03-12 21:53
I agree. I'll modify the patch.
msg213307 - (view) Author: Éric Araujo (merwok) * (Python committer) Date: 2014-03-12 22:11
+        if args.processes <= 0:
Is that correct?  For make, I think I’ve always seen “-j0”, not negative values.

Could you add a test for -j0? (i.e. check that “compileall -j0” calls the function with “processes=os.cpu_count()”)
msg213308 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-03-12 22:12
regrtest does that, checking for j <=0.
msg213317 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-03-12 22:42
Here's a test for j0 == os.cpu_count.
msg213340 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-13 00:28
Importing ProcessExecutor at the top-level means compileall will crash on systems which don't have multiprocessing support.
msg213417 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-03-13 16:53
Here's a new patch which addresses Éric's last comments.
Antoine, I don't have at my disposal a system without multiprocessing support. How does it crash?
msg213419 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-13 16:59
> Here's a new patch which addresses Éric's last comments.
> Antoine, I don't have at my disposal a system without multiprocessing support. How does it crash?

Neither do I, but you will probably get an ImportError of some sort.
msg213422 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-03-13 17:48
Here's a new version which catches ImportError for concurrent.futures and raises ValueError in `compile_dir` if `processes` was specified and concurrent.futures is unavailable. The only issue is that I don't know if this should be a ValueError or not. For instance, zipfile uses RuntimeError if `lzma` is unavailable.
msg214450 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-03-22 08:31
What can I do to move this forward? I believe all concerns have been addressed and it seems ready to me.
msg217118 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-04-24 06:20
Added a new version of the patch which incorporates suggestions made by Jim. Thanks for the review!
msg217173 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2014-04-25 21:43
ProcessPoolExecutor already defaults to using cpu_count if max_workers is None.  Consistency with that might be useful too.  (and a default of 1 to mean nothing in parallel is sensible...)
msg217261 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-04-27 13:00
Added a new patch with improvements suggested by Jim. Thanks!

I removed the handling of processes=1, because it can still be useful: having a background worker which processes the files received from _walk_dir. Also, it checks that compile_dir receives a positive *processes* value, otherwise it raises a ValueError. As a side note, I just found that ProcessPoolExecutor / ThreadPoolExecutor don't verify the value of processes, leading to certain types of errors (see issue21362 for more details).
Jim, the default for processes is still None, meaning "do not use multiple process", because the purpose of ProcessPoolExecutor makes it easy for it to use processes=None=os.cpu_count(). Here we want the user to be explicit about wanting multiple processes or not.
msg217264 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-04-27 14:06
Add new patch with fixes proposed by Berker Peksag. Thanks for the review. Hopefully, this is the last iteration of this patch.
msg217399 - (view) Author: Jim Jewett (Jim.Jewett) Date: 2014-04-28 19:20
Trying to put bounds on the disagreements.  Does anyone disagree with any of the following:

(1)  compileall currently runs single-threaded in a single process.


(2)  This enhancement intends to allow parallelization by process.


(3)  Users MAY need to express whether they (require/forbid/are expressly apathetic concerning) paralellization.

(3A)  There is some doubt that this even needs to be user-controlled.

(3B)  If it is user-controlled, the patch proposes adding a "processes" parameter to do this.

(3C)  There have been suggestions of other names (notably "workers"), but *if* it is user-controlled, the idea of a new parameter is not controversial.


(4)  Users MAY need to control the degree of parallelization.

(4A)  If so, setting the value of the new parameter to a positive integer > 1 is an acceptable solution.

(4B)  There is not yet consensus on how to represent "Use multi-processing, with the default degree for this system.", "Do NOT use multiprocessing.", or "I don't care."

(4C)  Suggested values have included 1, 0, -1, any negative number, None, and specific strings.  The precise mapping between some of these and the three cases of 4B is not agreed.


(5)  If multiprocessing is explicitly requested, what should happen when it is not available?

(5A)  Fall back to the current way, without multi-processing.

(5B)  Fall back to the current way, without multi-processing, but issue a Warning.

(5C)  Raise an Exception.  (ValueError, ImportError, NotImplemented?)


(6)  Portions of the documentation unrelated to this should be fixed.  But ideally, that would be done separately, and it will NOT be a pre-requisite to this patch.


---------------------------------------------------

Another potential value set

None (the default) ==> let the system parallelize as best it can -- as it does in multiprocessing.  If the system picks "not in parallel at all", that is also OK, and no warning is raised.

0 ==> Do not parallelize.

positive integers ==> Use that many processes.

negative ==> ValueError

Would these uses of 0 and negative be too surprising for someone?
msg217586 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-04-30 09:16
Updated patch according to the python-dev thread:

- processes renamed to workers
- `workers` defaults to 1
- When `workers` is equal to 0, then `os.cpu_count` will be used
- When `workers` > 1, multiple processes will be used
- When `workers` == 1, run normally (no multiple processes)
- Negative values raises a ValueError
- Will raise NotImplementedError if multiprocessing can't be used
(when `workers` equals to 0 or > 1)
msg226684 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-09-10 06:58
If there is nothing left to do for this patch, can it be committed?
msg226822 - (view) Author: Roundup Robot (python-dev) Date: 2014-09-12 14:40
New changeset 9efefcab817e by Brett Cannon in branch 'default':
Issue #16104: Allow compileall to do parallel bytecode compilation.
http://hg.python.org/cpython/rev/9efefcab817e
msg226823 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2014-09-12 14:40
Thanks for the patch, Claudiu!
msg226824 - (view) Author: Claudiu Popa (Claudiu.Popa) * Date: 2014-09-12 14:41
Thank you for committing it. :-)
History
Date User Action Args
2014-09-12 14:41:27Claudiu.Popasetmessages: + msg226824
2014-09-12 14:40:30brett.cannonsetstatus: open -> closed
resolution: fixed
messages: + msg226823

stage: commit review -> resolved
2014-09-12 14:40:06python-devsetnosy: + python-dev
messages: + msg226822
2014-09-10 14:25:18brett.cannonsetassignee: brett.cannon
2014-09-10 06:58:58Claudiu.Popasetfiles: + 16104.patch

messages: + msg226684
2014-04-30 09:16:12Claudiu.Popasetfiles: + issue16104_12.patch

messages: + msg217586
2014-04-28 19:21:32pitrousetnosy: - pitrou
2014-04-28 19:20:44Jim.Jewettsetmessages: + msg217399
2014-04-27 16:13:19Claudiu.Popasetfiles: + issue16104_11.patch
2014-04-27 14:06:47Claudiu.Popasetfiles: + issue16104_10.patch

messages: + msg217264
2014-04-27 13:33:27steven.dapranosetnosy: - steven.daprano
2014-04-27 13:00:19Claudiu.Popasetfiles: + issue16104_9.patch

messages: + msg217261
2014-04-25 21:43:54Jim.Jewettsetnosy: + Jim.Jewett
messages: + msg217173
2014-04-24 06:20:06Claudiu.Popasetfiles: + issue16104_8.patch

messages: + msg217118
2014-04-23 23:01:34terry.reedysettitle: Use multiprocessing in compileall script -> Compileall script: add option to use multiple cores
2014-03-22 08:31:37Claudiu.Popasetmessages: + msg214450
2014-03-13 19:18:06Claudiu.Popasetfiles: + issue16104_7.patch
2014-03-13 17:48:33Claudiu.Popasetfiles: + issue16104_6.patch

messages: + msg213422
2014-03-13 16:59:15pitrousetmessages: + msg213419
2014-03-13 16:53:43Claudiu.Popasetfiles: + issue16104_5.patch

messages: + msg213417
2014-03-13 00:28:13pitrousetnosy: + pitrou
messages: + msg213340
2014-03-12 22:42:18Claudiu.Popasetfiles: + issue16104_4.patch

messages: + msg213317
2014-03-12 22:12:27Claudiu.Popasetmessages: + msg213308
2014-03-12 22:11:35merwoksetmessages: + msg213307
2014-03-12 22:08:05Claudiu.Popasetfiles: + issue16104_3.patch
2014-03-12 21:53:40Claudiu.Popasetmessages: + msg213304
2014-03-12 21:52:17brett.cannonsetmessages: + msg213303
2014-03-12 21:50:58merwoksetmessages: + msg213301
stage: patch review -> commit review
2014-03-12 21:26:01Claudiu.Popasetfiles: + issue16104_2.patch
2014-03-12 21:05:58merwoksetmessages: + msg213298
2014-03-12 07:26:02Claudiu.Popasetfiles: + issue16104_1.patch

messages: + msg213209
2014-03-12 05:50:28merwoksetstage: patch review
type: enhancement
versions: + Python 3.5, - Python 3.4
2014-03-12 05:50:10merwoksetnosy: + merwok
messages: + msg213200
2014-03-10 19:39:54Claudiu.Popasetfiles: + issue16104.patch
2013-12-10 12:30:35Claudiu.Popasetfiles: + compileall_v1.patch

nosy: + Claudiu.Popa
messages: + msg205805

keywords: + patch
2013-04-28 02:49:41brett.cannonsetassignee: brett.cannon -> (no value)
2013-03-26 18:18:17brett.cannonsetassignee: brett.cannon
2012-10-02 01:41:12steven.dapranosetnosy: + steven.daprano
2012-10-01 23:16:10brett.cannonsetpriority: normal -> low
versions: + Python 3.4
nosy: + brett.cannon

messages: + msg171758

components: + Library (Lib)
2012-10-01 20:46:54dholthcreate