Message 380225 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author	kj
Recipients	DanilZ, bquinlan, kj, ned.deily, pitrou, ronaldoussoren
Date	2020-11-02.14:34:21
Message-id	<1604327661.86.0.437624799332.issue42245@roundup.psfhosted.org>
In-reply-to

Content

Hello, it would be great if you could provide more details: your operating system and its version, how many logical CPU cores your machine has, and the exact Python version, given as the full version number (e.g. Python 3.8.2). Multiprocessing behaves differently depending on those factors.
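One way to collect those details in a single run is sketched below. It uses only the standard library; the function name `environment_report` is made up for illustration. The start method is included as an assumption on my part that it is relevant, since it differs between platforms.

```python
import multiprocessing
import os
import platform


def environment_report():
    """Collect details that commonly affect multiprocessing behaviour."""
    return {
        "os": platform.platform(),                    # OS name and version
        "logical_cpus": os.cpu_count(),               # logical cores (may be None)
        "python_version": platform.python_version(),  # full version string
        "start_method": multiprocessing.get_start_method(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue should be enough.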

FWIW, I reduced your code to make it easier to read and removed the unused variables:

import concurrent.futures
from sklearn.datasets import make_regression

def just_print():
    print('Just printing')

def fit_model():
    data = make_regression(n_samples=500, n_features=100, n_informative=10, n_targets=1, random_state=5)
    print('Fit complete')

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results_temp = [executor.submit(just_print) for i in range(0,12)]

    with concurrent.futures.ProcessPoolExecutor() as executor:
        results_temp = [executor.submit(fit_model) for i in range(0,12)]

The problem is that I am *unable* to reproduce the bug you are reporting on Windows 10 64-bit, Python 3.7.6. Both examples run to completion. I have a hunch that your problem lies elsewhere, in one of the many libraries you imported.

>>> Note: problem occurs only after performing the RandomizedSearchCV...

Following your note, I skimmed through RandomizedSearchCV's source code and docs. RandomizedSearchCV can apparently use a multiprocessing backend for parallel tasks. By setting `n_jobs=-1` in your params, you're telling it to use all logical CPU cores. I'm not sure how many additional processes and pools RandomizedSearchCV spawns when it is called, but this sounds suspicious. The concurrent.futures docs specifically warn that this may exhaust available workers and cause tasks to never complete. See https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor (the docs there are for ThreadPoolExecutor, but they still apply).
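To check whether submitted tasks really never complete, as opposed to failing silently, a small diagnostic like the following might help. This is only a sketch: the helper name `count_finished` and the 60-second timeout are arbitrary choices of mine, and `os.getpid` stands in for the real task.

```python
import concurrent.futures
import os


def count_finished(executor, func, n_tasks, timeout):
    """Submit func n_tasks times and return (finished, stalled) counts."""
    futures = [executor.submit(func) for _ in range(n_tasks)]
    # wait() returns once every future is done or the timeout expires.
    done, not_done = concurrent.futures.wait(futures, timeout=timeout)
    for future in done:
        future.result()  # re-raises any exception raised inside the worker
    return len(done), len(not_done)


if __name__ == "__main__":
    # os.getpid is a trivial, picklable stand-in for the real task.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        finished, stalled = count_finished(executor, os.getpid, 12, timeout=60)
    print(f"finished={finished} stalled={stalled}")
```

Note that leaving the `with` block still waits for any stalled tasks, so if tasks genuinely hang, the counts should be printed before the block exits.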

A temporary workaround might be to reduce n_jobs or, even better, to use scikit-learn's dedicated multiprocessing parallel backend, which should have the necessary protections in place against such behavior. https://joblib.readthedocs.io/en/latest/parallel.html#joblib.parallel_backend
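For the "reduce n_jobs" option, one rough heuristic (my assumption, not something taken from the scikit-learn or joblib docs) is to keep the number of outer workers multiplied by the inner n_jobs at or below the number of logical cores. The helper name `inner_jobs_budget` is hypothetical.

```python
import os


def inner_jobs_budget(outer_workers, total_cpus=None):
    """Return an n_jobs value so that outer_workers * n_jobs <= total_cpus."""
    total = total_cpus or os.cpu_count() or 1
    # Always allow at least one job, even if the outer pool is oversubscribed.
    return max(1, total // max(1, outer_workers))


if __name__ == "__main__":
    # Example: 4 outer processes on an 8-core machine leaves 2 jobs each.
    print(inner_jobs_budget(outer_workers=4, total_cpus=8))
```

The result could then be passed as `n_jobs` to RandomizedSearchCV instead of `-1`, or used as the `n_jobs` argument of the `joblib.parallel_backend` context manager linked above.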


TLDR: I don't think this is a Python bug and I'm in favor of this bug being closed as `not a bug`.

History
Date	User	Action	Args
2020-11-02 14:34:21	kj	set	recipients: + kj, bquinlan, ronaldoussoren, pitrou, ned.deily, DanilZ
2020-11-02 14:34:21	kj	set	messageid: <1604327661.86.0.437624799332.issue42245@roundup.psfhosted.org>
2020-11-02 14:34:21	kj	link	issue42245 messages
2020-11-02 14:34:21	kj	create