Author kj
Recipients DanilZ, bquinlan, kj, ned.deily, pitrou, ronaldoussoren
Date 2020-11-02.14:34:21
Message-id <1604327661.86.0.437624799332.issue42245@roundup.psfhosted.org>
In-reply-to
Content
Hello, it would be great if you could provide more details: your operating system and version, how many logical CPU cores your machine has, and the exact Python version, including the micro version (e.g. Python 3.8.2). Multiprocessing behaves differently depending on those factors.
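
In case it helps, here is a minimal sketch that collects those details using only the standard library. The function name `environment_report` and the dictionary keys are made up for illustration; pasting the printed output into the issue should be enough.

```python
import os
import platform


def environment_report():
    """Collect the details requested above into a plain dict."""
    return {
        # OS name, release and architecture in one string
        "os": platform.platform(),
        # Number of logical CPU cores (may be None if it cannot be determined)
        "logical_cpus": os.cpu_count(),
        # Full version string, e.g. "3.8.2"
        "python": platform.python_version(),
        # CPython, PyPy, ...
        "implementation": platform.python_implementation(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```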

FWIW I reduced your code down to make it easier to read, and removed all the unused variables:

import concurrent.futures
from sklearn.datasets import make_regression

def just_print():
    print('Just printing')

def fit_model():
    data = make_regression(n_samples=500, n_features=100, n_informative=10, n_targets=1, random_state=5)
    print('Fit complete')

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results_temp = [executor.submit(just_print) for i in range(0,12)]

    with concurrent.futures.ProcessPoolExecutor() as executor:
        results_temp = [executor.submit(fit_model) for i in range(0,12)]

The problem is that I am *unable* to reproduce the bug you are reporting on Windows 10 64-bit with Python 3.7.6. The code runs to completion for both examples. I have a hunch that your problem lies elsewhere, in one of the many libraries you imported.

>>> Note: problem occurs only after performing the RandomizedSearchCV...

Following up on your note, I skimmed through RandomizedSearchCV's source code and docs. RandomizedSearchCV is purportedly able to use a multiprocessing backend for parallel tasks. By setting `n_jobs=-1` in your params, you're telling it to use all logical CPU cores. I'm unsure how many additional processes and pools RandomizedSearchCV spawns when it is called, but this sounds suspicious. The concurrent.futures docs specifically warn that this may exhaust available workers and cause tasks to never complete. See https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor (the docs here are for ThreadPoolExecutor, but they still apply).
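
To illustrate the kind of worker exhaustion those docs describe, here is a small sketch modeled on the ThreadPoolExecutor example in the docs: a task waits on another task submitted to the same single-worker pool. This is only an illustration of the documented pattern, not a claim about what RandomizedSearchCV does internally. The names `inner`, `outer` and `run_demo` are made up, and the timeout is added purely so the sketch terminates; without it, `outer` would wait forever.

```python
import concurrent.futures


def inner():
    return "inner done"


def outer(executor):
    # Submit a second task to the same pool and wait for its result.
    # The pool's only worker is busy running outer(), so inner() stays queued.
    future = executor.submit(inner)
    try:
        return future.result(timeout=0.5)
    except concurrent.futures.TimeoutError:
        return "inner never started: the only worker is busy running outer"


def run_demo():
    # max_workers=1 makes the exhaustion deterministic.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(outer, executor).result()


if __name__ == "__main__":
    print(run_demo())
```

Once `outer` gives up and returns, the worker becomes free and `inner` finally runs, which is why the `with` block can still exit cleanly.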

A temporary workaround might be to reduce n_jobs or, even better, to use the parallel backend that scikit-learn relies on (joblib), which is dedicated to that and should have the necessary protections in place against such behavior: https://joblib.readthedocs.io/en/latest/parallel.html#joblib.parallel_backend
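
A rough sketch of what the second option could look like, assuming joblib is installed (it comes with scikit-learn). The helper names `square` and `square_all` are hypothetical stand-ins for your model fitting code, and the worker count of 2 is an arbitrary choice. The idea is to let joblib own the process pool instead of nesting it inside a ProcessPoolExecutor.

```python
from joblib import Parallel, delayed, parallel_backend


def square(x):
    # Stand-in for the real work, e.g. fitting one model
    return x * x


def square_all(values, n_jobs=2):
    # The backend and the worker cap are set once here; the Parallel()
    # call inside the block picks them up from the context.
    with parallel_backend("loky", n_jobs=n_jobs):
        return Parallel()(delayed(square)(v) for v in values)


if __name__ == "__main__":
    print(square_all(range(4)))
```

As far as I understand from the scikit-learn docs, estimators whose `n_jobs` is left unset should pick up the worker count from the same context, so a RandomizedSearchCV call could be placed inside the `with` block instead of passing `n_jobs=-1`.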


TLDR: I don't think this is a Python bug and I'm in favor of this bug being closed as `not a bug`.