
classification
Title: concurrent.futures.ProcessPoolExecutor state=finished raised error
Type: crash
Stage: resolved
Components: 2to3 (2.x to 3.x conversion tool)
Versions: Python 3.7

process
Status: closed
Resolution: duplicate
Dependencies:
Superseder: function changed when pickle bound method object (view: 37297)
Assigned To:
Nosy List: DanilZ, aeros, asvetlov, bquinlan, lukasz.langa, maggyero, methane, ned.deily, pitrou, serhiy.storchaka
Priority: normal
Keywords:

Created on 2019-06-15 17:49 by maggyero, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (9)
msg345709 - (view) Author: Géry (maggyero) * Date: 2019-06-15 17:49
The following code hangs forever instead of printing "called" 10 times:

    from concurrent.futures import ProcessPoolExecutor
    
    class A:
        def f(self):
            print("called")
    
    class B(A):
        def f(self):
            executor = ProcessPoolExecutor(max_workers=2)
            futures = [executor.submit(super(B, self).f)
                       for _ in range(10)]
    
    if __name__ == "__main__":
        B().f()

The same code with `super(B, self)` replaced with `super()` raises the following error:

> TypeError: super(type, obj): obj must be an instance or subtype of type

However, replacing `ProcessPoolExecutor` with `ThreadPoolExecutor` works as expected, but only with `super(B, self)` (with `super()` it still raises the same error).
msg345714 - (view) Author: Andrew Svetlov (asvetlov) * (Python committer) Date: 2019-06-15 19:58
The use case is weird.
I don't think we need to do anything about this issue.
msg345716 - (view) Author: Géry (maggyero) * Date: 2019-06-15 20:56
@Andrew Svetlov

Well, this was just an example to illustrate the issue. In real code I need this to create `ThreadPoolMixin` and `ProcessPoolMixin` classes (similar to the `socketserver.ThreadingMixIn` and `socketserver.ForkingMixIn` classes in the standard library, but using a thread/process pool with a fixed size instead of creating a new thread/process per request) for mixing with my server classes. But because of this `ProcessPoolExecutor` issue, I currently cannot use my `ProcessPoolMixin` class, only my `ThreadPoolMixin`.

The fact is one cannot submit a parent method call to a `ProcessPoolExecutor`.

This code snippet prints the exception raised by the call:

    from concurrent.futures import ProcessPoolExecutor
    
    class A:
        def f(self):
            print("called")
    
    class B(A):
        def f(self):
            executor = ProcessPoolExecutor(max_workers=2)
            print(executor.submit(super().f).exception())
    
    if __name__ == "__main__":
        B().f()

It prints this:

> [Errno 24] Too many open files
> None
> None
> None
> …
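
A minimal workaround sketch, assuming the goal is just to run the parent implementation in a worker: submit the parent class's function together with the instance, so no super-bound method has to be pickled.

    from concurrent.futures import ProcessPoolExecutor
    
    class A:
        def f(self):
            print("called")
    
    class B(A):
        def f(self):
            executor = ProcessPoolExecutor(max_workers=2)
            # Pass the parent function and the instance separately instead of
            # the bound method super().f, whose pickling loses the super binding.
            futures = [executor.submit(A.f, self) for _ in range(10)]
            for future in futures:
                future.result()  # wait so the workers actually run and print
    
    if __name__ == "__main__":
        B().f()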
msg345717 - (view) Author: Géry (maggyero) * Date: 2019-06-15 22:21
And like the `concurrent.futures` module, where `concurrent.futures.ProcessPoolExecutor` is affected but `concurrent.futures.ThreadPoolExecutor` is not (see above), the `multiprocessing.pool` module also seems affected by a similar problem for `multiprocessing.pool.Pool` (process pools) but not for `multiprocessing.pool.ThreadPool` (thread pools).

Indeed the following code:

    import multiprocessing.pool
    
    class A:
        def f(self):
            print("called")
    
    class B(A):
        def f(self):
            pool = multiprocessing.pool.Pool(2)
            pool.apply(super().f)
    
    if __name__ == "__main__":
        B().f()

raises the following exception:

> AssertionError: daemonic processes are not allowed to have children
msg345777 - (view) Author: Géry (maggyero) * Date: 2019-06-16 20:31
George Xie found the root cause of this issue (a bug in the `method_reduce` function in CPython's `Objects/classobject.c`):

https://stackoverflow.com/questions/56609847/why-do-concurrent-futures-processpoolexecutor-and-multiprocessing-pool-pool-fail/56614748#56614748

and he filed the specific bug here:

https://bugs.python.org/issue37297

The issue looks serious as it creates an infinite recursion that crashed my laptop several times today.
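
As a minimal sketch of that mechanism, assuming the `(getattr, (obj, name))` reduction implemented by `method_reduce`: pickling a super-bound method re-resolves the name on the instance, so the overriding method silently comes back after the round trip.

    import pickle
    
    class A:
        def f(self):
            print("called")
    
    class B(A):
        def f(self):
            bound = super().f          # __self__ is this B instance, __func__ is A.f
            print(bound.__reduce__())  # roughly: (<built-in function getattr>, (<B object>, 'f'))
            restored = pickle.loads(pickle.dumps(bound))
            # The round trip re-resolves 'f' on the instance, so the super
            # binding is lost and B.f comes back instead of A.f.
            print(bound.__func__ is A.f, restored.__func__ is B.f)  # True True
    
    if __name__ == "__main__":
        B().f()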
msg377610 - (view) Author: DanilZ (DanilZ) Date: 2020-09-28 18:00
After executing a single task inside a process, the result is returned with state=finished raised error.

The error happens when trying to load a big dataset (over 5 GB). Otherwise, the same dataset reduced to a smaller nrows executes and returns from result() without errors.

    import concurrent.futures
    import pandas as pd
    
    # path points to the ~5 GB CSV file mentioned above
    with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
        results = executor.submit(pd.read_csv, path)
    
    data = results.result()
msg377727 - (view) Author: Kyle Stanley (aeros) * (Python committer) Date: 2020-10-01 00:11
DanilZ, could you take a look at the superseding issue (https://bugs.python.org/issue37297) and see if your exception raised within the job is the same?  

If it's not, I would suggest opening a separate issue (and linking to it in a comment here), as I don't think it's necessarily related to this one. "state=finished raised error" doesn't indicate the specific exception that occurred. A good format for the name would be something along the lines of:

"ProcessPoolExecutor.submit() <specific exception name here> while reading large object (4GB)"

It'd also be helpful in the separate issue to paste the full exception stack trace, specify OS, and multiprocessing start method used (spawn, fork, or forkserver). This is necessary to know for replicating the issue on our end.

In the meantime, a workaround I would suggest trying would be to use the *chunksize* parameter (or *Iterator*) in pandas.read_csv(), and split it across several jobs (at least 4+, more if you have additional cores) instead of within a single one. It'd also be generally helpful to see if that alleviates the problem, as it could possibly indicate an issue with running out of memory when the dataframe is converted to pickle format (which often increases the total size) within the process associated with the job.
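
A rough sketch of that chunked approach, assuming an illustrative file name and chunk size (not a tested recipe): each job reads only its own slice of rows, so no single future has to send the whole multi-GB DataFrame back through result().

    import concurrent.futures
    import pandas as pd
    
    N_JOBS = 4
    ROWS_PER_JOB = 2_000_000  # chosen so each returned chunk pickles comfortably
    
    def read_chunk(path, start):
        # Skip the rows handled by other jobs; row 0 (the header) is kept.
        return pd.read_csv(path, skiprows=range(1, start + 1), nrows=ROWS_PER_JOB)
    
    if __name__ == "__main__":
        path = "data.csv"  # hypothetical path to the large CSV
        with concurrent.futures.ProcessPoolExecutor(max_workers=N_JOBS) as executor:
            futures = [executor.submit(read_chunk, path, i * ROWS_PER_JOB)
                       for i in range(N_JOBS)]
            data = pd.concat(f.result() for f in futures)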
msg377738 - (view) Author: DanilZ (DanilZ) Date: 2020-10-01 07:29
msg377739 - (view) Author: DanilZ (DanilZ) Date: 2020-10-01 07:38
I think you have correctly estimated the problem in the last part of your message: "as it could possibly indicate an issue with running out of memory when the dataframe is converted to pickle format (which often increases the total size) within the process associated with the job"

The function pd.read_csv performs without any problems inside the process; the error appears only when I try to extract the result from the finished process via:

    for f in concurrent.futures.as_completed(results):
        data = f.result()

or

    data = results.result()

It just does not pass the large file back from the results object.

I am sure that everything works correctly inside the worker process, for two reasons:
1. If I change the function inside the process to just save the file (that had been read into memory) to disk, it completes without error (see the sketch at the end of this message).
2. If I reduce the file size, then it gets extracted from results.result() without error.

So I guess my question narrows down to:
1. Can I increase the memory allocated to a process?
2. Or at least understand what the limit is?
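
A minimal sketch of the disk-based variant from point 1, with placeholder file names: the worker persists the DataFrame itself and returns just the path, so only a short string travels back through result().

    import concurrent.futures
    import pandas as pd
    
    def read_and_store(csv_path, out_path):
        data = pd.read_csv(csv_path)
        data.to_pickle(out_path)  # persist inside the worker process
        return out_path           # only a small string crosses the process boundary
    
    if __name__ == "__main__":
        csv_path = "data.csv"          # hypothetical input file
        out_path = "data.cached.pkl"   # hypothetical scratch file
        with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
            future = executor.submit(read_and_store, csv_path, out_path)
            data = pd.read_pickle(future.result())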

Regards,
Danil

History
Date  User  Action  Args
2022-04-11 14:59:16  admin  set  github: 81475
2020-10-01 07:38:09  DanilZ  set  messages: + msg377739
2020-10-01 07:29:13  DanilZ  set  messages: + msg377738
2020-10-01 00:11:41  aeros  set  nosy: + aeros; messages: + msg377727
2020-09-28 18:00:20  DanilZ  set  title: concurrent.futures.ProcessPoolExecutor and multiprocessing.pool.Pool fail with super -> concurrent.futures.ProcessPoolExecutor state=finished raised error; nosy: + DanilZ; messages: + msg377610; components: + 2to3 (2.x to 3.x conversion tool), - Library (Lib)
2019-06-28 18:50:54  bquinlan  set  status: open -> closed; superseder: function changed when pickle bound method object; resolution: duplicate; stage: resolved
2019-06-16 20:31:45  maggyero  set  messages: + msg345777
2019-06-15 22:21:15  maggyero  set  messages: + msg345717; title: ProcessPoolExecutor fails with super -> concurrent.futures.ProcessPoolExecutor and multiprocessing.pool.Pool fail with super
2019-06-15 20:56:13  maggyero  set  messages: + msg345716
2019-06-15 19:58:59  asvetlov  set  nosy: - gvanrossum
2019-06-15 19:58:30  asvetlov  set  nosy: + gvanrossum; messages: + msg345714
2019-06-15 19:53:38  gvanrossum  set  nosy: - gvanrossum
2019-06-15 17:52:54  maggyero  set  nosy: + gvanrossum
2019-06-15 17:49:58  maggyero  create