Message 356303 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	aeros
Recipients	aeros, asvetlov, primal, yselivanov
Date	2019-11-09.16:43:26
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1573317807.89.0.0143608860556.issue32309@roundup.psfhosted.org>
In-reply-to

Content
> (a) design the API correctly; (b) ship something that definitely works with a proven ThreadPoolExecutor; (c) write lots of tests; (d) write docs; (e) if (a-d) are OK, refine the implementation later by replacing ThreadPoolExecutor with a proper (eager threads creation) implementation. That sounds like a good strategy. I'll start working on step a, to build a more robust working implementation (that only uses ThreadPoolExecutor's public API), and then work on c and d once the API is approved. > 2. Not using ThreadPoolExecutor at all. We can write our own threads orchestration code, it's not that complicated. If we do that, our implementation will become quite faster than the current run_in_executor. We can use curio as inspiration. Last time I profiled asyncio I saw that the code binding concurrent.Future to asyncio.Future is quite complex, brittle, and slow. > I'm in favor of (2), but let's go through (a-d) steps to get there. Agreed, I'm also in favor of option (2), but doing (a-d) first. I think this approach will provide far more stability (rather than implementing (2) immediately), as we'll be able to write extensive test coverage using a ThreadPoolExecutor implementation as a base, and then ensure the native asyncio threadpool implementation has the same intended behavior. Afterwards, could even keep the ThreadPoolExecutor version as private for testing purposes. The native asyncio version will likely require some additional tests to ensure that the threads are being spawned eagerly, but they should have very similar overall behavior, with the asyncio version having better performance. I'm thinking that we could create a new Lib/asyncio/pools.py, for the initial ThreadPoolExecutor implementation, to give us a separate area to work with for native asyncio one in the future. A similar asyncio.ProcessPool API could also be eventually created there as well. It might be feasible to fit the initial implementation in Lib/asyncio/base_events.py, but IMO the native asyncio version would not fit. (From the user end it would make no difference, as long as we add pools.__all__ to Lib/asyncio/__init__.py) My only remaining question that I can think of: should we implement an asyncio.ProcessPool API (using ProcessPoolExecutor's public API) before working on the native asyncio version of asyncio.ThreadPool? This would likely allow us to release the executor versions well before the 3.9 beta (2020-05-18), and then the native asyncio versions (with eager spawning) either before 3.9rc1 (2020-08-10) or at some point during 3.10 alpha. I suspect that writing the extensive test coverage will take the most time.

> (a) design the API correctly; 
(b) ship something that definitely works with a proven ThreadPoolExecutor; 
(c) write lots of tests;
(d) write docs;
(e) if (a-d) are OK, refine the implementation later by replacing ThreadPoolExecutor with a proper (eager threads creation) implementation.

That sounds like a good strategy. I'll start working on step a, to build a more robust working implementation (that only uses ThreadPoolExecutor's public API), and then work on c and d once the API is approved. 

> 2. Not using ThreadPoolExecutor at all. We can write our own threads orchestration code, it's not that complicated. If we do that, our implementation will become quite faster than the current run_in_executor.  We can use curio as inspiration.  Last time I profiled asyncio I saw that the code binding concurrent.Future to asyncio.Future is quite complex, brittle, and slow.

> I'm in favor of (2), but let's go through (a-d) steps to get there.

Agreed, I'm also in favor of option (2), but doing (a-d) first. I think this approach will provide far more stability (rather than implementing (2) immediately), as we'll be able to write extensive test coverage using a ThreadPoolExecutor implementation as a base, and then ensure the native asyncio threadpool implementation has the same intended behavior. Afterwards, could even keep the ThreadPoolExecutor version as private for testing purposes. 

The native asyncio version will likely require some additional tests to ensure that the threads are being spawned eagerly, but they should have very similar overall behavior, with the asyncio version having better performance. 

I'm thinking that we could create a new Lib/asyncio/pools.py, for the initial ThreadPoolExecutor implementation, to give us a separate area to work with for native asyncio one in the future. A similar asyncio.ProcessPool API could also be eventually created there as well. It might be feasible to fit the initial implementation in Lib/asyncio/base_events.py, but IMO the native asyncio version would not fit. 

(From the user end it would make no difference, as long as we add pools.__all__ to Lib/asyncio/__init__.py)

My only remaining question that I can think of: should we implement an asyncio.ProcessPool API (using ProcessPoolExecutor's public API) before working on the native asyncio version of asyncio.ThreadPool? 

This would likely allow us to release the executor versions well before the 3.9 beta (2020-05-18), and then the native asyncio versions (with eager spawning) either before 3.9rc1 (2020-08-10) or at some point during 3.10 alpha. I suspect that writing the extensive test coverage will take the most time.

History
Date	User	Action	Args
2019-11-09 16:43:27	aeros	set	recipients: + aeros, asvetlov, yselivanov, primal
2019-11-09 16:43:27	aeros	set	messageid: <1573317807.89.0.0143608860556.issue32309@roundup.psfhosted.org>
2019-11-09 16:43:27	aeros	link	issue32309 messages
2019-11-09 16:43:26	aeros	create