Author gregory.p.smith
Recipients Mark.Shannon, eric.snow, gregory.p.smith, kumaraditya, lys.nikolaou, pablogsal, terry.reedy, tim.peters, vstinner, xtreak
Date 2022-01-30.07:49:31
re: slow tests in the first half of the list.  The same total amount of time is going to be spent regardless.  In our test suite on a modern fast 16-thread system, all but 10 tests complete in parallel within the first 30 seconds.  The remaining ~10 take 10x+ that in wall time, dragging on for several more minutes.

So the most latency you will shave off on a modern system is probably <30 seconds.  On a slower system the numbers are bigger, but the proportion saved stays about the same.  CI systems are not workstations.  On a -j1 or -j2 system I doubt it will make a meaningful difference at all.

Picture test execution as a utilization graph:

|                       tttt
|                           ttt
|                              tttttttttt

The total area under that curve is going to remain the same no matter what, so long as we execute everything.  Reordering the tests can pull in the final long tail a bit by pushing out the top layer.  You move closer to an optimal rectangle, but you're still limited by the area.  **The fewer CPU cores of -jN parallelism you have, the less difference any reordering change makes.**
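
To make the area argument concrete, here's a throwaway simulation (not regrtest code; all durations are invented) of greedy scheduling onto -jN workers.  The total CPU time is fixed; only the ordering changes, and the benefit of longest-first shrinks as -jN shrinks:

```python
# Minimal sketch, assuming a pool of -jN workers that each pull the next test
# from a shared queue as soon as they go idle.  Durations are made up.
import heapq
import random

def makespan(durations, workers):
    """Wall time for greedy scheduling: idle worker takes the next test."""
    finish_times = [0.0] * workers        # when each worker becomes free
    heapq.heapify(finish_times)
    for d in durations:
        start = heapq.heappop(finish_times)    # earliest-free worker
        heapq.heappush(finish_times, start + d)
    return max(finish_times)

random.seed(0)
tests = [random.uniform(1, 10) for _ in range(390)] + [300.0] * 10  # 10 slow tests
total_cpu = sum(tests)                    # the "area under the curve": fixed

for j in (2, 16):
    asis    = makespan(tests, j)                        # slow tests run last
    longest = makespan(sorted(tests, reverse=True), j)  # slow tests run first
    print(f"-j{j}: as-is {asis:7.1f}s  longest-first {longest:7.1f}s  "
          f"area/j lower bound {total_cpu / j:7.1f}s")
```

At -j16 the longest-first ordering lands near the area/j lower bound; at -j2 both orderings are already pinned to it, so reordering buys essentially nothing.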

What actual parallelism do our GitHub CI systems offer?

The fundamental problem is that we do a LOT in our test suite and have no concept of what depends on what and thus _needs_ to be run.  So we run it all.  For specialized tests like test_peg_generator and test_tools it should be easy to determine from a list of modified files if those tests are relevant.

That gets a lot more complicated to accurately express for things like test_multiprocessing and test_concurrent_futures.
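
For the easy cases, something as crude as a path-pattern map would do.  This is a purely hypothetical sketch (the RELEVANCE table and tests_for_changes() helper do not exist anywhere, and nothing like this lives in test.regrtest today); it deliberately punts on everything that isn't a specialized tool test:

```python
import fnmatch

# Hypothetical: map changed-file patterns to the specialized tests that
# obviously cover them.
RELEVANCE = {
    "Tools/peg_generator/*": {"test_peg_generator"},
    "Grammar/*":             {"test_peg_generator"},
    "Tools/*":               {"test_tools"},
}

def tests_for_changes(changed_files):
    """Return the specialized tests worth running for this change, or None
    meaning "can't tell, run everything" (e.g. anything under Lib/)."""
    selected = set()
    for path in changed_files:
        matched = False
        for pattern, tests in RELEVANCE.items():
            if fnmatch.fnmatch(path, pattern):
                selected |= tests
                matched = True
        if not matched:
            return None   # an unmapped file changed: no safe way to skip tests

    return selected

print(tests_for_changes(["Tools/scripts/summarize_stats.py"]))  # {'test_tools'}
print(tests_for_changes(["Lib/multiprocessing/pool.py"]))       # None
```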

test_peg_generator and test_tools are also *packages of tests* that should themselves be parallelized individually instead of being treated as a single serialized unit.

At work we even shard test methods within TestCase classes so that big ones can be split across test executor tasks: See the _setup_sharding() function in absltest here:

In the absence of implementing an approach like that within test.regrtest (sharding at a more granular level, which would let us approach the golden rectangle of optimal parallel test latency), we're left with manually splitting long-running test modules/packages into smaller units to achieve a similar effect.
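
For reference, here's a rough sketch of what method-level sharding can look like in plain unittest, loosely in the spirit of absltest's _setup_sharding() (which, as I recall, honors the Bazel-style TEST_TOTAL_SHARDS / TEST_SHARD_INDEX environment variables).  Illustrative only, not something regrtest does:

```python
import os
import unittest

def shard_suite(suite, total_shards, shard_index):
    """Keep every total_shards-th test so each shard runs a disjoint slice."""
    flat = []
    def flatten(s):
        for item in s:
            if isinstance(item, unittest.TestSuite):
                flatten(item)
            else:
                flat.append(item)
    flatten(suite)
    picked = [t for i, t in enumerate(flat) if i % total_shards == shard_index]
    return unittest.TestSuite(picked)

# A test module opts in via the standard load_tests protocol; the runner then
# launches the module total_shards times with a different TEST_SHARD_INDEX each.
def load_tests(loader, standard_tests, pattern):
    total = int(os.environ.get("TEST_TOTAL_SHARDS", "1"))
    index = int(os.environ.get("TEST_SHARD_INDEX", "0"))
    return shard_suite(standard_tests, total, index)
```

A runner would start e.g. `TEST_TOTAL_SHARDS=4 TEST_SHARD_INDEX=2 python -m unittest test_something`, and each of the four invocations executes a quarter of the test methods as an independent executor task.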