This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author lemburg
Recipients Mark.Shannon, kj, lemburg, malin, neonene, pablogsal, paul.moore, rhettinger, steve.dower, tim.golden, vstinner, zach.ware
Date 2021-09-17.16:36:16
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
FWIW: Back in the days of Python 1.5.2, the ceval loop was already too big for CPU caches, and one of the things I experimented with at the time was reordering the opcodes by how often they were used and splitting the big switch statement we had back then into two parts. This resulted in a 10-20% speedup.

CPU caches have since gotten much larger, but the size of the loop is still something to keep in mind and optimize for, as more and more logic gets added to the inner loop of Python.

IMO, we should definitely keep forced inlines / macros where they are used inside hot loops, perhaps even in all of the CPython code, since the conversion to inline functions is mostly meant to hide internals from extensions, not to hide them from CPython itself.

@neonene: Could you provide more details about the CPU you're using to run the tests?

BTW: Perhaps the PSF could get a few sponsors to add more benchmark hosts, to provide a better overview. It looks as if the system is only compiling on Ubuntu 14.04 and running on an 11-year-old system. If that's the case, the system uses a server CPU with 12MB cache.
Date                 User     Action  Args
2021-09-17 16:36:16  lemburg  set     recipients: + lemburg, rhettinger, paul.moore, vstinner, tim.golden, Mark.Shannon, zach.ware, steve.dower, malin, pablogsal, neonene, kj
2021-09-17 16:36:16  lemburg  set     messageid: <>
2021-09-17 16:36:16  lemburg  link    issue45116 messages
2021-09-17 16:36:16  lemburg  create