This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author lemburg
Recipients Mark.Shannon, kj, lemburg, malin, neonene, pablogsal, paul.moore, rhettinger, steve.dower, tim.golden, vstinner, zach.ware
Date 2021-09-17.16:36:16
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
FWIW: Back in the days of Python 1.5.2, the ceval loop was already too big for CPU caches, and one of the things I experimented with at the time was reordering the opcodes by how often they were used and splitting the big switch statement we had back then into two parts. This resulted in a 10-20% speedup.

CPU caches have since gotten much larger, but the size of the loop is still something to keep in mind and optimize for, as more and more logic gets added to the inner loop of Python.

IMO, we should definitely keep forced inlines / macros where they are used inside hot loops, perhaps even in all of the CPython code, since the conversion to inline functions is mostly meant to hide internals from extensions, not to hide them from CPython itself.

@neonene: Could you provide more details about the CPU you're using to run the tests?

BTW: Perhaps the PSF could get a few sponsors to add more benchmark hosts, to provide a better overview. It looks as if the system is only compiling on Ubuntu 14.04 and running on an 11-year-old system. If that's the case, the system uses a server CPU with 12MB cache.
Date                 User     Action  Args
2021-09-17 16:36:16  lemburg  set     recipients: + lemburg, rhettinger, paul.moore, vstinner, tim.golden, Mark.Shannon, zach.ware, steve.dower, malin, pablogsal, neonene, kj
2021-09-17 16:36:16  lemburg  set     messageid: <>
2021-09-17 16:36:16  lemburg  link    issue45116 messages
2021-09-17 16:36:16  lemburg  create