> ./python -m pyperf timeit "from functools import lru_cache; f = lru_cache(lambda: 42)" "f()" --compare-to ../3.9/python
> /home/pablogsal/github/3.9/python: ..................... 2.60 us +- 0.05 us

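You misused pyperf timeit: without -s, the two statements are both run at each iteration of the benchmark, so the measurement includes the creation of the lru_cache wrapper and not only the f() call.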
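To illustrate the difference, here is a minimal sketch that uses the stdlib timeit module as a stand-in for pyperf timeit (an assumption made so the example runs without pyperf installed; the LOOPS value is arbitrary). The first measurement passes the setup separately, the second puts both statements in the timed body:

```python
import timeit

SETUP = "from functools import lru_cache; f = lru_cache(lambda: 42)"
STMT = "f()"
LOOPS = 20_000  # arbitrary loop count, chosen only for this illustration

# Setup passed separately: it runs once, and only the cached call is timed.
call_only = timeit.timeit(STMT, setup=SETUP, number=LOOPS)

# Both statements in the timed body: the import and the lru_cache()
# wrapper creation are repeated and timed on every iteration.
both = timeit.timeit(SETUP + "\n" + STMT, number=LOOPS)

print(f"setup separate: {call_only / LOOPS * 1e9:.1f} ns per loop")
print(f"setup in body:  {both / LOOPS * 1e9:.1f} ns per loop")
```

The second number should be much larger, since it mostly measures the cost of building a new cache wrapper rather than the cost of a cache hit.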

I reran the benchmark on Linux with PGO+LTO and CPU isolation:

Mean +- std dev: [py39] 37.5 ns +- 1.0 ns -> [master] 43.2 ns +- 0.7 ns: 1.15x slower

I understand that adding get_functools_state_by_type() costs about +5.7 ns per call (37.5 ns -> 43.2 ns) for functions decorated with @lru_cache.
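The arithmetic behind these figures, using only the two means reported above (the variable names are just for this illustration):

```python
py39_ns = 37.5    # mean reported for py39
master_ns = 43.2  # mean reported for master

delta_ns = master_ns - py39_ns  # absolute overhead per call
ratio = master_ns / py39_ns     # relative slowdown

print(f"{delta_ns:+.1f} ns per call, {ratio:.2f}x slower")
# → +5.7 ns per call, 1.15x slower
```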

I used the commands:

./configure --enable-optimizations --with-lto
./python -m venv env
env/bin/python -m pip install pyperf
./env/bin/python -m pyperf timeit -s "from functools import lru_cache; f = lru_cache(lambda: 42)" "f()" -o master.json -v --duplicate=4096
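For readers unfamiliar with --duplicate: it repeats the statement inside the loop body so that the overhead of the loop itself is spread over many calls. A rough sketch of the same idea with the stdlib timeit module (an assumption; this approximates what the option does and is not how pyperf implements it, and the LOOPS value is arbitrary):

```python
import timeit

SETUP = "from functools import lru_cache; f = lru_cache(lambda: 42)"
DUPLICATE = 4096  # same value as --duplicate in the command above
LOOPS = 50        # arbitrary outer loop count for this illustration

# Repeat "f()" DUPLICATE times in the timed body, so one pass through the
# timing loop performs DUPLICATE cached calls.
stmt = "\n".join(["f()"] * DUPLICATE)

total = timeit.timeit(stmt, setup=SETUP, number=LOOPS)
per_call_ns = total / (LOOPS * DUPLICATE) * 1e9

print(f"{per_call_ns:.1f} ns per f() call")
```

With a call this cheap (tens of nanoseconds), the loop overhead would otherwise be a noticeable fraction of the measured time.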
