> does the call/return really add that much overhead?

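Yes, it does. In the timings below, wrapping x**2 in a function call roughly doubles the cost per evaluation (63.5 ns vs. 136 ns):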

$ python3.9 -m timeit -s 'x=3.5' 'x**2'
5000000 loops, best of 5: 63.5 nsec per loop
$ python3.9 -m timeit -s 'x=3.5' -s 'f=lambda x: x**2' 'f(x)'
2000000 loops, best of 5: 136 nsec per loop
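
If you'd rather reproduce this from inside a script, here is a minimal sketch using the standard-library timeit module. The helper name time_per_call and the loop counts are my own choices for illustration, not anything from the commands above, and the absolute numbers will vary with machine and Python version.

```python
import timeit

def time_per_call(stmt, setup, number=200_000, repeat=5):
    """Best-of-`repeat` time for one execution of `stmt`, in nanoseconds.

    `time_per_call` is a hypothetical helper; `number` and `repeat` are
    arbitrary values chosen to keep the run short.
    """
    # timeit.repeat returns one total time (in seconds) per repetition.
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e9

# Same two statements as the command-line runs: the bare expression,
# and the same expression wrapped in a lambda.
inline_ns = time_per_call("x**2", "x = 3.5")
call_ns = time_per_call("f(x)", "x = 3.5; f = lambda x: x**2")

print(f"x**2 inline : {inline_ns:.1f} ns per loop")
print(f"f(x)        : {call_ns:.1f} ns per loop")
print(f"difference  : {call_ns - inline_ns:.1f} ns per loop")
```

The difference between the two figures is a rough estimate of what the call and return cost on their own, since the arithmetic is identical in both cases.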
