I reviewed print-fastcall.patch: LGTM, but I proposed a minor change.

Serhiy Storchaka: "The performance of print() is not critical. It usually involves slow formatting and IO."

I had the same understanding of print(), but I just analyzed the performance of the bm_telco benchmark, and it seems that just handling the function parameters of print() takes 20% of the runtime!?
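A rough way to look at the call overhead of print() in isolation is a timeit micro-benchmark like the sketch below. It is not how bm_telco measures anything. The helper names and the iteration count are made up for illustration. The gap between the two timings is only an upper bound on the argument-handling cost, since print() also does other work (separate writes for the text and the end string, for example).

```python
import io
import timeit


def time_print_with_keyword(number=50_000):
    """Time print() called with a keyword argument (file=...)."""
    buf = io.StringIO()  # in-memory target, so no real IO is involved
    return timeit.timeit(lambda: print("x", file=buf), number=number)


def time_direct_write(number=50_000):
    """Time the equivalent direct write, bypassing print() entirely."""
    buf = io.StringIO()
    return timeit.timeit(lambda: buf.write("x\n"), number=number)


# Hypothetical comparison: absolute numbers depend on the machine and build.
t_print = time_print_with_keyword()
t_write = time_direct_write()
print("print(..., file=buf): %.1f ms" % (t_print * 1e3))
print("buf.write(...):       %.1f ms" % (t_write * 1e3))
```

For results that are stable enough to compare patches, a dedicated benchmark runner is a better choice than a single timeit call.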

bm_telco reference (unpatched) => with issue #29259 tp_fastcall-2.patch and print-fastcall.patch:

   20.9 ms +- 0.5 ms => 16.7 ms +- 0.2 ms

print-fastcall.patch makes bm_telco 20% faster! Just to make sure, I ran bm_telco again with only tp_fastcall-2.patch:

   telco: Median +- std dev: 21.4 ms +- 0.8 ms
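The percentages can be checked from the medians quoted above. This is a small arithmetic sketch: the function name is hypothetical, and it ignores the reported standard deviations.

```python
def runtime_reduction(before_ms, after_ms):
    """Return the reduction in runtime as a percentage of the 'before' time."""
    return (before_ms - after_ms) / before_ms * 100.0


# Medians quoted in this comment (milliseconds).
reference = 20.9          # unpatched
both_patches = 16.7       # tp_fastcall-2.patch + print-fastcall.patch
tp_fastcall_only = 21.4   # tp_fastcall-2.patch alone

print("vs reference:        %.1f%%" % runtime_reduction(reference, both_patches))
print("vs tp_fastcall only: %.1f%%" % runtime_reduction(tp_fastcall_only, both_patches))
```

With only tp_fastcall-2.patch applied the median is not lower than the reference, so the gain comes from print-fastcall.patch.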

Maybe we should optimize _PyStack_AsDict(), but that's a different topic ;-)
