Yes, I used --enable-optimizations this time.
But this time, my patch is bad for my CPU's branch prediction.
I hope Objects/call.c will solve this kind of code placement issue.

BTW, since the benefit of GetMethod is small, how about this?

* Add _PyMethod_FastCallKeywords
* Call it from _PyObject_FastCall*

_PyObject_FastCall* can already use FASTCALL for C functions and methods (PyCFunction)
and for Python functions (PyFunction).
Python methods (PyMethod) are the last common callable type for which _PyObject_FastCall* can't use FASTCALL.
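The idea can be modelled outside CPython with a toy dispatcher. This is only a minimal sketch under stated assumptions: `Callable`, `toy_fastcall`, `toy_method_fastcall` and `sum_args` are hypothetical names, not CPython API. Keyword arguments (kwnames) and reference counting are left out, and a fixed-size stack buffer stands in for whatever allocation strategy a real patch would use. It shows just the core point: a bound method prepends `self` to the argument array and forwards to the wrapped function's FASTCALL entry point, so no argument tuple has to be built.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, NOT CPython's API: three kinds of callables. */
typedef enum { KIND_FUNCTION, KIND_CFUNCTION, KIND_METHOD } Kind;

/* A "FASTCALL"-style entry point: arguments arrive as a C array plus a
   count, so no argument tuple needs to be allocated. */
typedef long (*FastFunc)(const long *args, size_t nargs);

typedef struct Callable Callable;
struct Callable {
    Kind kind;
    FastFunc fast;          /* used by KIND_FUNCTION and KIND_CFUNCTION */
    const Callable *func;   /* KIND_METHOD: the wrapped function */
    long self;              /* KIND_METHOD: the bound instance */
};

#define MAX_STACK_ARGS 8    /* toy limit for the on-stack argument buffer */

long toy_fastcall(const Callable *c, const long *args, size_t nargs);

/* Toy analogue of the proposed _PyMethod_FastCallKeywords: copy self plus
   the positional arguments into a stack buffer and forward them to the
   wrapped function's fast path. */
static long toy_method_fastcall(const Callable *m, const long *args,
                                size_t nargs)
{
    long stack[MAX_STACK_ARGS];
    assert(nargs + 1 <= MAX_STACK_ARGS);
    stack[0] = m->self;
    for (size_t i = 0; i < nargs; i++)
        stack[i + 1] = args[i];
    return toy_fastcall(m->func, stack, nargs + 1);
}

/* Toy analogue of _PyObject_FastCall*: every callable kind, including
   bound methods, now stays on the array-based path. */
long toy_fastcall(const Callable *c, const long *args, size_t nargs)
{
    switch (c->kind) {
    case KIND_FUNCTION:
    case KIND_CFUNCTION:
        return c->fast(args, nargs);
    case KIND_METHOD:
        return toy_method_fastcall(c, args, nargs);
    }
    return -1;  /* unreachable for valid kinds */
}

/* Example target function: returns the sum of its arguments. */
long sum_args(const long *args, size_t nargs)
{
    long total = 0;
    for (size_t i = 0; i < nargs; i++)
        total += args[i];
    return total;
}
```

For example, a method wrapping `sum_args` with `self = 100`, called with `{1, 2}`, forwards `{100, 1, 2}` and returns 103.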
