Title: int.__repr__() is slower than repr()
Following the StackOverflow question [1].

Calling repr() is faster than calling the unbound method __repr__(). This looks strange at first glance because it is *obvious* that repr() is implemented by calling __repr__().

$ ./python -m timeit "''.join(map(repr, range(10000)))"
500 loops, best of 5: 809 usec per loop

$ ./python -m timeit "''.join(map(int.__repr__, range(10000)))"
200 loops, best of 5: 1.27 msec per loop

Actually, repr() just calls the tp_repr slot, while calling int.__repr__ passes through many intermediate layers.
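
The difference can be observed from pure Python. The following is only a rough sketch, assuming CPython: it shows that int.__repr__ is a slot wrapper object rather than the slot itself, and times both spellings with the timeit module. The helper name bench and its parameters are made up for illustration, and the absolute numbers depend on the build and the machine.

```python
import timeit

# In CPython, int.__repr__ is a "slot wrapper": a Python-level object that
# wraps the C-level tp_repr slot so it can be called like a method.
slot_wrapper = int.__repr__
wrapper_type_name = type(slot_wrapper).__name__
print(wrapper_type_name)

# Both spellings produce the same string.
assert repr(42) == int.__repr__(42) == "42"


def bench(func, n=10000, repeat=5, number=20):
    """Return the best time in seconds for one ''.join(map(func, range(n)))."""
    def stmt():
        return "".join(map(func, range(n)))
    return min(timeit.repeat(stmt, repeat=repeat, number=number)) / number


t_builtin = bench(repr)          # the builtin repr()
t_wrapper = bench(int.__repr__)  # the unbound slot wrapper
print(f"repr:         {t_builtin * 1e6:.0f} usec per loop")
print(f"int.__repr__: {t_wrapper * 1e6:.0f} usec per loop")
```

Whether the gap is still visible depends on the Python version, since the call path for slot wrappers has changed over time.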

The proposed PR gets rid of half of the overhead. It avoids creating and calling an intermediate function object. The result is still slower than calling repr().

$ ./python -m timeit "''.join(map(int.__repr__, range(10000)))"
200 loops, best of 5: 1.01 msec per loop

The PR also speeds up calling classmethod descriptors.

$ ./python -m timeit -s "cm = bytes.fromhex; args = [('',)]*10000; from itertools import starmap" -- "b''.join(starmap(cm, args))"
500 loops, best of 5: 515 usec per loop

$ ./python -m timeit -s "cm = bytes.__dict__['fromhex']; args = [(bytes, '')]*10000; from itertools import starmap" -- "b''.join(starmap(cm, args))"
500 loops, best of 5: 704 usec per loop


$ ./python -m timeit -s "cm = bytes.__dict__['fromhex']; args = [(bytes, '')]*10000; from itertools import starmap" -- "b''.join(starmap(cm, args))"
500 loops, best of 5: 598 usec per loop
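
For readers unfamiliar with the two objects being timed above, here is a small sketch, assuming CPython, of how bytes.fromhex relates to bytes.__dict__['fromhex']. The variable names are illustrative only.

```python
# Attribute lookup on the class goes through the descriptor protocol and
# returns a builtin method already bound to the class.
bound = bytes.fromhex

# The raw entry in the class dictionary is the unbound classmethod
# descriptor; the class has to be passed explicitly as the first argument.
raw = bytes.__dict__["fromhex"]

bound_type_name = type(bound).__name__
raw_type_name = type(raw).__name__
print(bound_type_name)
print(raw_type_name)

# Both calls give the same result.
assert bound("6869") == raw(bytes, "6869") == b"hi"

# Binding by hand through __get__ gives a callable equivalent to
# bytes.fromhex.
rebound = raw.__get__(None, bytes)
assert rebound("6869") == b"hi"
```

The second and third benchmarks above call the raw descriptor directly, which is why bytes appears in each argument tuple.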

New changeset 5e02c7826f9797fb3add79b608ef51f7a62b3e5a by Serhiy Storchaka in branch 'master':
bpo-31410: Optimized calling wrapper and classmethod descriptors. (#3481)

Oh, nice optimization!

I see that you reused the _PyMethodDef_RawFastCallDict() function that I added for exactly the same reason: to prevent the creation of a temporary C function that is created for a single call and then destroyed.
