I didn't say "let's not do it".
I just want to focus on the pure Python implementation in this issue,
because this thread is too long already.
Feel free to open a new issue about a C implementation.

Even if a C implementation is added later, optimizing the pure Python
version can still improve performance on PyPy. (
