I'm not sure this optimization is worth applying. The patch adds about fifty lines of complex code for a benefit of only 80 ns. On my computer, just incrementing an integer takes 100 ns.
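
A rough sketch of how such a baseline could be measured, assuming Python and the stdlib timeit module. The helper name ns_per_increment and the loop counts are only illustrative, and the result will differ between machines and builds:

```python
import timeit


def ns_per_increment(loops=1_000_000, repeat=5):
    """Return the best observed cost of one `x += 1`, in nanoseconds."""
    # Each trial runs the statement `loops` times; keep the fastest trial
    # to reduce noise from other processes on the machine.
    best = min(timeit.repeat("x += 1", setup="x = 0", number=loops, repeat=repeat))
    return best / loops * 1e9


if __name__ == "__main__":
    print(f"x += 1: {ns_per_increment():.1f} ns per operation")
```

Comparing this figure with the measured gain from the patch gives a sense of whether the saving is large enough to justify the added code.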
