Thank you for your great work, Ma Lin! But it will take some time to review it.

Could you please create and run some microbenchmarks to measure the possible performance penalty of the additional MARK_PUSHes? I am especially interested in the worst cases. If the penalty is significant, reducing it will be a goal of future optimizations. If it is insignificant, we need not worry about it.
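
Something like the following might serve as a starting point. It is only a sketch: the patterns and input sizes are hypothetical, chosen on the assumption that capture groups inside repeats and alternations are where the additional mark pushes would be most frequent. The names `CASES`, `bench` and `run_all` are illustrative and not part of the patch. Run it on builds with and without the patch and compare the numbers.

```python
import re
import timeit

# Hypothetical worst-case candidates: capture groups inside repeats,
# where marks are assumed to be saved most often.
CASES = {
    "group_in_repeat": (r"(?:(a)|b)*", "ab" * 1000),
    "nested_groups": (r"((a)(b)?)*c", "ab" * 1000 + "c"),
    "lazy_repeat": (r"(?:(a)|(b))*?c", "ab" * 1000 + "c"),
}


def bench(pattern, text, number=200, repeat=5):
    """Return the best observed time in seconds for a single match call."""
    compiled = re.compile(pattern)
    timings = timeit.repeat(
        lambda: compiled.match(text), number=number, repeat=repeat
    )
    # The minimum is the least noisy estimate for a microbenchmark.
    return min(timings) / number


def run_all():
    """Benchmark every case and return a {name: seconds per match} dict."""
    return {name: bench(p, t) for name, (p, t) in CASES.items()}


if __name__ == "__main__":
    for name, seconds in run_all().items():
        print(f"{name}: {seconds * 1e6:.1f} us per match")
```

Taking the minimum over several repeats reduces noise from other processes, so small differences between the two builds are easier to see.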

I am not sure about backporting these changes. This behavior is so old that there is a chance of breaking someone's code that depends on it.
