Sorry, I did not notice that there is a C implementation in PR 18427. The changes to the Python implementation are so large that I thought they were the goal of the PR.

Often the clearest and most efficient way to implement an iterator in Python is to write a generator function. In C you need to write a class with the __next__ method, but Python has a better way.
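
To illustrate the difference, here is a minimal sketch. The countdown example and the names CountdownIter and countdown are hypothetical and are not taken from the PR. The class mirrors the shape a C implementation is forced into, while the generator function expresses the same iteration in a few lines:

```python
class CountdownIter:
    """Class-based iterator (hypothetical example): state is kept
    explicitly on the instance, as a C implementation would have to do."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator function (hypothetical example) yielding the same
    sequence; the interpreter keeps the iteration state for us."""
    while start > 0:
        yield start
        start -= 1
```

Both produce the same values when iterated, for example with list(CountdownIter(3)) and list(countdown(3)). In pure Python the generator version usually has less per-item overhead, because resuming a generator avoids a Python-level method call for every item.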

I have tested your first example with the Python implementation and got 93.9 msec on master vs 314 msec with PR 18427 applied. It is expected that the C implementation is faster than the Python implementation, but was there a need to make the Python implementation 3 times slower?
