I've added tests for this behavior by un-sorting the test inputs for test_find_tests and adding comments noting that the results should be sorted for reliable test execution.
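
For anyone skimming, the idea is roughly the following. This is a minimal sketch and not the patch itself: find_test_files is a hypothetical stand-in for the discovery code, and the file names are made up for illustration.

```python
import unittest


def find_test_files(listing):
    """Hypothetical stand-in for discovery: return test files in sorted order."""
    # A directory listing comes back in arbitrary order, so sort explicitly.
    return sorted(name for name in listing
                  if name.startswith("test") and name.endswith(".py"))


class FindTestsOrdering(unittest.TestCase):
    def test_results_are_sorted(self):
        # The input is deliberately unsorted; the results should be sorted
        # for reliable test execution.
        listing = ["test2.py", "not_a_test.py", "test1.py", "test3.py"]
        self.assertEqual(find_test_files(listing),
                         ["test1.py", "test2.py", "test3.py"])
```

With sorted inputs, a test like this would pass even if the sort were missing, which is why the inputs are un-sorted.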

Attaching an updated patch.
