New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make statistics.median_grouped more efficient #70190
Comments
statistics.median_grouped currently uses cf=data.index(x) to find the leftmost position of x in data ( line number 445 ). Here, data.index() has linear time complexity, which may not be good for long lists. But, here since the list 'data' is sorted beforehand, we can use binary search ( bisect_left() ) to find the leftmost occurrence of 'x' in 'data'. Similarly, for counting number of occurrence of 'x' in data after sorting, we can find the position of rightmost occurrence of 'x' in data[l1....len(data)], where l1 is position of leftmost occurrence of 'x' (line number 447 ) |
Can someone please review my patch? I think it can increase the performance significantly of the median_grouped() function. But, I don't have any standard way of checking if it would improve or will be an overkill for the median_grouped() function? |
This code looks good and will certainly reduce the number of comparisons in non-trivial cases. FWIW, It looks like the index functions are lifted directly from the bisect docs. Steven, any objections? |
Yeah, I slightly modified the functions from the bisect docs to suit for the purpose here. |
Looks good to me. I've run some quick timing tests, and for very small lists, there's no significant difference, but for larger lists I'm getting up to a 50% speedup. Nicely done, thank you. |
New changeset 7b2fafd78c1d by Steven D'Aprano in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: