Speed up hash(fractions.Fraction) #82044
Recording before I forget. These are easy:
None of those should change any computed results.
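For context, a minimal sketch of the documented scheme for hashing a rational m/n, adapted from the numeric-hashing description in the Python docs (the expensive `pow` call is what the rest of this thread is about; `sys.hash_info.modulus` is 2**61 - 1 on 64-bit builds):

```python
import sys

def hash_fraction(m, n):
    """Hash of the rational m/n, assuming gcd(m, n) == 1 and n > 0.

    Sketch of the documented numeric-hash scheme, not CPython's
    actual implementation.
    """
    P = sys.hash_info.modulus           # 2**61 - 1 on 64-bit builds
    if n % P == 0:
        hash_value = sys.hash_info.inf  # n has no inverse modulo P
    else:
        # Fermat's little theorem: pow(n, P-1, P) == 1, so
        # pow(n, P-2, P) is the inverse of n modulo P.
        hash_value = (abs(m) % P) * pow(n, P - 2, P) % P
    if m < 0:
        hash_value = -hash_value
    return -2 if hash_value == -1 else hash_value
```

On a 64-bit CPython this should agree with `hash(fractions.Fraction(m, n))` for coprime m and positive n.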
I made a quick PR for you. I skipped #1 because I don't think Fraction hashing is worth adding another slot.
I wouldn't have bet on this before seeing Raymond's benchmark results. Writing a fast path for invmod for C-size integers is still on my to-do list; the current implementation does way too many Python-level divisions.
Why I expected a major speedup from this: the binary exponentiation routine (for "reasonably small" exponents) does 30 * ceiling(exponent.bit_length() / 30) multiply-and-reduce steps, plus another for each bit set in the exponent. That's a major difference between exponents of bit lengths 61 ((P-2).bit_length()) and 1 ((1).bit_length()). Indeed, I bet it would pay in […]
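A back-of-the-envelope model of that count (my own sketch of the description above, not the actual longobject.c loop):

```python
def pow_cost(exponent, bits_per_digit=30):
    """Multiply-and-reduce steps per the model above: one squaring per
    bit of every 30-bit internal digit, plus one multiply per set bit."""
    ndigits = -(-exponent.bit_length() // bits_per_digit)  # ceiling
    return bits_per_digit * ndigits + bin(exponent).count("1")

P = 2**61 - 1
print(pow_cost(P - 2))  # 150 steps for the 61-bit exponent
print(pow_cost(1))      # 31 for exponent 1, under this model
```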
Right, but that's stacked up against the cost of the extended Euclidean algorithm for computing the inverse. The extended gcd for computing the inverse of 1425089352415399815 (for example) modulo 2**61 - 1 takes 69 steps, each of which involves a PyLong quotient-and-remainder division, a PyLong multiplication, and a subtraction. So that's at least the same order of magnitude in the number of operations. I'd bet that a dedicated pure-C square-and-multiply algorithm (with an addition chain specifically chosen for the target modulus, and with the multiplication and reduction specialised for the particular form of the modulus) would still be the fastest way to go here. I believe optimal addition chains for 2**31 - 3 (the corresponding exponent on 32-bit builds) are known, and it shouldn't be too hard to find something close-to-optimal (as opposed to provably optimal) for 2**61 - 3.
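To make the step count concrete, a sketch (my own illustration, not CPython code) of the extended Euclidean inverse that counts its quotient-and-remainder iterations:

```python
def invmod(a, m):
    """Inverse of a modulo m via extended Euclid, plus the number of
    divmod steps taken along the way."""
    r0, r1 = m, a % m
    s0, s1 = 0, 1          # invariant: r_i == s_i * a (mod m)
    steps = 0
    while r1:
        q, r = divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, s0 - q * s1
        steps += 1
    if r0 != 1:
        raise ValueError("base not invertible for the given modulus")
    return s0 % m, steps

P = 2**61 - 1
inv, steps = invmod(1425089352415399815, P)
assert 1425089352415399815 * inv % P == 1
print(steps)  # 69, per the count quoted above
```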
Agreed. |
Well, details matter ;-) Division in Python is expensive. In the exponentiation algorithm, each reduction (in general) requires a 122-by-61-bit division. In egcd, after it gets going nothing exceeds 61 bits, and across iterations the inputs to the division step get smaller each time around. So, e.g., when Raymond tried a Fraction with denominator 5813, "almost all" the egcd divisions involved inputs each with a single internal Python "digit". But "almost all" the raise-to-the-(P-2) divisions involve a numerator with 5 internal digits and 3 in the denominator. Big difference, even if the total number of divisions is about the same.
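A quick way to see that narrowing (my own sketch; CPython stores ints in 30-bit digits):

```python
P = 2**61 - 1

def digits30(n):
    # number of 30-bit internal digits needed to store n > 0
    return -(-n.bit_length() // 30)

# Inputs to successive egcd divisions shrink as the algorithm runs:
a, b = P, 5813
while b:
    print(digits30(a), digits30(b))   # drops to "1 1" almost at once
    a, b = b, a % b

# whereas each reduction inside pow(x, P-2, P) divides a ~122-bit
# product (5 digits) by the 61-bit modulus (3 digits):
print(digits30(P * P), digits30(P))   # 5 3
```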
Mark, I did just a little browsing on this. It seems it's well known that egcd beats straightforward exponentiation for this purpose in arbitrary-precision contexts, for reasons already sketched: egcd needs narrower arithmetic from the start, benefits from the division narrowing on each iteration, and since the quotient on each iteration usually fits in a single "digit", the multiplication by the quotient goes fast too. But gonzo implementations switch back to exponentiation, using fancier primitives like Montgomery multiplication. As usual, I'm not keen on bloating the code for "state of the art" giant-int algorithms, but suit yourself! The focus in this PR is dead-simple spelling changes with relatively massive payoffs.
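For reference, the core of the Montgomery trick mentioned above is a reduction that trades division by the modulus for shifts and masks; a minimal sketch for an odd modulus (my own illustration; needs Python 3.8+ for `pow(x, -1, m)`):

```python
def montgomery_redc(t, m, k, m_prime):
    """Montgomery reduction: return t * R**-1 % m, where R = 2**k.

    Requires 0 <= t < m * R, m odd, and m_prime == pow(-m, -1, R).
    No division by m -- just a mask, a shift, and two multiplies.
    """
    R_mask = (1 << k) - 1
    u = (t * m_prime) & R_mask      # u = t * (-m)**-1 mod R
    reduced = (t + u * m) >> k      # t + u*m is exactly divisible by R
    return reduced - m if reduced >= m else reduced

m = 2**61 - 1                       # any odd modulus works
k = 62                              # R = 2**62 > m
m_prime = pow(-m, -1, 1 << k)
x, y = 123456789, 987654321
xR, yR = (x << k) % m, (y << k) % m             # map into Montgomery form
product_R = montgomery_redc(xR * yR, m, k, m_prime)
assert montgomery_redc(product_R, m, k, m_prime) == x * y % m
```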
There's likely more that could be done -- I've just taken the low-hanging fruit. If someone wants to re-open this and go further, please take it from here.
Some random notes:
That is, express P-2 as its prime factorization. There are 28 zero bits in the larger factor, so that would save 28 multiply steps right there. I don't know how far that may yet be from an optimal addition chain for P-2.
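A hedged sketch of that exponent-factoring idea, resting on the identity pow(x, a*b, P) == pow(pow(x, a, P), b, P); the demo factors below are arbitrary, since the actual prime factorization of P-2 isn't reproduced here:

```python
def pow_factored(x, factors, mod):
    # pow(x, a*b*..., mod) as nested exponentiations; every zero bit in
    # a factor skips one multiply in square-and-multiply, so factors
    # with many zero bits shorten the implicit addition chain.
    for f in factors:
        x = pow(x, f, mod)
    return x

P = 2**61 - 1
a, b = 12345, 6789                  # arbitrary demo factors
assert pow_factored(7, [a, b], P) == pow(7, a * b, P)
```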
Which of those do I intend to pursue? Probably none :-( But I confess to being _annoyed_ at giving up on extending binary gcd to compute an inverse, so I may at least do that in Python before I die ;-)
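One classic way to do that extension (a pure-Python sketch, not necessarily the variant Tim had in mind) is the binary extended gcd, which works whenever the modulus is odd, as P is:

```python
def binary_invmod(a, m):
    """Inverse of a modulo odd m via binary (Stein-style) extended gcd.

    Assumes gcd(a, m) == 1. No quotient-and-remainder divisions at
    all: just parity tests, halvings, and subtractions.
    """
    u, v = a % m, m
    x1, x2 = 1, 0          # invariants: u == x1*a and v == x2*a (mod m)
    while u != 1 and v != 1:
        while u % 2 == 0:  # halve u; divide x1 by 2 modulo the odd m
            u //= 2
            x1 = x1 // 2 if x1 % 2 == 0 else (x1 + m) // 2
        while v % 2 == 0:
            v //= 2
            x2 = x2 // 2 if x2 % 2 == 0 else (x2 + m) // 2
        if u >= v:         # both are odd here, so the difference is even
            u, x1 = u - v, x1 - x2
        else:
            v, x2 = v - u, x2 - x1
    return x1 % m if u == 1 else x2 % m

P = 2**61 - 1
d = 1425089352415399815
assert d * binary_invmod(d, P) % P == 1
```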
For posterity:
""" Lars considered a great many algorithm variations, and wrote up about ten of the best. Several things surprised me here. Most surprising: under some realistic assumptions, the best turned out to be a relatively simple variation of Euclid extended GCD. Nutshell: during a step, let the difference between the bit lengths of the then-current numerator and denominator be This has a larger expected iteration count than some other methods, but, like regular old Euclid GCD, needs relatively little code per iteration. |