New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
faster long multiplication #48194
Comments
In this patch x_mul(a, b) uses fewer bit operations for a != b, |
Just to be clear: this patch halves the number of shifts and masks, Perhaps this is an argument for considering changing PyLong_SHIFT to 16 While I believe the idea is sound, the patch isn't quite correct---it's >>> a = b = 2**30-1
[36413 refs]
>>> a*b
Assertion failed: (carry <= PyLong_MASK), function x_mul, file
Objects/longobject.c, line 2228.
Abort trap I'm fairly sure that the assert(carry <= PyLong_MASK) doesn't matter,
|
Yes, I think that the speed-up is due to reducing the number of Changing PyLong_SHIFT to 16 would be complicated; for instance in I did not modify the case a = b. I changed the documentation, which was wrong, |
Thanks for the updated patch! Looks good, on a quick scan. (One comment typo I noticed: there's a line Just out of interest, is it possible to go further, and combine 4 I think it's important to make sure that any changes to longobject.c Re: possible changes to PyLong_SHIFT Yes, changing PyLong_SHIFT to 16 (or 32) would be complicated, and would Changing PyLong_SHIFT to 30 doesn't seem like a totally ridiculous idea, On 64-bit machines one could presumably go further, and have Any of these changes is also going to affect a good few other parts of |
Mark, following your suggestions about using bigger integer types, There is a macro HAVE_INT64 to control if there is a 64 bit type; The speed difference for small integers is small; |
Nice work! Seems like we're going to be able to look forward to faster Regarding the HAVE_INT64, there's a standard autoconf macro |
Nice work :) I'm changing the target versions to 2.7 and 3.1. The proposed changes |
It looks as though changing PyLong_SHIFT to 30 everywhere is much simpler
With this patch, all tests pass on my machine with the exception of Still to do:
|
I've opened a separate issue (bpo-4258) for the idea of using 30-bit Pernici Mario's idea applies even better to base 2**30 longs: one can |
See bpo-4258 for a patch that combines 30-bit digits with |
This patch comes from 30bit_longdigit13+optimizations1.patch in bpo-4258 |
Thanks! Unfortunately, it looks like I messed this up yesterday by |
Updated version of longobject_diff1:
|
I'm going to close this: it's a really nice idea, but after the 30-bit The only situation I've found where this optimization really does make a |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: