New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decimal and long hash, compatibly and efficiently #45306
Comments
Make hash of int/long periodic with period 2**32-1 (or 2**64-1 on 64-bit systems). This is makes it possible to define an efficient Decimal hash. Hash for int/long is already close to periodic with period 2**32-1; it's a small change to make it genuinely periodic. With this change, it's possible to write a Decimal __hash__ such that: (1) hash(Decimal(n))==hash(n) for all integers n, and The current implementation of Decimal hash is almost unusable for very large Decimals: hash(Decimal("1E999999999")) first converts its argument to a long, taking gigabytes of memory and a lot of time. The patch alters int_hash and long_hash to make them periodic, adds some tests to Lib/test/test_hash.py, and implements an efficient Decimal.__hash__. If there's any chance of the patch being accepted I'll write some additional tests in Lib/test/test_decimal.py to check hash compatibility between longs and Decimals. This patch fixes (most of) bug 1770416. |
IMHO this patch should be considered for (at least) py3k, in which long Mark, could you please outline the semantic changes (specially at Sorry to ask all that, but my C (and maths) is (are) *very* limited. I Maybe it's worth discussing this patch without linking its most 1 - http://mail.python.org/pipermail/python-3000/2007-September/010374.html |
To help explain what's going on, here's some Python code. The Python (1) long_hash1(n) == hash(n) for almost all integers n, with a very (2) long_hash1(n) has a simple closed form, making it possible to (3) The current long_hash in Objects/longobject.c can be changed, by if ((unsigned long)x < v->ob_digit[i]) x++; added directly after the addition. Explanation: the exceptions in (1) arise precisely when the addition x += v->ob_digit[i] in the long_hash code overflows (in the unsigned sense---equivalently, So what about long_hash2(n)? This is what the patched long_hash I don't really know whether the periodicity is that useful---the For what it's worth, I regard this patch as an ugly hack. I don't like
takes several minutes to complete. (Try it!) And I can't think of any LONG_BITS = 32
W = 2**LONG_BITS
HW = W // 2
ULONG_MAX = W - 1
def long_hash1(n):
if n == 0:
h = 0
else:
h = (1 + (abs(n)-1) % ULONG_MAX) * (1 if n > 0 else -1)
# reduce mod W to lie in the usual range for a (signed) C long
h = (h + HW) % W - HW
if h == -1:
h = -2
return int(h)
def long_hash2(n):
h = n % ULONG_MAX
h = (h + HW) % W - HW
if h == -1:
h = -2
return int(h) |
Thanks a lot for the explanation. I still think it could be valuable Regarding the performance of Decimal.__hash__: I believe (int) hash performance would be a moot issue if In [906]: D = int_decimal.Decimal In [923]: log(10**1000000) I'm not very competent in many skills needed for the job AND I'm The conversion is based on using divisions and multiplications to As I told Facundo in a private email (y otro llegará hoy o mañana), the Anyway, I'll try to tidy things up as much as I can and email Facundo |
Here's a revised patch, that only touches longobject.c and does the minimum
and the cmpl here is entirely redundant. Maybe there's some way of writing the C |
And here's the updated decimal.__hash__ patch that goes with the |
Applied the patchs long_hash.patch (rev 58208) and decimal_hash_v2.patch |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: