Author vstinner
Recipients Arfrever, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, georg.brandl, gvanrossum, jcea, pitrou, terry.reedy, vstinner
Date 2012-01-04.05:09:58
SpamBayes Score 2.03862e-08
Marked as misclassified No
Message-id <>
Yet another random hash function, simplified version of Paul's function. It always use exactly 256 bits of entropy and so 32 bytes of memory, and doesn't keep the loop. I don't expect my function to be secure, but just give more work to the attacker to compute the data for an attack against our dict implementation.

import os, array, struct
sizeof_long = struct.calcsize("l")
r_bits = 256
r_len = r_bits // (sizeof_long * 8)
r_mask = r_len - 1
r = array.array('l', os.urandom(r_len * sizeof_long))

def randomhash(s):
    length = len(s)
    if length == 0:
        return -2
    x = ord(s[0])
    x ^= r[x & r_mask]
    x <<= 7
    for ch in s:
        x = intmask(1000003 * x)
        x ^= ord(ch)
    x ^= length
    x ^= r[x & r_mask]
    return intmask(x)

The first "x ^= r[x & r_mask]" may be replaced by "x ^= r[(x ^ length) & r_mask]".

The binary shift is done after the first xor with r, because 2**7 and r_len are not coprime (r_len is a multipler of 2**7), and so (ord(s[0] << 7) & r_mask is always zero.

randomhash(s)==hash(s) if we used twice the same index in the r array. I don't know if this case gives useful information.
Date User Action Args
2012-01-04 05:10:00vstinnersetrecipients: + vstinner, gvanrossum, barry, georg.brandl, terry.reedy, jcea, pitrou, christian.heimes, benjamin.peterson, Arfrever, alex, dmalcolm, Zhiping.Deng, PaulMcMillan
2012-01-04 05:09:59vstinnersetmessageid: <>
2012-01-04 05:09:59vstinnerlinkissue13703 messages
2012-01-04 05:09:58vstinnercreate