Message 150589 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	vstinner
Recipients	Arfrever, PaulMcMillan, Zhiping.Deng, alex, barry, benjamin.peterson, christian.heimes, dmalcolm, georg.brandl, gvanrossum, jcea, pitrou, terry.reedy, vstinner
Date	2012-01-04.05:09:58
SpamBayes Score	2.0386182e-08
Marked as misclassified	No
Message-id	<1325653799.93.0.461049922261.issue13703@psf.upfronthosting.co.za>
In-reply-to

Content
Yet another random hash function, simplified version of Paul's function. It always use exactly 256 bits of entropy and so 32 bytes of memory, and doesn't keep the loop. I don't expect my function to be secure, but just give more work to the attacker to compute the data for an attack against our dict implementation. --- import os, array, struct sizeof_long = struct.calcsize("l") r_bits = 256 r_len = r_bits // (sizeof_long * 8) r_mask = r_len - 1 r = array.array('l', os.urandom(r_len * sizeof_long)) def randomhash(s): length = len(s) if length == 0: return -2 x = ord(s[0]) x ^= r[x & r_mask] x <<= 7 for ch in s: x = intmask(1000003 * x) x ^= ord(ch) x ^= length x ^= r[x & r_mask] return intmask(x) --- The first "x ^= r[x & r_mask]" may be replaced by "x ^= r[(x ^ length) & r_mask]". The binary shift is done after the first xor with r, because 27 and r_len are not coprime (r_len is a multipler of 27), and so (ord(s[0] << 7) & r_mask is always zero. randomhash(s)==hash(s) if we used twice the same index in the r array. I don't know if this case gives useful information.

Yet another random hash function, simplified version of Paul's function. It always use exactly 256 bits of entropy and so 32 bytes of memory, and doesn't keep the loop. I don't expect my function to be secure, but just give more work to the attacker to compute the data for an attack against our dict implementation.

---
import os, array, struct
sizeof_long = struct.calcsize("l")
r_bits = 256
r_len = r_bits // (sizeof_long * 8)
r_mask = r_len - 1
r = array.array('l', os.urandom(r_len * sizeof_long))

def randomhash(s):
    length = len(s)
    if length == 0:
        return -2
    x = ord(s[0])
    x ^= r[x & r_mask]
    x <<= 7
    for ch in s:
        x = intmask(1000003 * x)
        x ^= ord(ch)
    x ^= length
    x ^= r[x & r_mask]
    return intmask(x)
---

The first "x ^= r[x & r_mask]" may be replaced by "x ^= r[(x ^ length) & r_mask]".

The binary shift is done after the first xor with r, because 2**7 and r_len are not coprime (r_len is a multipler of 2**7), and so (ord(s[0] << 7) & r_mask is always zero.

randomhash(s)==hash(s) if we used twice the same index in the r array. I don't know if this case gives useful information.

History
Date	User	Action	Args
2012-01-04 05:10:00	vstinner	set	recipients: + vstinner, gvanrossum, barry, georg.brandl, terry.reedy, jcea, pitrou, christian.heimes, benjamin.peterson, Arfrever, alex, dmalcolm, Zhiping.Deng, PaulMcMillan
2012-01-04 05:09:59	vstinner	set	messageid: <1325653799.93.0.461049922261.issue13703@psf.upfronthosting.co.za>
2012-01-04 05:09:59	vstinner	link	issue13703 messages
2012-01-04 05:09:58	vstinner	create