
Author serhiy.storchaka
Recipients Eric Appelt, berker.peksag, christian.heimes, martin.panter, python-dev, rhettinger, serhiy.storchaka, tim.peters, vstinner
Date 2018-01-15.10:09:04
Content
As far as I understand, the problem is that the value obtained by XORing the hashes of the elements has a nonuniform distribution. Further transformations, whether linear or nonlinear, as well as adding in the length, don't change the fact that the result of the XOR contains less information than the number of bits in the hash word. The mathematically strong way of computing the hash of a frozenset is:

    hash(tuple(sorted(hash(x) for x in frozenset)))

Store all hash values in an array, sort them to get rid of ordering, and compute a hash of all the concatenated hashes.
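
For example, as a standalone function (the name strong_frozenset_hash is only for this illustration; it is the same formula as above):

    def strong_frozenset_hash(fs):
        # Sort the element hashes to discard ordering, then hash the tuple.
        return hash(tuple(sorted(hash(x) for x in fs)))

    print(strong_frozenset_hash(frozenset({1, 2, 'spam'})))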

But this algorithm requires linear memory and has superlinear time complexity. For practical purposes it is enough to make the deviation of the resulting distribution from the uniform distribution small enough. Maybe the following algorithm could help:

    buckets = [0] * N
    for x in frozenset:
        h = hash(x)
        i = shuffle_bits1(h) % N
        buckets[i] ^= shuffle_bits2(h)
    return hash(tuple(buckets))

where shuffle_bits1() and shuffle_bits2() are functions that shuffle bits in different ways.
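
A minimal runnable sketch of the above in Python, assuming 64-bit hash words; the two multiply-and-shift mixers and the bucket count N=8 below are arbitrary placeholders for illustration, not a concrete proposal:

    MASK = (1 << 64) - 1  # assume 64-bit hash words
    N = 8                 # bucket count; an arbitrary choice for this sketch

    def shuffle_bits1(h):
        # Placeholder mixer: multiply by an odd constant, fold in the high half.
        h = (h * 0x9E3779B97F4A7C15) & MASK
        return (h ^ (h >> 32)) & MASK

    def shuffle_bits2(h):
        # A different placeholder mixer so the two shuffles are independent.
        h ^= h >> 33
        h = (h * 0xC2B2AE3D27D4EB4F) & MASK
        return (h ^ (h >> 29)) & MASK

    def bucketed_frozenset_hash(fs):
        buckets = [0] * N
        for x in fs:
            h = hash(x) & MASK                   # reduce to a nonnegative 64-bit word
            i = shuffle_bits1(h) % N             # one shuffle picks the bucket
            buckets[i] ^= shuffle_bits2(h)       # the other shuffle is XORed into it
        return hash(tuple(buckets))

    print(bucketed_frozenset_hash(frozenset({1, 2, 'spam'})))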