Message 326883 - Python tracker



Author	jdemeyer
Recipients	eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date	2018-10-02.14:42:41
Message-id	<1538491361.95.0.545547206417.issue34751@psf.upfronthosting.co.za>
In-reply-to

Content
> 100% pure SeaHash does x ^= t at the start first, instead of `t ^ (t << 1)` on the RHS.

Indeed. Some initial testing shows that this kind of "input mangling" (applying such a permutation to the inputs) actually plays a much more important role in avoiding collisions than the SeaHash operation `x ^= ((x >> 16) >> (x >> 29))`.

So my suggestion remains:

for y in INPUT:
    t = hash(y)
    t ^= t * SOME_LARGE_EVEN_NUMBER
    h ^= t
    h *= MULTIPLIER

Adding in the additional SeaHash operations

    x ^= ((x >> 16) >> (x >> 29))
    x *= MULTIPLIER

does not increase the probability of the tests passing.
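
For readers who want to experiment, the suggested loop can be sketched in pure Python roughly as follows. This is only an illustration under stated assumptions: the constants, the initial value of h, and the 64-bit masking are placeholders chosen for the example, not values proposed in this issue.

```python
# Sketch of the suggested loop, emulating 64-bit unsigned arithmetic.
# All constants below are hypothetical placeholders, not proposed values.
MASK = (1 << 64) - 1                          # assumption: 64-bit hash words
SOME_LARGE_EVEN_NUMBER = 0x9E3779B97F4A7C16   # placeholder; only evenness matters here
MULTIPLIER = 0x100000001B3                    # placeholder odd multiplier
INITIAL = 0x345678                            # placeholder starting value for h


def mangle(t):
    """Input mangling: t ^= t * SOME_LARGE_EVEN_NUMBER, truncated to 64 bits.

    Because the multiplier is even, bit i of the product depends only on
    lower bits of t, so this map is a permutation of the 64-bit words.
    """
    return (t ^ (t * SOME_LARGE_EVEN_NUMBER)) & MASK


def tuple_hash_sketch(items):
    """Combine the item hashes following the loop from the message."""
    h = INITIAL
    for y in items:
        t = mangle(hash(y) & MASK)    # hash(y) may be negative; make it unsigned
        h ^= t
        h = (h * MULTIPLIER) & MASK   # h *= MULTIPLIER with wraparound
    return h
```

For example, tuple_hash_sketch((1, 2)) and tuple_hash_sketch((2, 1)) give different results with these placeholder constants, since the multiplication between items makes the combination order-dependent.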

History
Date	User	Action	Args
2018-10-02 14:42:41	jdemeyer	set	recipients: + jdemeyer, tim.peters, rhettinger, mark.dickinson, eric.smith, sir-sigurd
2018-10-02 14:42:41	jdemeyer	set	messageid: <1538491361.95.0.545547206417.issue34751@psf.upfronthosting.co.za>
2018-10-02 14:42:41	jdemeyer	link	issue34751 messages
2018-10-02 14:42:41	jdemeyer	create