Message 20605 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	tim.peters
Recipients
Date	2004-05-18.02:14:02
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to

Content
Logged In: YES user_id=31435 Nothing wrong with being "better than random" -- it's not the purpose of Python's hash() to randomize, but to minimize hash collisions. That's why, e.g., hash(int) returns int unchanged. Then when indexing a dict by any contiguous range of ints (excepting -1), there are no collisions at all. That's a good thing. There's nothing in the structure of my example function geared toward the specific test instances used. It should do a good job of "spreading out" inputs that are "close together", and that's an important case, and the test inputs have a lot of that. It's a good thing that it avoids all collisions across them. About speed, the only answer is to time both, and across several major platforms. An extra mult may or may not cost more than blowing extra cache lines to suck up a table of ints. About the table of ints, it's always a dubious idea to multiply by an even number. Because the high bits of the product are thrown away, multiplying by an even number throws away information needlessly (it throws away as many useful bits as there are trailing zero bits in the multiplier). odd_number * 69069**i is always odd, so the multiplier in the table-free way is never even (or, worse, divisible by 4, 8, 16, etc).

Logged In: YES 
user_id=31435

Nothing wrong with being "better than random" -- it's not the 
purpose of Python's hash() to randomize, but to minimize 
hash collisions.  That's why, e.g., hash(int) returns int 
unchanged.  Then when indexing a dict by any contiguous 
range of ints (excepting -1), there are no collisions at all.  
That's a good thing.

There's nothing in the structure of my example function 
geared toward the specific test instances used.  It *should* 
do a good job of "spreading out" inputs that are "close 
together", and that's an important case, and the test inputs 
have a lot of that.  It's a good thing that it avoids all 
collisions across them.

About speed, the only answer is to time both, and across 
several major platforms.  An extra mult may or may not cost 
more than blowing extra cache lines to suck up a table of ints.

About the table of ints, it's always a dubious idea to multiply 
by an even number.  Because the high bits of the product are 
thrown away, multiplying by an even number throws away 
information needlessly (it throws away as many useful bits as 
there are trailing zero bits in the multiplier).  odd_number * 
69069**i is always odd, so the multiplier in the table-free 
way is never even (or, worse, divisible by 4, 8, 16, etc).

History
Date	User	Action	Args
2007-08-23 14:21:13	admin	link	issue942952 messages
2007-08-23 14:21:13	admin	create