Author tim.peters
Recipients
Date 2004-05-18.02:14:02
SpamBayes Score
Marked as misclassified
Message-id
In-reply-to
Content
Logged In: YES 
user_id=31435

Nothing wrong with being "better than random" -- it's not the 
purpose of Python's hash() to randomize, but to minimize 
hash collisions.  That's why, e.g., hash(int) returns int 
unchanged.  Then when indexing a dict by any contiguous 
range of ints (excepting -1), there are no collisions at all.  
That's a good thing.

There's nothing in the structure of my example function 
geared toward the specific test instances used.  It *should* 
do a good job of "spreading out" inputs that are "close 
together", and that's an important case, and the test inputs 
have a lot of that.  It's a good thing that it avoids all 
collisions across them.

About speed, the only answer is to time both, and across 
several major platforms.  An extra mult may or may not cost 
more than blowing extra cache lines to suck up a table of ints.

About the table of ints, it's always a dubious idea to multiply 
by an even number.  Because the high bits of the product are 
thrown away, multiplying by an even number throws away 
information needlessly (it throws away as many useful bits as 
there are trailing zero bits in the multiplier).  odd_number * 
69069**i is always odd, so the multiplier in the table-free 
way is never even (or, worse, divisible by 4, 8, 16, etc).
History
Date User Action Args
2007-08-23 14:21:13adminlinkissue942952 messages
2007-08-23 14:21:13admincreate