This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author mark.dickinson
Recipients ajaksu2, facundobatista, mark.dickinson
Date 2007-09-15.04:06:39
SpamBayes Score 0.0005784205
Marked as misclassified No
Message-id <1189829202.72.0.980292283083.issue1772851@psf.upfronthosting.co.za>
In-reply-to
Content
Here's a revised patch, that only touches longobject.c and does the minimum 
necessary to alter long.__hash__ from being *almost* completely predictable to 
being completely predictable.  To summarize the effects of this patch:

 - the current long.__hash__ is predictable except for a very small number of 
(unpredictable) inputs;  this patch makes it predictable for all inputs, thereby 
making it possible to define an efficient Decimal.__hash__.  To be precise, the 
patched long.__hash__ has the property that it's periodic of period ULONG_MAX in 
the ranges [1, infinity) and (-infinity, -1].

 - the value of hash(n) is unchanged for any n that's small enough to be 
representable as an int, and also unchanged for the vast majority of long 
integers n of reasonable size.  (For integers up to 450 digits long, the new hash 
value differs from the old for around 1 in every 2500 integers on a 32-bit 
system;  on a 64-bit system the figure is 1 in every 10000 billion.)

 - On my test system (Dual Xeon 2.8GHz/SuSE Linux 10.2/gcc 4.1), using
timeit.timeit('hash(n)') there's no measurable slowdown for small integers.  
Unfortunately there *is* slowdown for larger integers:  around 20% slowdown for 
100 digit integers and around 70% extra time for 1000 digit integers, though in 
all cases the times are in fractions of a microsecond.  It doesn't help that gcc 
doesn't optimize the inner loop perfectly:  with the -O3 flag, the addition and 
subsequent correction are compiled to (with x in eax and v->ob_digit[i] in edx):

        addl    %eax, %edx
        cmpl    %eax, %edx
        adcl    $0, %edx

and the cmpl here is entirely redundant.  Maybe there's some way of writing the C 
code that makes it easier for gcc to pick up on this optimization?
Files
File name Uploaded
long_hash.patch mark.dickinson, 2007-09-15.04:06:40
History
Date User Action Args
2007-09-15 04:06:43mark.dickinsonsetspambayes_score: 0.000578421 -> 0.0005784205
recipients: + mark.dickinson, facundobatista, ajaksu2
2007-09-15 04:06:42mark.dickinsonsetspambayes_score: 0.000578421 -> 0.000578421
messageid: <1189829202.72.0.980292283083.issue1772851@psf.upfronthosting.co.za>
2007-09-15 04:06:42mark.dickinsonlinkissue1772851 messages
2007-09-15 04:06:41mark.dickinsoncreate