Message 327311 - Python tracker


Author	tim.peters
Recipients	eric.smith, jdemeyer, mark.dickinson, rhettinger, sir-sigurd, tim.peters
Date	2018-10-07.21:13:39
Message-id	<1538946819.58.0.545547206417.issue34751@psf.upfronthosting.co.za>

Content

Attaching htest.py so we have a common way to compare what various things do on the tests that have been suggested.  unittest sucks for that.  doctest too.  Here's current code output from a 32-bit build; "ideally" we want "got" values not much larger than "mean" (these are counting collisions):

range(100) by 3; 32-bit hash codes; mean 116.42 got 0
-10 .. 8 by 4; 32-bit hash codes; mean 1.28 got 69,596
-50 .. 50 less -1 by 3; 32-bit hash codes; mean 116.42 got 708,066
0..99 << 60 by 3; 32-bit hash codes; mean 116.42 got 875,000
[-3, 3] by 20; 32-bit hash codes; mean 128.00 got 1,047,552
[0.5, 0.25] by 20; 32-bit hash codes; mean 128.00 got 1,048,568
old tuple test; 32-bit hash codes; mean 7.43 got 6
new tuple test; 32-bit hash codes; mean 13.87 got 102,922
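
For readers without htest.py at hand, here is a rough sketch of how the two columns could be computed. It is not the attached script: the function names are hypothetical, "got" is assumed to be the number of inputs minus the number of distinct hash codes, and "mean" is assumed to be the birthday-style estimate n(n-1)/(2m) for n inputs and m possible hash codes. That estimate does reproduce 116.42 for a million inputs and 128.00 for 2**20 inputs with 32-bit codes. Reading "range(100) by 3" as all 3-tuples drawn from range(100) is likewise an assumption.

```python
from itertools import product


def expected_collisions(nbins, nballs):
    """Birthday-style estimate: expected number of colliding pairs when
    nballs values are spread uniformly at random over nbins slots."""
    return nballs * (nballs - 1) / (2 * nbins)


def count_collisions(hashes):
    """Observed collisions: total values minus distinct values."""
    hashes = list(hashes)
    return len(hashes) - len(set(hashes))


def tuple_hashes(values, width, nbits=32):
    """Hash every tuple of length `width` drawn from `values`,
    keeping only the low `nbits` bits of each hash code."""
    mask = (1 << nbits) - 1
    return [hash(t) & mask for t in product(values, repeat=width)]


if __name__ == "__main__":
    # Assumed meaning of the "range(100) by 3" label.
    hashes = tuple_hashes(range(100), 3)
    mean = expected_collisions(2**32, len(hashes))
    print(f"mean {mean:,.2f} got {count_collisions(hashes):,}")
```

The "got" figure printed by this sketch depends on the tuple hash of whichever Python runs it, so it need not match the numbers above.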

And here's the same under a 64-bit build, where the full hash code is considered, and also its lowest and highest 32 bits taken separately.  Note, e.g., that the old tuple test is an utter disaster if we only look at the high 32 bits.  Which is actually fine by me - the point of this is to show what happens.  Judging is a different (albeit related) issue ;-)

range(100) by 3; 64-bit hash codes; mean 0.00 got 0
range(100) by 3; 32-bit lower hash codes; mean 116.42 got 0
range(100) by 3; 32-bit upper hash codes; mean 116.42 got 989,670
-10 .. 8 by 4; 64-bit hash codes; mean 0.00 got 69,596
-10 .. 8 by 4; 32-bit lower hash codes; mean 1.28 got 69,596
-10 .. 8 by 4; 32-bit upper hash codes; mean 1.28 got 101,438
-50 .. 50 less -1 by 3; 64-bit hash codes; mean 0.00 got 708,066
-50 .. 50 less -1 by 3; 32-bit lower hash codes; mean 116.42 got 708,066
-50 .. 50 less -1 by 3; 32-bit upper hash codes; mean 116.42 got 994,287
0..99 << 60 by 3; 64-bit hash codes; mean 0.00 got 500,000
0..99 << 60 by 3; 32-bit lower hash codes; mean 116.42 got 875,000
0..99 << 60 by 3; 32-bit upper hash codes; mean 116.42 got 989,824
[-3, 3] by 20; 64-bit hash codes; mean 0.00 got 1,047,552
[-3, 3] by 20; 32-bit lower hash codes; mean 128.00 got 1,047,552
[-3, 3] by 20; 32-bit upper hash codes; mean 128.00 got 1,047,552
[0.5, 0.25] by 20; 64-bit hash codes; mean 0.00 got 1,048,544
[0.5, 0.25] by 20; 32-bit lower hash codes; mean 128.00 got 1,048,575
[0.5, 0.25] by 20; 32-bit upper hash codes; mean 128.00 got 1,048,544
old tuple test; 64-bit hash codes; mean 0.00 got 0
old tuple test; 32-bit lower hash codes; mean 7.43 got 6
old tuple test; 32-bit upper hash codes; mean 7.43 got 128,494
new tuple test; 64-bit hash codes; mean 0.00 got 102,920
new tuple test; 32-bit lower hash codes; mean 13.87 got 102,922
new tuple test; 32-bit upper hash codes; mean 13.87 got 178,211
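
The "lower" and "upper" rows above look at each half of a 64-bit hash code on its own. A minimal sketch of that split follows; `split_hash` is a hypothetical helper, not something taken from htest.py. Python's `hash()` returns a signed integer, so the sketch first maps it onto the unsigned range.

```python
def split_hash(h, nbits=64):
    """Return (lower, upper) halves of a hash code as unsigned ints."""
    half = nbits // 2
    h &= (1 << nbits) - 1  # map the signed hash onto 0 .. 2**nbits - 1
    lower = h & ((1 << half) - 1)
    upper = h >> half
    return lower, upper


if __name__ == "__main__":
    lower, upper = split_hash(hash((1, 2, 3)))
    print(f"lower {lower:#010x} upper {upper:#010x}")
```

Counting collisions separately over the lower and upper values is what lets a test look fine in one half (the old tuple test, got 6) and terrible in the other (got 128,494).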
