This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Title: Update comments in dictobject.c
Type: Stage: resolved
Components: Documentation Versions: Python 3.6, Python 3.4, Python 3.5
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: SilentGhost, christian.heimes, jaysinh.shukla, r.david.murray, rhettinger, terry.reedy
Priority: low Keywords: easy, patch

Created on 2014-02-18 17:33 by rhettinger, last changed 2022-04-11 14:57 by admin. This issue is now closed.

File name Uploaded Description Edit
issue20674_patch_v1.diff jaysinh.shukla, 2016-06-22 11:40 review
issue20674_patch_v3.diff jaysinh.shukla, 2016-06-25 06:36 review
issue20674_patch_v5.diff jaysinh.shukla, 2016-06-25 11:46 review
Messages (8)
msg211525 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-02-18 17:33
The hash function comments in Objects/dictobject.c no longer match the implementation:

Major subtleties ahead:  Most hash schemes depend on having a "good" hash                                                         
function, in the sense of simulating randomness.  Python doesn't:  its most                                                       
important hash functions (for strings and ints) are very regular in common                                                        
  >>> map(hash, (0, 1, 2, 3))                                                                                                     
  [0, 1, 2, 3]                                                                                                                    
  >>> map(hash, ("namea", "nameb", "namec", "named"))                                                                             
  [-1658398457, -1658398460, -1658398459, -1658398462]                                                                            
This isn't necessarily bad!  To the contrary, in a table of size 2**i, taking                                                     
the low-order i bits as the initial table index is extremely fast, and there                                                      
are no collisions at all for dicts indexed by a contiguous range of ints.                                                         
The same is approximately true when keys are "consecutive" strings.  So this                                                      
gives better-than-random behavior in common cases, and that's very desirable. 
msg268402 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2016-06-12 21:11
Christian, I assigned to you because I thought you had done the SIPhash work that invalidated these comments.   Do you know who the right person is?
msg269070 - (view) Author: Jaysinh shukla (jaysinh.shukla) * Date: 2016-06-22 11:40
Submitting patch which tries to cover mentioned change. I request Core-developers to verify the approach I took for fixing this change. Thanks!
msg269132 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-06-23 18:36
Completely remove all reference to strings as it is normally completely wrong.  In three runs of
   list(map(hash, ("namea", "nameb", "namec", "named")))
I get
[-7801965690653662103, -712276634514874737, 4394438508544812081, -6975951345912708912]
[-1171799829, 419385606, 1995488749, 1433346153]
[551304887, -285234584, -553876274, -279423320]

Rewrite for ints only.

I am willing to apply, but cannot patch 3.4, so Raymond, if you want that, you will have to do it.
msg269221 - (view) Author: Jaysinh shukla (jaysinh.shukla) * Date: 2016-06-25 06:36
Improving last with with following points:
1. Removing string related examples
2. Updating int example with more readable way (According to me).

Reviewer review the patch file named "issue20674_patch_v3.diff"
msg269222 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-06-25 06:42
I've left comments on rietveld. Not sure if everyone's seeing them.
msg269231 - (view) Author: Jaysinh shukla (jaysinh.shukla) * Date: 2016-06-25 11:46
Agreeing with "SilentGhost". Demonstrating `hash()` example for type int using list comprehension. Review  issue20674_patch_v5.diff Thanks!
msg270103 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-07-10 16:50
Oops, wrong number in the commit message.  Thanks for the patch, Jaysinh. 

New changeset 74109d87283f by R David Murray in branch '3.5':
#20647: Update dictobject.c comments to account for randomized string hashes.

New changeset 31cdf23da19d by R David Murray in branch 'default':
Merge: #20647: Update dictobject.c comments to account for randomized string hashes.
Date User Action Args
2022-04-11 14:57:58adminsetgithub: 64873
2016-07-10 16:52:07r.david.murraysetstatus: open -> closed
resolution: fixed
stage: patch review -> resolved
2016-07-10 16:50:08r.david.murraysetnosy: + r.david.murray
messages: + msg270103
2016-06-25 11:46:09jaysinh.shuklasetfiles: + issue20674_patch_v5.diff

messages: + msg269231
2016-06-25 06:42:32SilentGhostsetnosy: + SilentGhost

messages: + msg269222
stage: needs patch -> patch review
2016-06-25 06:36:11jaysinh.shuklasetfiles: + issue20674_patch_v3.diff

messages: + msg269221
2016-06-23 18:36:35terry.reedysetnosy: + terry.reedy
messages: + msg269132
2016-06-22 11:40:20jaysinh.shuklasetfiles: + issue20674_patch_v1.diff

nosy: + jaysinh.shukla
messages: + msg269070

keywords: + patch
2016-06-12 21:11:21rhettingersetmessages: + msg268402
2016-06-12 11:24:50christian.heimessetassignee: christian.heimes ->
2015-07-31 03:15:47zach.waresetkeywords: + easy
stage: needs patch
components: + Documentation
versions: + Python 3.5, Python 3.6
2014-02-18 17:33:15rhettingercreate