This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: serious string hashing error in 2.4b1
Type: Stage:
Components: Interpreter Core Versions: Python 2.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: rhettinger Nosy List: arigo, rhettinger, rthalley, tim.peters
Priority: critical Keywords:

Created on 2004-10-25 22:32 by rthalley, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
foo.py rthalley, 2004-10-25 22:32
Messages (8)
msg22857 - (view) Author: Bob Halley (rthalley) Date: 2004-10-25 22:32
There is a serious hashing error in 2.4b1.  I don't know if the 
error is confined to 64-bit systems, or is a general problem. 
 
The program attached to this report produces, as expected, 
this output when run with python 2.3.3: 
 
Python: 20303f0 
System: Linux localhost 2.6.8-1.521 #1 Mon Aug 16 09:01:00 
EDT 2004 x86 
_64 x86_64 x86_64 GNU/Linux 
 
hash("DNSSEC") == -7602892900506903802 
 
hash("D") == 8704026181 
hash("DN") == 8704052292078464 
hash("DNS") == -2784798555566127274 
hash("DNSS") == 5809125768486327656 
hash("DNSSE") == 5232635463381381892 
hash("DNSSEC") == -7602892900506903802 
 
When run with 2.4b1, I get the following output: 
 
Python: 20400b1 
System: Linux localhost 2.6.8-1.521 #1 Mon Aug 16 09:01:00 
EDT 2004 x86 
_64 x86_64 x86_64 GNU/Linux 
 
hash("DNSSEC") == -7602892900506903802 
 
hash("D") == 8704026181 
hash("DN") == 8704052292078464 
hash("DNS") == 8704052292078464 
hash("DNSS") == 8704052292078464 
hash("DNSSE") == 8704052292078464 
hash("DNSSEC") == 8704052292078464 
Traceback (most recent call last): 
  File "foo.py", line 22, in ? 
    assert hb == ha, 'hashes do not match!' 
AssertionError: hashes do not match! 
 
The way I discovered this was that dnspython's regression 
suite started failing because the string 'DNSSEC' constructed 
character-by-character was not being found in a dictionary 
which had a 'DNSSEC' key. 
 
I have not yet found the underlying bug; I was focussing on 
getting the info needed to demonstrate the bug first, since 
it's so serious.  If I make any progress fixing it, I'll send a 
patch. 
 
/Bob 
 
msg22858 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-10-25 23:08
Logged In: YES 
user_id=31435

Boosted to highest priority.  I don't get exactly the same 
numbers on my 32-bit box, but the problem exists there too.  
Suspect it has to do with new optimizations for string "+=".
msg22859 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-10-25 23:15
Logged In: YES 
user_id=31435

FYI, I think ceval.c's string_concatenate() needs to reset 
ob_shash to -1 after it extends a string in-place.  Alas, I 
can't make time to test that now.
msg22860 - (view) Author: Bob Halley (rthalley) Date: 2004-10-25 23:26
Logged In: YES 
user_id=671513

Patch #1054197 fixes this. 
 
The problem was that ceval.c wasn't invalidating the cached value. 
 
msg22861 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-10-26 01:53
Logged In: YES 
user_id=80475

Fixed.
See:
   Objects/stringobject.c 2.226
   Lib/test/string_tests.py 1.42

Thanks for the highly specific, timely bug report.
msg22862 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2004-10-26 13:26
Logged In: YES 
user_id=4771

Mea culpa.  Not only premature optimizations but even the ones that are apparently reasonably small-reaching are the root of... etc.
msg22863 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-10-26 23:42
Logged In: YES 
user_id=80475

IMO, it was my fault.  I reviewed, tested and applied the patch.

Also, I don't think optimization evils have much to do with
it.  It was an oversight that could affect any patch. 
Fortunately, a good beta process helped catch it quickly.

There may even be a silver lining.  By invalidating the
cache in the resize call, we may have fixed other possibly
broken code and will have prevented other errors in the future.
msg22864 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-10-27 00:51
Logged In: YES 
user_id=31435

Na, it's optimization, all right -- there's no conceivable reason 
other than optimization to try to mutate an immutable object, 
esp. one that's already been visible to user code.

But that's OK.  The more optimizations we do, the more 
optimization bugs we'll get.  We'd be the only language 
implementation project in the history of the universe to avoid 
that otherwise.  Don't be so touchy <wink>.

I'll note that the string object invariants aren't clearly written 
down in one place.  One use for making invariants explicit is 
as a sanity checklist when doing optimizations.
History
Date User Action Args
2022-04-11 14:56:07adminsetgithub: 41079
2004-10-25 22:32:06rthalleycreate