classification
Title: integer overflow in hashlib causes wrong results for cryptographic hash functions [was: mmap broken with large files on 64bit system]
Type: behavior Stage:
Components: Extension Modules Versions: Python 3.0, Python 2.6
process
Status: closed Resolution: out of date
Dependencies: Superseder:
Assigned To: Nosy List: donut, loewis, schmir
Priority: critical Keywords: patch

Created on 2008-06-02 02:24 by donut, last changed 2008-09-18 12:06 by loewis. This issue is now closed.

Files
File name Uploaded Description
testbigfile.py donut, 2008-06-02 02:24 test script
large_digest_update.diff schmir, 2008-07-14 22:16 patch against svn r64953
Messages (8)
msg67623 - (view) Author: Matthew Mueller (donut) Date: 2008-06-02 02:24
mmap on large files on 64 bit platforms in python >=2.5 returns some
sort of garbage.  In 2.4 it would just throw an exception.  Now I get
something like this (script runs md5.md5 on mmap object, and then runs
os.system md5sum for comparison):

This is python2.5 from Ubuntu 8.04 AMD64
/tmp$ python2.5 testbigfile.py 
python mmap md5: 1230552d39b7c1751f86bae5205ec0c8
abe59e28c9a3f11b883f62c80a3833a5 *bigfile


This is python svn as of 20080601, compiled on the same system.
/tmp$ python2.6 testbigfile.py
testbigfile.py:5: DeprecationWarning: the md5 module is deprecated; use
hashlib instead
  import md5
python mmap md5: 1230552d39b7c1751f86bae5205ec0c8
abe59e28c9a3f11b883f62c80a3833a5 *bigfile


Also note how the python md5 call returns immediately, not something you
would expect when md5ing 4GB of data.
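
The attached testbigfile.py is not reproduced in this report. As a rough sketch of the comparison described above, assuming hashlib in place of the deprecated md5 module and a streamed read in place of md5sum, the check could look like this. The function names and the temporary-file helper are hypothetical, and the file is kept small so the example runs quickly; it shows the shape of the test, not the failure itself.

```python
import hashlib
import mmap
import os
import tempfile


def md5_via_mmap(path):
    """Hash the whole file by passing an mmap object to hashlib in one call."""
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return hashlib.md5(m).hexdigest()
        finally:
            m.close()


def md5_via_read(path, block_size=1 << 20):
    """Reference digest: hash the file in small blocks read from disk."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def make_test_file(size):
    """Create a temporary file of `size` bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size)
    return path
```

On an affected build, the two digests would be expected to differ once the file is large enough for its size to no longer fit in an unsigned int.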
msg67624 - (view) Author: Matthew Mueller (donut) Date: 2008-06-02 02:29
Actually, I just realized that this might be a problem with md5 module
instead.  Either way, something is busted.
msg67701 - (view) Author: Ralf Schmitt (schmir) Date: 2008-06-04 21:16
I tested this with python 2.6 and can confirm the issue.
The problem is that unsigned int isn't big enough to hold the size of
the objects, but the size is downcast to an unsigned int in several
places in _hashopenssl.c. All of these occurrences of Py_SAFE_DOWNCAST
seem problematic to me (Py_SAFE_DOWNCAST(len, Py_ssize_t, unsigned int)).
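
To illustrate the arithmetic only: converting a length to a 32-bit unsigned int keeps its value modulo 2**32. The helper below is a hypothetical Python model of that C cast, assuming unsigned int is 32 bits wide as on common 64-bit platforms; it is not code from _hashopenssl.c.

```python
UINT_BITS = 32  # assumption: unsigned int is 32 bits wide


def as_unsigned_int(length):
    """Model a C cast of a non-negative length to unsigned int."""
    return length & ((1 << UINT_BITS) - 1)


for length in (2**32 - 1, 2**32, 2**32 + 5, 5 * 2**30):
    print(length, "->", as_unsigned_int(length))
# 4294967295 -> 4294967295
# 4294967296 -> 0
# 4294967301 -> 5
# 5368709120 -> 1073741824
```

Under this model, a buffer slightly larger than 4 GiB would be hashed as if it held only a few bytes, which would fit the observation that the hash call returns immediately.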
msg67775 - (view) Author: Ralf Schmitt (schmir) Date: 2008-06-06 15:13
The same bug also occurs when computing the md5 of a string larger than
2**32 bytes.
msg69642 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-07-14 05:18
So would anybody like to contribute a patch?
msg69664 - (view) Author: Ralf Schmitt (schmir) Date: 2008-07-14 22:16
This patch adds a digest_update function.
digest_update calls EVP_DigestUpdate(..) with chunks of 16 MB size and
also checks for signals.
I didn't write any tests (as they would most probably annoy many people
because they would need a lot of memory).

testbigfile.py however now works.
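
The idea behind the patch can be sketched at the Python level as follows. This is not the C code from large_digest_update.diff: `chunked_update` and its `chunk_size` parameter are hypothetical names, and only the 16 MB chunk size is taken from the description above.

```python
import hashlib

CHUNK_SIZE = 16 * 1024 * 1024  # 16 MB, as in the patch description


def chunked_update(hash_obj, data, chunk_size=CHUNK_SIZE):
    """Feed a bytes-like object to hash_obj in slices of at most chunk_size bytes.

    Assumes a byte-oriented buffer such as bytes, bytearray or mmap.
    """
    view = memoryview(data)  # slicing a memoryview does not copy the data
    for start in range(0, len(view), chunk_size):
        hash_obj.update(view[start:start + chunk_size])
    return hash_obj
```

Each slice passed to update() stays well below the unsigned int limit, so no single call depends on the full length fitting in 32 bits. Because the loop returns to the interpreter between slices, pending signals such as Ctrl-C should also get a chance to be handled.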
msg73373 - (view) Author: Ralf Schmitt (schmir) Date: 2008-09-18 11:52
same issue in http://bugs.python.org/issue3886.
it's sad that no one took a look at the patch...
now, it should probably be closed...
msg73375 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-09-18 12:06
Ok, closing. Thanks for the patch, anyway.
History
Date User Action Args
2008-09-18 12:06:26 loewis set status: open -> closed
resolution: out of date
messages: + msg73375
2008-09-18 11:52:53 schmir set messages: + msg73373
2008-08-05 22:28:45 schmir set title: mmap broken with large files on 64bit system -> integer overflow in hashlib causes wrong results for cryptographic hash functions [was: mmap broken with large files on 64bit system]
2008-07-14 22:16:27 schmir set files: + large_digest_update.diff
keywords: + patch
messages: + msg69664
2008-07-14 05:18:31 loewis set nosy: + loewis
messages: + msg69642
2008-06-12 06:00:56 georg.brandl set priority: critical
versions: + Python 3.0
2008-06-06 15:13:27 schmir set messages: + msg67775
2008-06-04 21:16:50 schmir set messages: + msg67701
2008-06-04 20:43:00 schmir set nosy: + schmir
2008-06-02 02:29:53 donut set messages: + msg67624
2008-06-02 02:24:57 donut create