This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: hashlib memory leak
Type: resource usage Stage:
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: Thorsten.Simons, ebfe, pitrou
Priority: normal Keywords:

Created on 2012-12-04 12:43 by Thorsten.Simons, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)
msg176907 - (view) Author: Thorsten Simons (Thorsten.Simons) Date: 2012-12-04 12:43
hashlib seems to leak memory when used on a Linux box, whereas the same code works fine under Windows 7 (tested with Python 3.2.1 and 3.2.3).

<snip file "mem.py">
import hashlib

#file = 'B:\\video\\TEST\\01_file_10G'
file = '/video/TEST/01_file_10G'

myhash = hashlib.sha256()

with open(file, "rb") as f:
    for buffer in f:
        myhash.update(buffer)

print('hash =', myhash.hexdigest())
<snip>

On Windows, 'python3 mem.py' occupies roughly 7 MB of memory;
on Linux (openSUSE 12.2), it quickly consumes all available memory, then all swap, and then gets killed.
msg176911 - (view) Author: Thorsten Simons (Thorsten.Simons) Date: 2012-12-04 13:05
Forgot to say that this is about huge files (tested with a 10 GB file).
msg176912 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-04 13:08
What happens if you replace iteration with something like:

with open(file, "rb") as f:
    while True:
        data = f.read(16384)
        if not data:
            break
        myhash.update(data)
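[Editorial sketch, not part of the original thread: the chunked-read pattern above holds at most one 16 KiB chunk in memory regardless of file size, and produces the same digest as hashing the content in one shot. A minimal self-contained check, using io.BytesIO as a stand-in for the 10 GB file:]

```python
import hashlib
import io

data = b'\x00' * 100_000  # stand-in content with no b'\n' anywhere

myhash = hashlib.sha256()
with io.BytesIO(data) as f:
    while True:
        chunk = f.read(16384)  # never holds more than 16 KiB at once
        if not chunk:
            break
        myhash.update(chunk)

# The chunked digest matches hashing the whole buffer at once.
print(myhash.hexdigest() == hashlib.sha256(data).hexdigest())  # True
```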
msg176913 - (view) Author: Thorsten Simons (Thorsten.Simons) Date: 2012-12-04 14:15
Antoine,

this was of great help - no memory leaking anymore...
So, I assume that somewhere in the iteration the data being read is buffered?
Does that make sense - or was it the developers' intention?

Thank you,
Regards, Thorsten
msg176916 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-04 14:28
Well, it's not immediately obvious what the exact problem could be. Are you reading a regular text file? Or is it a binary file where maybe the '\n' character appears very rarely?

If it can be reproduced with a smaller file, perhaps you can attach it somewhere.
msg176918 - (view) Author: Lukas Lueg (ebfe) Date: 2012-12-04 15:04
Thorsten, the problem is that you are using line-based iteration. The code 'for buffer in f:' reads one line per iteration and assigns it to 'buffer'; for a file opened in binary mode, the iterator always reads ahead to the next b'\n'. Depending on the content of the file, Python may have to read tons of data before the next b'\n' appears.
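[Editorial illustration, not part of the original thread: the behavior described above can be seen with io.BytesIO as a stand-in for a binary file. With a single b'\n' a megabyte into the data, line iteration must buffer the entire megabyte before yielding the first "line":]

```python
import io

# One b'\n' after a million bytes, then a short tail with no newline.
data = b'x' * 1_000_000 + b'\n' + b'y' * 10

# Iterating a binary stream yields one "line" per b'\n' found.
lines = list(io.BytesIO(data))

print(len(lines))      # 2 "lines" in total
print(len(lines[0]))   # 1000001 bytes buffered for the first "line" alone
```

A 10 GB file containing no b'\n' at all would therefore be buffered in its entirety as a single "line", which explains the memory exhaustion observed on Linux.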
msg176920 - (view) Author: Thorsten Simons (Thorsten.Simons) Date: 2012-12-04 15:19
OK, learned something again - should have known this :-(

Thank you!

Thorsten
History
Date                 User             Action  Args
2022-04-11 14:57:39  admin            set     github: 60810
2012-12-04 17:46:30  eric.snow        set     status: open -> closed
2012-12-04 15:19:30  Thorsten.Simons  set     messages: + msg176920
2012-12-04 15:04:43  ebfe             set     nosy: + ebfe; messages: + msg176918
2012-12-04 14:28:52  pitrou           set     messages: + msg176916
2012-12-04 14:15:02  Thorsten.Simons  set     resolution: works for me; messages: + msg176913
2012-12-04 13:08:35  pitrou           set     nosy: + pitrou; messages: + msg176912
2012-12-04 13:05:39  Thorsten.Simons  set     messages: + msg176911
2012-12-04 12:43:32  Thorsten.Simons  create