This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: hashlib memory leak
Type: resource usage Stage:
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: Thorsten.Simons, ebfe, pitrou
Priority: normal Keywords:

Created on 2012-12-04 12:43 by Thorsten.Simons, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)
msg176907 - (view) Author: Thorsten Simons (Thorsten.Simons) Date: 2012-12-04 12:43
hashlib seems to leak memory when used on a Linux box, whereas the same code works fine under Windows 7 (tested with Python 3.2.1 and 3.2.3).

<snip file "mem.py">
import hashlib

#file = 'B:\\video\\TEST\\01_file_10G'
file = '/video/TEST/01_file_10G'

myhash = hashlib.sha256()

with open(file, "rb") as f:
    for buffer in f:
        myhash.update(buffer)

print('hash =', myhash.hexdigest())
<snip>

On Windows, 'python3 mem.py' occupies roughly 7 MB of memory;
on Linux (openSUSE 12.2), it quickly consumes all available memory, then all swap, and then gets killed.
msg176911 - (view) Author: Thorsten Simons (Thorsten.Simons) Date: 2012-12-04 13:05
Forgot to say that this is about huge files (tested with a 10 GB file).
msg176912 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-04 13:08
What happens if you replace iteration with something like:

with open(file, "rb") as f:
    while True:
        data = f.read(16384)
        if not data:
            break
        myhash.update(data)
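[Editorial sketch, not part of the original thread: the chunked-read pattern above holds at most one 16 KiB chunk in memory regardless of file size, and produces the same digest as hashing the content in one shot. A minimal self-contained check, using io.BytesIO as a stand-in for the 10 GB file:]

```python
import hashlib
import io

data = b'\x00' * 100_000  # stand-in content with no b'\n' anywhere

myhash = hashlib.sha256()
with io.BytesIO(data) as f:
    while True:
        chunk = f.read(16384)  # never holds more than 16 KiB at once
        if not chunk:
            break
        myhash.update(chunk)

# The chunked digest matches hashing the whole buffer at once.
print(myhash.hexdigest() == hashlib.sha256(data).hexdigest())  # True
```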
msg176913 - (view) Author: Thorsten Simons (Thorsten.Simons) Date: 2012-12-04 14:15
Antoine,

this was of great help - no memory leaking anymore...
So, I assume that somewhere in the iteration the data being read is buffered?
Does that make sense - or was it the developers' intention?

Thank you,
Regards, Thorsten
msg176916 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-12-04 14:28
Well, it's not immediately obvious what the exact problem could be. Are you reading a regular text file? Or is it a binary file where maybe the '\n' character appears very rarely?

If it can be reproduced with a smaller file, perhaps you can attach it somewhere.
msg176918 - (view) Author: Lukas Lueg (ebfe) Date: 2012-12-04 15:04
Thorsten, the problem is that you are using line-based iteration. The code 'for buffer in f:' reads one line per iteration and assigns it to 'buffer'; for a file opened in binary mode, the iterator always reads ahead to the next b'\n'. Depending on the content of the file, Python may have to read tons of data before the next b'\n' appears.
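[Editorial illustration, not part of the original thread: the behavior described above can be seen with io.BytesIO as a stand-in for a binary file. With a single b'\n' a megabyte into the data, line iteration must buffer the entire megabyte before yielding the first "line":]

```python
import io

# One b'\n' after a million bytes, then a short tail with no newline.
data = b'x' * 1_000_000 + b'\n' + b'y' * 10

# Iterating a binary stream yields one "line" per b'\n' found.
lines = list(io.BytesIO(data))

print(len(lines))      # 2 "lines" in total
print(len(lines[0]))   # 1000001 bytes buffered for the first "line" alone
```

A 10 GB file containing no b'\n' at all would therefore be buffered in its entirety as a single "line", which explains the memory exhaustion observed on Linux.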
msg176920 - (view) Author: Thorsten Simons (Thorsten.Simons) Date: 2012-12-04 15:19
OK, learned something again - should have known this :-(

Thank you!

Thorsten
History
Date                 User             Action  Args
2022-04-11 14:57:39  admin            set     github: 60810
2012-12-04 17:46:30  eric.snow        set     status: open -> closed
2012-12-04 15:19:30  Thorsten.Simons  set     messages: + msg176920
2012-12-04 15:04:43  ebfe             set     nosy: + ebfe; messages: + msg176918
2012-12-04 14:28:52  pitrou           set     messages: + msg176916
2012-12-04 14:15:02  Thorsten.Simons  set     resolution: works for me; messages: + msg176913
2012-12-04 13:08:35  pitrou           set     nosy: + pitrou; messages: + msg176912
2012-12-04 13:05:39  Thorsten.Simons  set     messages: + msg176911
2012-12-04 12:43:32  Thorsten.Simons  create