classification
Title: hashlib object cannot be pickled
Type: enhancement Stage: resolved
Components: Library (Lib) Versions: Python 3.3
process
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: Andrey.Kislyuk, approximately, gregory.p.smith, pitrou, rhettinger, vstinner
Priority: normal Keywords:

Created on 2011-04-05 11:49 by vstinner, last changed 2017-07-14 12:14 by Andrey.Kislyuk. This issue is now closed.

Messages (10)
msg133021 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-05 11:49
$ ./python 
Python 3.3a0 (default:76ed6a061ebe, Apr  5 2011, 12:25:00) 
>>> import hashlib, pickle
>>> hash=hashlib.new('md5')
>>> pickle.dumps(hash)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_pickle.PicklingError: Can't pickle <class '_hashlib.HASH'>: attribute lookup _hashlib.HASH failed

The problem is that _hashlib.HASH is not accessible at Python level. There is a C define to make it accessible, but it is disabled by default: "#if HASH_OBJ_CONSTRUCTOR". This test is as old as the _hashlib module (#1121611, 624918e1c1b2).
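For reference, a minimal sketch that reproduces the failure without depending on the exact exception type, which is an assumption worth hedging: the session above shows `_pickle.PicklingError`, while later Python versions are understood to raise `TypeError` for the same object. The helper name `can_pickle` is hypothetical, and `sha256` is used here only because `md5` may be unavailable on some builds.

```python
import hashlib
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError):
        # PicklingError on the 3.3a0 build shown above; TypeError is
        # assumed for newer interpreters.
        return False
    return True


# A hash object is refused, while a plain bytes value pickles fine.
print(can_pickle(hashlib.sha256()))
print(can_pickle(b"abc"))
```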
msg133022 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-04-05 12:13
Oh, I don't know if it is possible to serialize an OpenSSL hash object (EVP_MD_CTX)...
msg133030 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-04-05 14:02
Why on Earth would you want to serialize a hashlib object?
It makes as much sense as serializing, say, a JSONEncoder.
msg133192 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2011-04-07 06:08
heh yeah.  while all hash functions do have internal state and someone
could conceivably want to store such a state (it basically amounts to
queued up partial block of input data if any and the current starting
IV) there are not consistent APIs to expose that and I really don't
see why it'd be worth trying to find them.

remember, hashlib doesn't have to be openssl.  there are non openssl
libtomcrypt based versions and someone nice should write a libnss
based version someday.

i'd mark this "won't fix." :)

-Greg

msg133193 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-04-07 06:17
I also recommend closing this one.
msg222036 - (view) Author: Klaus Wolf (approximately) Date: 2014-07-01 14:31
Please reopen this bug. To answer the question "Why on Earth would you want to serialize a hashlib object?": multiprocessing.connection.ForkingPickler wants to. That is, if you want to parallelize your hash calculations, this will obstruct your efforts.
msg222042 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2014-07-01 15:04
Do you honestly have a situation where you need to share a computationally
significant amount of hashing state only to want to finish the computation
N different times with alternate computationally significant ending data
that multiprocessing would actually help with where you cannot use
threads?  Hashlib releases the GIL during nontrivial hash computations.
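A rough sketch of the thread-based alternative described here, under stated assumptions: a shared prefix is hashed once, each worker receives its own `copy()` of that state, and the different endings are finished in a thread pool so no pickling is involved. The function name `finish_with_suffixes` and the choice of `sha256` are illustrative, not something taken from this issue.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def finish_with_suffixes(prefix, suffixes):
    """Hash `prefix` once, then finish the digest once per suffix in threads."""
    base = hashlib.sha256(prefix)
    # Take the copies in the calling thread so each worker owns its own
    # hash object and never shares mutable state with another worker.
    jobs = [(base.copy(), suffix) for suffix in suffixes]

    def finish(job):
        h, suffix = job
        h.update(suffix)  # large updates are where the GIL release matters
        return h.hexdigest()

    with ThreadPoolExecutor() as pool:
        return list(pool.map(finish, jobs))


digests = finish_with_suffixes(b"shared header ", [b"ending A", b"ending B"])
```

Because threads share the interpreter's memory, the hash objects are handed over directly and never need to be serialized.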
msg222044 - (view) Author: Klaus Wolf (approximately) Date: 2014-07-01 15:10
You want to say: It doesn't work, but it is somehow intentional because you never used it, correct?
msg222050 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2014-07-01 17:29
Please be constructive.

There is no way to implement generic pickling for hash objects that would work across all implementations.  

The underlying code implementing each function is free to store its internal state however it wants and does not provide an API to get at it or any standard representation of it.

Sure, you could hack things up and allow a specific version and build of openssl's EVP hashes to dump their state and restore it for use in another process running that same specific version and build of openssl (as would likely be the case for multiprocessing use) just as you could for any other implementation of a hash function such as the builtin libtomcrypt versions.  But this is not portable between compilations using different implementations of the hash algorithm.  That is not what someone using pickle would ever expect.

Public APIs to access the internal state of hash functions do not exist because it is not a common thing for people to do.

hashlib isn't going to support this unless someone contributes a very solid patch with tests that handles all of the compatibility issues in a friendly maintainable manner.
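One portable workaround, sketched here purely as an illustration, sidesteps the missing internal-state API by remembering the input and replaying it on reconstruction. The class `ReplayableHash` is hypothetical: it is not part of hashlib, and it is not claimed to be how any existing library approaches the problem. The trade-off is that it keeps all hashed data in memory and re-hashes it when unpickled, so it only suits modest inputs.

```python
import hashlib


class ReplayableHash:
    """Hypothetical wrapper: picklable because it replays its input."""

    def __init__(self, name, data=b""):
        self.name = name
        self._chunks = []
        self._hash = hashlib.new(name)
        if data:
            self.update(data)

    def update(self, data):
        data = bytes(data)
        self._chunks.append(data)  # remember the input for later replay
        self._hash.update(data)

    def hexdigest(self):
        return self._hash.hexdigest()

    def __reduce__(self):
        # Serialize the algorithm name and the bytes seen so far, never
        # the implementation-specific internal state.
        return (ReplayableHash, (self.name, b"".join(self._chunks)))


h = ReplayableHash("sha256")
h.update(b"hello ")
rebuild, args = h.__reduce__()
clone = rebuild(*args)  # what pickle would do when loading
clone.update(b"world")
```

When the class lives in an importable module, `pickle.dumps(h)` and `pickle.loads(...)` should go through the same `__reduce__` path, independent of which backend implements the hash.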
msg298350 - (view) Author: Andrey Kislyuk (Andrey.Kislyuk) * Date: 2017-07-14 12:14
For anyone else looking for a solution to this, I wrote a library: https://github.com/kislyuk/rehash
History
Date | User | Action | Args
2017-07-14 12:14:45 | Andrey.Kislyuk | set | nosy: + Andrey.Kislyuk; messages: + msg298350
2014-07-01 17:29:48 | gregory.p.smith | set | type: enhancement; messages: + msg222050; stage: resolved
2014-07-01 15:10:40 | approximately | set | messages: + msg222044
2014-07-01 15:04:06 | gregory.p.smith | set | messages: + msg222042
2014-07-01 14:31:59 | approximately | set | nosy: + approximately; messages: + msg222036
2011-04-07 08:17:54 | vstinner | set | status: open -> closed; resolution: wont fix
2011-04-07 06:17:47 | rhettinger | set | nosy: + rhettinger; messages: + msg133193
2011-04-07 06:08:22 | gregory.p.smith | set | messages: + msg133192
2011-04-05 14:02:07 | pitrou | set | nosy: + gregory.p.smith, pitrou; messages: + msg133030
2011-04-05 12:13:41 | vstinner | set | messages: + msg133022
2011-04-05 11:49:41 | vstinner | create |