Author ebfe
Recipients ebfe
Date 2008-12-26.13:39:07
Message-id <1230298756.92.0.632482701363.issue4751@psf.upfronthosting.co.za>
In-reply-to
Content
The hashlib functions provided by _hashopenssl.c hold the GIL for the
entire call, although the underlying OpenSSL library is basically thread-safe.
I've attached a patch (svn diff) which does four things:

* If Python is compiled with thread support, the EVPobject is extended
by an additional PyThread_type_lock which protects each object individually.
* The 'update' function releases the GIL if the to-be-hashed object is a
bytes object and therefore provides trustworthy locking (all other types,
including subclasses, are not trustworthy!). This allows multiple
threads to do hashing in parallel.
* The EVP_hash function removes duplicated code.
* The situation regarding unicode objects is now more meaningful. Upon
passing a unicode string to the .update() function, the original hashlib
throws a "TypeError: object supporting the buffer API required", which is
confusing. I think it's perfectly valid not to accept unicode strings as
input, and people should be required to call str.encode() on their strings
before hashing, so that a well-defined byte representation of their strings
gets hashed. Therefore I patched the MY_GET_BUFFER_VIEW_OR_ERROUT macro to
throw "TypeError: Unicode-objects must be encoded before hashing". This
also fixes issue #1118.
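
For illustration, here is a minimal sketch of what the unicode handling means for calling code. The helper names digest_text and update_rejects_str are made up for this example and are not part of the patch. The sketch checks only the exception type, not the message text, since the exact wording may differ between interpreter versions.

```python
import hashlib


def digest_text(text, encoding="utf-8"):
    """Hash a str by first encoding it to a well-defined byte sequence.

    Hypothetical helper for illustration; the encoding default is an
    assumption, pick whatever representation your application needs.
    """
    h = hashlib.sha256()
    h.update(text.encode(encoding))
    return h.hexdigest()


def update_rejects_str():
    """Return True if .update() refuses an unencoded str with TypeError."""
    try:
        hashlib.sha256().update("abc")
    except TypeError:
        return True
    return False
```

Hashing "abc" through digest_text should give the same digest as hashing b"abc" directly, because both paths feed the same bytes to the hash object.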


I've tested this patch and did not run into problems. CPU utilization
depends on the buffer size passed to .update(), as releasing the GIL is
not worth the effort for very small buffers. More testing may
be needed...
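
A rough sketch of the kind of multi-threaded use this is aimed at, assuming each thread owns its hash object and feeds it plain bytes chunks. The function names and the 1 MiB default chunk size are assumptions chosen for the example, not values taken from the patch. Whether the threads actually run in parallel depends on the interpreter build and on the chunk size.

```python
import hashlib
import threading


def hash_in_chunks(data, chunk_size, results, index):
    """Hash one bytes buffer in chunk_size pieces and store the hex digest."""
    h = hashlib.sha256()
    for start in range(0, len(data), chunk_size):
        # Slicing bytes yields a plain bytes object for each chunk.
        h.update(data[start:start + chunk_size])
    results[index] = h.hexdigest()


def parallel_digests(buffers, chunk_size=1 << 20):
    """Hash every buffer in its own thread; return digests in input order."""
    results = [None] * len(buffers)
    threads = [
        threading.Thread(target=hash_in_chunks,
                         args=(buf, chunk_size, results, i))
        for i, buf in enumerate(buffers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Larger chunk sizes mean fewer .update() calls and more work per call, which is where releasing the GIL can pay off. With very small chunks the locking overhead would likely dominate.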
History
Date                 User  Action  Args
2008-12-26 13:39:17  ebfe  set     recipients: + ebfe
2008-12-26 13:39:16  ebfe  set     messageid: <1230298756.92.0.632482701363.issue4751@psf.upfronthosting.co.za>
2008-12-26 13:39:10  ebfe  link    issue4751 messages
2008-12-26 13:39:08  ebfe  create