classification
Title: Bug in hashlib
Type: behavior Stage: patch review
Components: Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: gregory.p.smith Nosy List: Eloff, amaury.forgeotdarc, gregory.p.smith, pitrou
Priority: normal Keywords:

Created on 2009-06-13 22:45 by Eloff, last changed 2009-09-14 18:25 by Eloff. This issue is now closed.

Files
File name Uploaded Description Edit
hashlib.py Eloff, 2009-06-14 16:51
Messages (7)
msg89338 - (view) Author: Daniel Eloff (Eloff) Date: 2009-06-13 22:45
The try statement at the end of hashlib.py is some of the worst python
code I've had the mispleasure of reading for a long time.

Secondly, it seems flawed in function as well as form.
__get_builtin_constructor can throw an ImportError, which seems
erroneously caught in the except (why on earth does the except span more
than the "import _hashlib"? That's bad style for precisely this reason.)
This will cause an error in the import of hashlib when there are
possibly many working hash functions already imported. Changing the:

try:
    exec funcName + ' = __get_builtin_constructor(funcName)'
except ValueError:
    pass

to catch ImportError as well solves the problem for me.
msg89344 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-06-14 09:27
Can you propose an actual test case and a patch?
msg89355 - (view) Author: Daniel Eloff (Eloff) Date: 2009-06-14 16:51
If you're missing any hash algorithm from openssl (say sha224) and the
corresponding _sha224 module, hashlib will fail to import and no hash
algorithms will be available.

Additionally, if no (or only some) openssl algorithms are available, the
error branch expects every hash algorithm to be present in it's own
module, if even one is missing, hashlib will fail to import.

Producing a test case for these without mock objects is difficult at
best. I leave that to the python team, if they desire to do so.

Here is a rewritten hashlib.py, without that ugly mess of code and
without the above two bugs. It passes all tests.
msg89374 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-06-14 22:38
In addition, I would replace
    exec funcName + ' = __get_hash(funcName)'
with
    globals()[funcName] = __get_hash(funcName)
msg89375 - (view) Author: Daniel Eloff (Eloff) Date: 2009-06-14 23:00
Yes, I prefer:

globals()[funcName] = __get_hash(funcName)

The exec was used in the original code, I'm not aware of the style of
the python stdlib programmers, so I tried to be as true to what I found
as possible.

I just wanted to blunt my words in the original post, I don't want
Gregory Smith to view it as a personal attack, I was just frustrated in
having to debug the hashlib code. I'm sure even Gregory would admit it's
not code he's proud of (probably done under a tight deadline.) No doubt
Gregory normally does high quality work.
msg91646 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2009-08-16 22:01
That code was indeed a mess.  I've incorporated most suggestions from
your cleaned up version (and fixed a bug in it) in trunk r74479.

Have you ever seen __get_builtin_constructor fail in practice?  I can
imagine that packing up a stripped down python interpreter with only a
subset of the hash extension modules would be the only time this would
likely be encountered.

I've changed your code to log an error and continue rather than silently
continue when an expected hash function is not available.  That error
log may be overkill, perhaps we should just leave catching this problem
up to the unittests.
msg92626 - (view) Author: Daniel Eloff (Eloff) Date: 2009-09-14 18:25
I have indeed seen __get_builtin_constructor fail in practice, in the 
python build in the Ubuntu repository if IIRC. I seem to recall the 
problem was that python was using a version of openssl that didn't 
include all of hash algorithms, probably because a python library or the 
process hosting python also had a dependency on libssl and had it loaded 
already.

But there is also the issue that Python is a language with many 
implementations now, and in fact, it's unlikely that CPython will remain 
the defacto implementation forever. Having a more robust hashlib.py that 
can handle missing algorithm implementations makes a lot of sense (.NET 
for example has no implementation of some of the more esoteric SHA 
hashes.) I remember that in the early days of IronPython, hashlib.py had 
to be completely replaced to be used.
History
Date User Action Args
2009-09-14 18:25:54Eloffsetmessages: + msg92626
2009-08-16 22:01:02gregory.p.smithsetstatus: open -> closed
priority: normal
resolution: fixed
messages: + msg91646
2009-06-14 23:00:00Eloffsetmessages: + msg89375
2009-06-14 22:38:11amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg89374
2009-06-14 17:37:47pitrousetstage: patch review
versions: + Python 3.1, Python 2.7, Python 3.2
2009-06-14 17:05:05benjamin.petersonsetassignee: gregory.p.smith

nosy: + gregory.p.smith
2009-06-14 16:51:44Eloffsetfiles: + hashlib.py

messages: + msg89355
2009-06-14 09:27:12pitrousetnosy: + pitrou
messages: + msg89344
2009-06-13 22:45:29Eloffcreate