classification
Title: _sha.sha().digest() method is endian-sensitive. and hexdigest()
Type: behavior Stage:
Components: Extension Modules Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: gregory.p.smith, jcea, kristjan.jonsson, ncoghlan, scott.dial
Priority: normal Keywords:

Created on 2010-11-16 07:36 by kristjan.jonsson, last changed 2010-11-24 10:02 by kristjan.jonsson. This issue is now closed.

Messages (6)
msg121268 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-11-16 07:36
In shamodule.c, the digest() method just creates a simple bytes string of the digest. The digest is stored as an array of 32-bit integers in the native representation. Therefore, the digest will be different on big- and little-endian machines.

The specification (http://en.wikipedia.org/wiki/SHA-1) suggests that the digest should actually be big-endian, so the standard implementation on most home machines is actually wrong.

Actually, looking at the code, hexdigest() has the same problem!
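
The concern described above can be illustrated with a minimal sketch. It does not reproduce shamodule.c; the five words are just the well-known SHA-1 initial state, used here as a stand-in for a finished digest. It only shows how serializing 32-bit words in host order differs from the big-endian order the specification calls for.

```python
import struct
import sys

# Stand-in for the five 32-bit digest words (the SHA-1 initial state,
# used purely as sample data, not a real digest).
words = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

native = struct.pack("=5I", *words)  # host byte order, 4 bytes per word
big = struct.pack(">5I", *words)     # big-endian, as SHA-1 specifies
little = struct.pack("<5I", *words)  # little-endian

# On a little-endian host, dumping the words natively would not match
# the big-endian serialization; on a big-endian host it would.
print(sys.byteorder, native == big)
```

If a module really did copy its word array out in native order, the two byte strings would disagree on little-endian hosts, which is the failure mode this report suspected.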
msg121506 - (view) Author: Scott Dial (scott.dial) Date: 2010-11-19 10:07
Got a test case that demonstrates a failure? Looks like it works to me...

$ uname -ip
sparc SUNW,Sun-Fire-280R
$ python -c 'import sys; print sys.byteorder'
big
$ python -c 'import sha; print sha.new(open("test", "rb").read()).hexdigest()'
851faf3199d27200abf2750c14ae6451696216a9
$ sha1sum -b test
851faf3199d27200abf2750c14ae6451696216a9 *test

# uname -ip
AMD Sempron(tm) Processor 2800+ AuthenticAMD
# python -c 'import sys; print sys.byteorder'
little
# python -c 'import sha; print sha.new(open("test", "rb").read()).hexdigest()'
851faf3199d27200abf2750c14ae6451696216a9
# sha1sum -b /tmp/test
851faf3199d27200abf2750c14ae6451696216a9 *test

I think your code analysis is wrong. Perhaps you missed the call to longReverse(), which does the endianness byte-swapping at the beginning of sha_transform() and carries this comment: "When run on a little-endian CPU we need to perform byte reversal on an array of longwords."
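
As a rough Python analogue of that byte reversal (a sketch only; `reverse_words` is a hypothetical helper and not the C implementation of longReverse()), each 32-bit word has its four bytes swapped end for end:

```python
import struct

def reverse_words(data):
    """Byte-swap every 32-bit word in data (length must be a multiple of 4)."""
    count = len(data) // 4
    # Read the words as little-endian, then write them back as big-endian.
    words = struct.unpack("<%dI" % count, data)
    return struct.pack(">%dI" % count, *words)

print(reverse_words(b"\x01\x02\x03\x04\x05\x06\x07\x08"))
```

Applying the swap twice returns the original bytes, so the same routine converts in either direction.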
msg121508 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-11-19 10:26
Something is definitely weird on the PS3. I'll give more concrete data soon. (And yes, I may have misread the code.)
msg121515 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-11-19 14:14
If I was looking for opportunities for a compiler to do something weird, I'd start with the TestEndianness macro (i.e. maybe it is incorrectly flagging the Cell as little endian when it is actually big endian)

The endianness handling itself looks fine to me, though.
msg121524 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-11-19 14:42
Yes, in my original myopic observation I was mistaken in thinking that we were reading the digest out of the 5 entry int32 "digest" field in the SHAobject.
I've already verified that the "Endianness" field is correctly set. What I thought was an obvious error due to people not using big-endian much is probably much more subtle.
msg122268 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2010-11-24 10:02
Found the issue and it wasn't with sha1.
Turned out that the code was doing something like
sha1(buffer(unicode('str'))) which exposed the endianness of the unicode representation.
Sorry for wasting your time.
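
The effect can be sketched in Python 3 with hashlib. This is an approximation, not the original Python 2 code: the explicit UTF-16 encodings stand in for the in-memory unicode buffer, whose byte order follows the host. Hashing the same text under the two byte orders gives different digests, while encoding to a fixed encoding first gives the same result on every platform.

```python
import hashlib

text = u"str"

# The same characters laid out in the two byte orders.
le = hashlib.sha1(text.encode("utf-16-le")).hexdigest()
be = hashlib.sha1(text.encode("utf-16-be")).hexdigest()

# An explicit, byte-order-free encoding gives a portable digest.
utf8 = hashlib.sha1(text.encode("utf-8")).hexdigest()

print(le == be)  # False: the hashed bytes differ
print(utf8)
```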
History
Date User Action Args
2010-11-24 10:02:47 kristjan.jonsson set status: open -> closed
resolution: not a bug
messages: + msg122268
2010-11-19 14:42:43 kristjan.jonsson set messages: + msg121524
2010-11-19 14:14:17 ncoghlan set nosy: + ncoghlan
messages: + msg121515
2010-11-19 13:32:26 jcea set nosy: + jcea
2010-11-19 10:26:37 kristjan.jonsson set messages: + msg121508
2010-11-19 10:07:38 scott.dial set nosy: + scott.dial
messages: + msg121506
2010-11-16 22:32:50 pitrou set nosy: + gregory.p.smith
2010-11-16 07:36:52 kristjan.jonsson create