> can you please attach the generated assembly code for the siphash function with your compiler and your optimization flags (that is, the one that produces the above results)?

Compiler: GCC 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1), with these options:

-pthread -c -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -IInclude -I./Include -DPy_BUILD_CORE

Platform: 32-bit Linux on an AMD Athlon(tm) 64 X2 Dual Core Processor 4600+.
