I think the following patch doesn't introduce undefined behavior. With this patch, GCC on 32-bit i386 Linux produces the same code as for a simple unsigned short read.
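
For readers without the patch at hand, here is a minimal sketch of one common way to do a 16-bit read without undefined behavior. It is only an illustration and not necessarily what the patch does; the helper name read_u16_unaligned is hypothetical. It copies the bytes with memcpy instead of dereferencing a cast pointer, which is well-defined for any alignment. Optimizing compilers usually turn such a fixed-size memcpy into a single load on platforms that allow unaligned access, but that should be checked per compiler and target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not taken from the patch: read a 16-bit value
   in native byte order from an address that may be unaligned. */
static inline uint16_t
read_u16_unaligned(const void *p)
{
    uint16_t v;
    /* memcpy has defined behavior for any alignment, unlike
       *(const uint16_t *)p on a misaligned pointer. */
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Comparing the output of gcc -O2 -S for this helper and for a plain unsigned short read is one way to check whether a given compiler generates the same code for both.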

I don't know whether the benefit is worth such a complication. I also don't know whether the patch can cause a performance regression on other platforms or compilers.
