Author mark.dickinson
Date 2012-09-20.20:16:58
[Broken out of the discussion in issue 15144]

Some of the newly optimized code in Objects/unicodeobject.c contains strict aliasing violations; under the C standard, this is undefined behaviour (C99 6.5p7).

An example occurs in ascii_decode:

    unsigned long value = *(const unsigned long *) _p;

Here the pointer dereference reads character data through an lvalue of type unsigned long, which violates the strict aliasing rule.

I think these portions of Objects/unicodeobject.c should be rewritten to avoid the undefined behaviour.
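One common way to keep the word-at-a-time speed without the undefined behaviour is to copy the bytes into a real unsigned long with memcpy. C99 allows any object to be accessed as an array of character type, which is all memcpy needs, and compilers typically turn a fixed-size memcpy into a single load. The sketch below illustrates that idea; it is not the eventual patch. The function ascii_prefix_len and the macro ASCII_CHAR_MASK are hypothetical names, and the mask assumes 8-bit chars and an unsigned long with no padding bits.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mask with the high bit of every byte set (0x8080...80).
   ULONG_MAX / UCHAR_MAX is 0x0101...01, assuming CHAR_BIT == 8 and an
   unsigned long with no padding bits. */
#define ASCII_CHAR_MASK ((ULONG_MAX / UCHAR_MAX) * 0x80UL)

/* Hypothetical helper: length of the leading run of ASCII bytes in
   the range [start, end). */
size_t ascii_prefix_len(const char *start, const char *end)
{
    const char *p = start;

    /* Word-at-a-time scan.  memcpy copies the bytes into a genuine
       unsigned long object, so nothing is read through an lvalue of
       the wrong type, and p does not need to be suitably aligned. */
    while ((size_t)(end - p) >= sizeof(unsigned long)) {
        unsigned long value;
        memcpy(&value, p, sizeof value);
        if (value & ASCII_CHAR_MASK)
            break;              /* some byte in this word is non-ASCII */
        p += sizeof value;
    }

    /* Byte-at-a-time tail; also locates the exact non-ASCII byte
       when the word loop stopped early. */
    while (p < end && (unsigned char)*p < 0x80)
        p++;

    return (size_t)(p - start);
}
```

For example, on "abcdefghijklmnopq" followed by a UTF-8 encoded "é", the function returns 17 whether unsigned long is 4 or 8 bytes wide.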

This is not a purely theoretical problem: compilers are known to make optimizations based on the assumption that strict aliasing is not violated. Early versions of David Gay's dtoa.c, for example, gave incorrect results because of strict aliasing violations; see [1].

[2] is a Stack Overflow reference explaining strict aliasing.

