Author Jeffrey.Kintscher
Recipients Jeffrey.Kintscher, bup, ezio.melotti, mrabarnett, serhiy.storchaka, terry.reedy
Date 2019-07-06.00:53:00
Message-id <1562374380.59.0.512587842456.issue37367@roundup.psfhosted.org>
In-reply-to
Content
Here is the problematic code in _PyBytes_DecodeEscape in Objects/bytesobject.c:

            c = s[-1] - '0';
            if (s < end && '0' <= *s && *s <= '7') {
                c = (c<<3) + *s++ - '0';
                if (s < end && '0' <= *s && *s <= '7')
                    c = (c<<3) + *s++ - '0';
            }
            *p++ = c;

c is an int, and p is a char pointer into the new bytes object's string buffer.  For b'\407', c is correctly calculated as 263 (0x107), but the upper bits are lost when the value is converted to a char and stored at the location p points to.  Hence, b'\407' becomes b'\x07' when the object is created.
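
To make the arithmetic concrete, here is a small Python sketch that mirrors the shift-and-add loop and then mimics the store into a one-byte char by masking with 0xFF.  The function name decode_octal_escape is hypothetical and exists only for illustration; it is not part of CPython.

```python
def decode_octal_escape(digits):
    """Mimic the C arithmetic for an octal escape of one to three digits.

    Returns (full_value, stored_value), where stored_value models the
    truncation that happens when an int is stored into a single byte.
    Hypothetical helper for illustration only.
    """
    if not 1 <= len(digits) <= 3 or any(d not in "01234567" for d in digits):
        raise ValueError("expected one to three octal digits")

    c = 0
    for d in digits:
        # Same step as `c = (c<<3) + *s++ - '0'` in the C code.
        c = (c << 3) + (ord(d) - ord("0"))

    # Storing an int into a one-byte char keeps only the low 8 bits.
    return c, c & 0xFF


full, stored = decode_octal_escape("407")
print(full, hex(full), stored)  # 263 0x107 7
```

Any escape up to \377 fits in one byte and is unaffected; only the values 0o400 through 0o777 lose their top bit.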

IMO, this should raise "ValueError: bytes must be in range(0, 256)" instead of silently throwing away the upper bits.  I will work on a PR.
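
As a rough sketch of the behavior proposed here (not the eventual patch, which would be written in C), the range check could look like this in Python.  The name decode_octal_escape_strict is hypothetical, and the error message is the one suggested above.

```python
def decode_octal_escape_strict(digits):
    """Decode one to three octal digits into a single byte, rejecting
    values that do not fit instead of truncating them.

    Hypothetical helper sketching the proposed behavior only.
    """
    if not 1 <= len(digits) <= 3 or any(d not in "01234567" for d in digits):
        raise ValueError("expected one to three octal digits")

    value = int(digits, 8)
    if value > 0xFF:
        # Proposed: fail loudly rather than silently drop the upper bits.
        raise ValueError("bytes must be in range(0, 256)")
    return bytes([value])


print(decode_octal_escape_strict("7"))  # b'\x07'
try:
    decode_octal_escape_strict("407")
except ValueError as exc:
    print(exc)  # bytes must be in range(0, 256)
```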

I also took a look at how escaped hex values are handled by the same function.  It may seem at first glance that

>>> b'\x107'
b'\x107'

is returning the hex value 0x107, but in reality it is a two-byte object: '\x10' is the first byte and '7' (0x37) is the second, because a \x escape always consumes exactly two hex digits.  While visually misleading, it is syntactically and semantically correct.
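
A quick way to confirm this reading is to inspect the length and the individual byte values of the literal:

```python
data = b'\x107'

# The \x escape takes exactly two hex digits, so the trailing "7"
# is an ordinary ASCII character (0x37), not part of the escape.
print(len(data))    # 2
print(list(data))   # [16, 55]
print(data == bytes([0x10]) + b'7')  # True
```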
History
Date User Action Args
2019-07-06 00:53:00 Jeffrey.Kintscher set recipients: + Jeffrey.Kintscher, terry.reedy, ezio.melotti, mrabarnett, serhiy.storchaka, bup
2019-07-06 00:53:00 Jeffrey.Kintscher set messageid: <1562374380.59.0.512587842456.issue37367@roundup.psfhosted.org>
2019-07-06 00:53:00 Jeffrey.Kintscher link issue37367 messages
2019-07-06 00:53:00 Jeffrey.Kintscher create