Author bup
Recipients bup
Date 2019-06-21.19:46:08
Message-id <1561146368.97.0.686299796922.issue37367@roundup.psfhosted.org>
In-reply-to
Content
At present, the bytecode compiler can generate 512 different unicode characters from octal escapes, one for each integer in the range [0, 512), 512 being the total number of syntactically valid sequences of 3 octal digits preceded by a backslash. However, this does not match the regex compiler, which raises an error regardless of the input type when it encounters an octal escape with a value greater than 255. On the other hand... the bytes literal:

>>> b'\407'

is somehow valid, and can lead to bugs that are extremely difficult to track down, such as this nonsense:

>>> import re
>>> re.compile(b'\407').search(b'\a')
<re.Match object; span=(0, 1), match=b'\x07'>
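The match above follows from simple arithmetic: 0o407 is 263, which needs nine bits, and dropping the ninth bit leaves 7, the value of b'\a'. The following is a minimal sketch of that arithmetic and of the regex compiler's refusal. The names `value`, `low_byte` and `regex_rejects` are illustrative only and are not part of CPython; the literal itself is deliberately not evaluated, since its handling may differ between Python versions.

```python
import re

value = 0o407              # 263 in decimal; needs nine bits
low_byte = value & 0xFF    # dropping the ninth bit leaves 7

def regex_rejects(pattern):
    """Return True if re.compile() refuses the pattern (illustrative helper)."""
    try:
        re.compile(pattern)
    except re.error:
        return True
    return False

print(value, low_byte)                # 263 7
print(bytes([low_byte]) == b'\a')     # True
# Raw literals pass the backslash through, so the regex parser sees the escape itself.
print(regex_rejects(r'\407'), regex_rejects(rb'\407'))
```

Because the patterns are raw literals, it is the regex parser rather than the literal parser that interprets `\407`, for both the str and the bytes pattern.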

I propose that the regex parser be extended so that, for unicode patterns, three-digit octal escapes in range(256, 512) are interpreted, and that the bytecode parser be adjusted to match the behavior of the regex parser by raising an error for bytes literals containing an octal escape of \400 or greater, rather than truncating the 9th bit.
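One way to read the proposed rule is as a single range check that depends on whether the escape appears in a str or a bytes context. The sketch below is only an illustration of that reading; `octal_escape_value` and its `is_bytes` parameter are hypothetical names, not existing CPython or `re` APIs.

```python
def octal_escape_value(escape, *, is_bytes):
    """Hypothetical helper: value of a three-digit octal escape such as '\\407'."""
    digits = escape[1:]
    if not (escape.startswith('\\') and len(digits) == 3
            and all(d in '01234567' for d in digits)):
        raise ValueError(f'not a three-digit octal escape: {escape!r}')
    value = int(digits, 8)
    if is_bytes and value > 0o377:
        # Proposed behavior: reject instead of silently dropping the ninth bit.
        raise ValueError(f'octal escape {escape} does not fit in one byte')
    return value

print(octal_escape_value(r'\407', is_bytes=False))   # 263
print(octal_escape_value(r'\377', is_bytes=True))    # 255
```

Under this reading, `octal_escape_value(r'\407', is_bytes=True)` raises `ValueError`, while the same escape in a str context yields 263.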