Title: Warn about octal escapes > 0o377 in re
Messages (11)
msg226570 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-08 11:07
Currently, the re module accepts octal escapes from \400 to \777 but ignores the highest bit.

>>> re.search(r'\542', 'abc')
<_sre.SRE_Match object; span=(1, 2), match='b'>
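Here is the arithmetic behind that match, assuming (as the report says) that the old parser simply kept the low 8 bits of the octal value. 0o542 needs 9 bits, so dropping the top bit leaves 0o142, which is `'b'`:

```python
# 0o542 == 354 needs 9 bits; keeping only the low 8 bits (the pre-patch
# behaviour described above) leaves 0o142 == 98 == ord('b').
value = 0o542
masked = value & 0xff
print(value, masked, oct(masked), chr(masked))  # 354 98 0o142 b

# Consequence: every escape from \400 to \777 silently aliases one from \000 to \377.
aliases = all((v & 0xff) == v - 0o400 for v in range(0o400, 0o1000))
print(aliases)  # True
```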

This behavior looks surprising and is inconsistent with the regex module, which preserves the highest bit. Such escaping is not portable across different regular expression engines.

I propose adding a warning when an octal escape value is larger than 0o377. Here is a preliminary patch that adds a UserWarning. Or would it be better to emit a DeprecationWarning and then replace it with a ValueError in future releases?
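A minimal sketch of the warning approach, written as standalone Python rather than as the actual patch. `parse_octal_escape_lenient` is a hypothetical helper: it keeps the old masking behaviour but warns, and its `category` parameter lets the same code emit either a UserWarning or a DeprecationWarning:

```python
import warnings

def parse_octal_escape_lenient(digits, category=UserWarning):
    """Hypothetical helper: decode a 3-digit octal escape the pre-patch way
    (keep the low 8 bits), but warn when the value exceeds 0o377."""
    value = int(digits, 8)
    if value > 0o377:
        warnings.warn(
            "octal escape value \\%s outside of range 0-0o377" % digits,
            category,
            stacklevel=2,
        )
    return value & 0xff

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even DeprecationWarning
    result = parse_octal_escape_lenient("542")
    deprecated = parse_octal_escape_lenient("777", DeprecationWarning)

print(result, deprecated, [w.category.__name__ for w in caught])
# 98 255 ['UserWarning', 'DeprecationWarning']
```

Note that DeprecationWarning is hidden by default outside `__main__`, which is why a deprecation period on its own may go unnoticed by many users.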
msg226798 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-11 19:20
I think we should simply raise ValueError in 3.5. There's no reason to accept such invalid escapes.
msg226801 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-11 20:34
Well, here is a patch which makes re raise an exception on suspicious octals.
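A sketch of the strict alternative, in the same standalone style (not the patch itself). `OctalEscapeError` and both functions are hypothetical names. The exception subclasses ValueError to follow the suggestion earlier in the thread, whereas the re module in Python 3.5+ actually raises `re.error` here. The message is built in a single helper so it is not repeated at each call site:

```python
class OctalEscapeError(ValueError):
    """Hypothetical exception type for this sketch."""

def _octal_escape_error(digits):
    # Single place that formats the message, instead of repeating it per call site.
    return OctalEscapeError("octal escape value \\%s outside of range 0-0o377" % digits)

def parse_octal_escape_strict(digits):
    """Decode a 3-digit octal escape, rejecting values above 0o377 (255)."""
    value = int(digits, 8)
    if value > 0o377:
        raise _octal_escape_error(digits)
    return value

print(parse_octal_escape_strict("142"))  # 98, i.e. 'b'

rejected = None
try:
    parse_octal_escape_strict("542")
except OctalEscapeError as exc:
    rejected = exc
print("rejected:", rejected)
```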
msg226809 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-12 07:36
re_octal_escape_overflow_raise.patch: you should write a subfunction to avoid repeating the error message three times.

+            if c > 0o377:

Hum, I never use octal. 255 instead of 0o377 would be less surprising :-p By the way, you should also check for negative numbers.

>>> -3 & 0xff
253

Previously, "& 0xff" also converted negative numbers to positive ones in the range 0..255.
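To see why the old mask hid both problems, note that `& 0xff` keeps only the low 8 bits of a Python int, so any value, negative or too large, lands in 0..255:

```python
# & 0xff keeps the low 8 bits, so every int (negative included) maps into 0..255.
print(-3 & 0xff)   # 253 (two's-complement wrap-around)
print(0o542 & 0xff)  # 98

samples = [-3, -1, 0, 255, 256, 0o777]
masked = [n & 0xff for n in samples]
print(masked)  # [253, 255, 0, 255, 0, 255]
```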
msg226826 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-12 16:29
> By the way, you should also check for negative numbers.

Not in this case. You can't construct a negative number from three octal digits.
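A brute-force check of that claim: enumerating every three-digit octal escape shows the values span 0..511 (0o777), so only the upper bound 0o377 (which is 255) can be exceeded:

```python
from itertools import product

OCTDIGITS = "01234567"
# Every possible 3-digit octal escape, \000 through \777.
values = [int("".join(d), 8) for d in product(OCTDIGITS, repeat=3)]
print(len(values), min(values), max(values))  # 512 0 511
print(0o377 == 255)  # True

# Only the upper bound can be violated; no negative check is needed.
too_large = [v for v in values if v > 0o377]
print(len(too_large))  # 256 (\400 .. \777)
```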
msg227036 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-18 10:03
Warning or exception? This is a question.
msg227039 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-09-18 12:44
> Warning or exception? This is a question.

Using -Werror, warnings raise exceptions :-)
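The same effect as `-Werror` can be obtained inside a program with `warnings.simplefilter("error")`. In this sketch, `warn_large_octal` is a hypothetical stand-in for a parser that only warns:

```python
import warnings

def warn_large_octal(digits):
    # Hypothetical stand-in for a parser that only warns on \400-\777.
    warnings.warn("octal escape value \\%s outside of range 0-0o377" % digits,
                  UserWarning)

raised = None
with warnings.catch_warnings():
    warnings.simplefilter("error")  # same effect as running python -W error
    try:
        warn_large_octal("542")
    except UserWarning as exc:
        raised = exc

print(type(raised).__name__, raised)
```

This only helps users who opt in, which is part of why a warning alone is weaker than an exception.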
msg227040 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-09-18 13:17
This is an error, so it should really be an exception. There's no use case for being lenient, IMO.
msg227238 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-21 20:50
If this is an error, should the patch be applied to maintained releases?
msg227386 - Author: Roundup Robot (python-dev) (Python triager) Date: 2014-09-23 20:26
New changeset 3b32f495fb38 by Serhiy Storchaka in branch 'default':
Issue #22362: Forbidden ambiguous octal escapes out of range 0-0o377 in
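A quick check to run on Python 3.5 or later, which should include this change. An in-range escape such as `\142` still means `'b'`, while `\542` is now rejected at compile time with `re.error` instead of silently matching `'b'`:

```python
import re

# \142 is within 0-0o377, so it is still the octal escape for 'b'.
match = re.search(r"\142", "abc")
print(match.group(), match.span())  # b (1, 2)

# \542 exceeds 0o377; compiling it now raises re.error.
error = None
try:
    re.compile(r"\542")
except re.error as exc:
    error = exc
print(error)
```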
msg227387 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-09-23 20:28
Thanks Antoine and Victor for the review.
