Author serhiy.storchaka
Recipients ezio.melotti, mrabarnett, pitrou, serhiy.storchaka
Date 2015-03-09.13:28:25
Message-id <1425907707.07.0.452600036872.issue23622@psf.upfronthosting.co.za>
In-reply-to
Content
Regular expressions use the backslash character for two functions:
1) to indicate special forms;
2) to allow special characters to be used without invoking their special meaning.

If a backslash followed by a character is not recognized as a special form (1), it is interpreted according to meaning (2).
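A minimal sketch of the two functions and the fallback rule, using only the standard `re` module. The variable names are illustrative, and `\@` is just one arbitrary example of an escape that has no special form.

```python
import re

# (1) Backslash introduces a special form: \d matches any decimal digit.
digit_match = re.fullmatch(r"\d", "7")

# (2) Backslash removes a special meaning: \. matches only a literal dot,
# whereas an unescaped . matches any character except a newline.
literal_dot = re.fullmatch(r"\.", ".")
dot_vs_letter = re.fullmatch(r"\.", "a")
any_char = re.fullmatch(r".", "a")

# Fallback: \@ is not a special form, so it is read as a literal "@".
at_sign = re.fullmatch(r"\@", "@")
```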

New special forms usually take the form of a backslash followed by an ASCII letter because, unlike other characters, single ASCII letters have no special meaning in any regular expression engine or programming language. This makes using a backslash with an unrecognized ASCII letter dangerous: currently it means just that letter literally, but in the future it may become a special form. For example, the \u and \U forms were added in 3.3, and this could break regular expression patterns that previously used \u and \U.
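The \u case can be illustrated with a short sketch, assuming Python 3.3 or later. The pattern text is unchanged, but its meaning is not: it now denotes a single code point rather than the letter "u" followed by four digits.

```python
import re

# On Python 3.3+, \u0041 is a special form denoting U+0041, the letter "A".
as_escape = re.fullmatch(r"\u0041", "A")

# The older reading, a literal "u" followed by "0041", no longer applies.
as_literal = re.fullmatch(r"\u0041", "u0041")
```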

To avoid such breakage, it makes sense to reject unrecognized backslash + ASCII letter sequences. The proposed patch adds a deprecation warning when an unknown escape of an ASCII letter is used. The idea was proposed by Matthew Barnett [1].
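As a rough way to observe the behaviour on a given interpreter, the sketch below classifies a pattern as accepted, deprecated, or rejected. The helper `check_escape` is hypothetical and not part of the patch. The outcome depends on the Python version: an interpreter that only warns reports "deprecated", while one that refuses the pattern outright reports "rejected".

```python
import re
import warnings


def check_escape(pattern):
    """Classify how the running interpreter treats `pattern` (hypothetical helper)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that would normally be filtered out.
        warnings.simplefilter("always")
        try:
            re.compile(pattern)
        except re.error:
            return "rejected"
    if any(issubclass(w.category, DeprecationWarning) for w in caught):
        return "deprecated"
    return "accepted"


# \d is a known special form, \. escapes a special character,
# and \q is an unknown escape of an ASCII letter.
for pattern in (r"\d", r"\.", r"\q"):
    print(pattern, check_escape(pattern))
```

Note that `re.compile` caches compiled patterns, so on an interpreter that only warns, a pattern compiled earlier in the same process may not warn a second time.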

[1] http://permalink.gmane.org/gmane.comp.python.devel/151657
History
Date User Action Args
2015-03-09 13:28:27 serhiy.storchaka set recipients: + serhiy.storchaka, pitrou, ezio.melotti, mrabarnett
2015-03-09 13:28:27 serhiy.storchaka set messageid: <1425907707.07.0.452600036872.issue23622@psf.upfronthosting.co.za>
2015-03-09 13:28:27 serhiy.storchaka link issue23622 messages
2015-03-09 13:28:26 serhiy.storchaka create