Misleading/inaccurate documentation about unknown escape sequences in regular expressions #72636
Python 3.6+ is stricter about escape sequences in string literals. The documentation needs some improvement to clarify the change. For example, https://docs.python.org/3.6/library/re.html#re.sub first says that “Unknown escapes such as \& are left alone” and then, in the “Changed in” section below, states that “[in Py3.6] Unknown escapes consisting of '\' and an ASCII letter now are errors”. When such changes are made, the documentation usually describes the “new”/“current” behaviour, and the history section mentions when and how some detail changed. See this thread for details: https://mail.python.org/pipermail/python-list/2016-October/715462.html
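A minimal illustration of the two behaviours the re.sub() documentation describes. This sketch assumes Python 3.7 or later, where an unknown ASCII-letter escape in the replacement template raises re.error (3.6 only emits a DeprecationWarning for it). The input strings are arbitrary examples.

```python
import re

# A backslash followed by a non-letter such as "&" is still left alone
# in the replacement template: the backslash is kept literally.
print(re.sub("a", r"\&", "cat"))  # c\&t

# A backslash followed by an unknown ASCII letter is rejected
# (assumes Python 3.7+; 3.6 only warns here).
try:
    re.sub("a", r"\q", "cat")
except re.error as exc:
    print("error:", exc)
```

So both sentences in the documentation are true at the same time: they just apply to different kinds of unknown escapes.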
Thank you for your report, Lele. Agreed, the documentation looks misleading. Would you like to provide clearer wording?
Maybe just remove the phrase "Unknown escapes such as \& are left alone"?
I disagree that the documentation is at fault. This change is known to break existing code, e.g. http://bugs.python.org/msg281496. I think it's not correct to change the documentation but leave the error-raising behavior in 3.6, because the deprecation was never documented in 3.5, so this will look like a gratuitous regression. See bpo-27030 for reference. I also question whether it makes sense for such escapes to be illegal in the repl argument of re.sub(). I could understand this limitation in the pattern argument, but that's not what's causing the error.
The deprecation was documented in 3.5. https://docs.python.org/3.5/library/re.html#re.sub says: “Deprecated since version 3.5, will be removed in version 3.6: Unknown escapes consist of '\' and ASCII letter now raise a deprecation warning and will be forbidden in Python 3.6.”
The reason for disallowing some undefined escapes is the same as in pattern strings: it allows us to introduce new special escape sequences later. For example:
Of course, the need for new special escape sequences in template strings is much smaller than in pattern strings.
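For comparison, a sketch of the pattern side of the same rule, assuming Python 3.6 or later (where unknown ASCII-letter escapes in patterns are already errors). An escaped non-letter matches that character literally, while an unknown letter escape is refused when the pattern is compiled, which leaves that letter free to be given a meaning later. The sample text is arbitrary.

```python
import re

# In a pattern, an escaped non-letter such as "\&" simply matches the
# literal character.
print(re.search(r"\&", "fish & chips").group())  # &

# An unknown ASCII-letter escape is rejected at compile time
# (assumes Python 3.6+).
try:
    re.compile(r"\q")
except re.error as exc:
    print("error:", exc)
```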
@barry: repl already supports some escapes, e.g. \g<name> for named groups, although not \xXX et al., so deprecating unknown escapes as in the pattern makes sense to me. BTW, the regex module already supports \xXX, \N{XXX}, etc.
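The escapes mentioned above can be checked directly in the standard re module. This is a sketch assuming Python 3.7 or later; the group name `word` and the sample strings are illustrative only.

```python
import re

# \g<name> refers to a named group in the replacement template.
print(re.sub(r"(?P<word>\w+)", r"<\g<word>>", "hi there"))  # <hi> <there>

# Standard escapes such as \n are processed in the template.
print(repr(re.sub(",", r"\n", "a,b")))  # 'a\nb'

# \xXX is not supported in templates, so "\x" is treated as an
# unknown ASCII-letter escape and rejected (assumes Python 3.7+).
try:
    re.sub("a", r"\x41", "cat")
except re.error as exc:
    print("error:", exc)
```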
On Nov 22, 2016, at 07:28 PM, Serhiy Storchaka wrote:
I'll note that technically speaking, you can still introduce new escapes for …
I'll also note that not *all* unknown sequences are rejected now, only …
On Nov 22, 2016, at 07:55 PM, R. David Murray wrote:
This is also a reasonable argument, but not one I've thought about since I'm …
On Nov 22, 2016, at 07:34 PM, Serhiy Storchaka wrote:
pattern is a regular expression string so it already follows the syntax as …
Perhaps so, but I do think this is a tricky question from a compatibility …
Where do we stand on this issue? At the moment, 3.6.0 is on track to be released as is. |
I think we should discuss this on Python-Dev. |
Note that 1b162d6e3d01 in bpo-27030 (for 3.6.0rc1) has changed the behavior for re.sub replacement templates so that unknown escapes produce a deprecation warning in 3.6; they will still be treated as errors in 3.7.
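Code that really wants a backslash followed by a letter in the output can avoid the deprecated form entirely. Two portable options are sketched here, assuming Python 3.6 or later: double the backslash in the template, or pass a function as repl, whose return value is inserted as-is without any escape processing. The strings are illustrative.

```python
import re

# Escape the backslash itself: the template engine turns "\\" into a
# single literal backslash, followed by an ordinary "q".
print(re.sub("a", r"\\q", "cat"))  # c\qt

# Alternatively, pass a function: its return value is used verbatim,
# with no template or escape processing at all.
print(re.sub("a", lambda m: r"\q", "cat"))  # c\qt
```

The function form is handy when the replacement text comes from user input and should never be interpreted as a template.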
Barry, could you please improve the documentation about unknown escape sequences in regular expressions? My skills are not sufficient for this.