classification
Title: Deprecate invalid escape sequences in str/bytes
Type: behavior Stage: resolved
Components: Interpreter Core, Library (Lib), Unicode Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Chi Hsuan Yen, ebarry, ezio.melotti, gvanrossum, jason.coombs, jayvdb, martin.panter, python-dev, r.david.murray, serhiy.storchaka, terry.reedy, vstinner, ztane
Priority: normal Keywords: patch

Created on 2016-06-21 20:34 by ebarry, last changed 2017-07-11 17:53 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description Edit
deprecate_invalid_unicode_escapes.patch ebarry, 2016-06-21 20:34 review
deprecate_invalid_unicode_escapes_2.patch ebarry, 2016-06-24 05:46 review
deprecate_invalid_escapes_only_1.patch ebarry, 2016-06-26 23:04 review
invalid_stdlib_escapes_1.patch ebarry, 2016-06-26 23:04 review
deprecate_invalid_escapes_only_2.patch ebarry, 2016-06-27 01:40 review
deprecate_invalid_escapes_only_2.patch martin.panter, 2016-06-27 01:42 Error handling review
deprecate_invalid_escapes_only_3.patch ebarry, 2016-06-28 01:15 review
deprecate_invalid_escapes_both_1.patch ebarry, 2016-07-18 15:50 review
invalid_stdlib_escapes_2.patch ebarry, 2016-08-14 21:16 review
deprecate_invalid_escapes_both_2.patch ebarry, 2016-08-14 21:17 review
deprecate_invalid_escapes_both_3.patch ebarry, 2016-09-01 13:19 review
invalid_stdlib_escapes_3.patch ebarry, 2016-09-06 00:13
deprecate_invalid_escapes_both_4.patch ebarry, 2016-09-07 12:45 review
invalid_stdlib_escapes_3_regenerated.patch ebarry, 2016-09-07 12:49 review
invalid_stdlib_escapes_3_rebased_2.patch ebarry, 2016-09-08 01:58 review
invalid_stdlib_escapes_4.patch ebarry, 2016-09-08 11:26 review
invalid_stdlib_escapes_5.patch ebarry, 2016-09-08 13:00 review
deprecate_invalid_escapes_both_5.patch r.david.murray, 2016-09-08 18:46 review
verbose-deprecation.diff Chi Hsuan Yen, 2016-09-11 09:35 review
Messages (55)
msg269022 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-21 20:34
Attached patch deprecates invalid escape sequences in unicode strings. The point of this is to prevent issues such as #27356 (and possibly other similar ones) in the future.

Without the patch:

>>> "hello \world"
'hello \\world'

With the patch:

>>> "hello \world"
DeprecationWarning: invalid escape sequence 'w'

I'll need some help (patch isn't mergeable yet):

test_doctest fails on my machine with the patch (and -W), and I don't know how to fix it. test_ast fails an assertion (!PyErr_Occurred() in PyObject_Call in abstract.c) when -W is on, and I also don't know how to fix it (I don't even know what causes it).

Of course, I went ahead and fixed all instances of invalid escape sequences in the stdlib (that I could find) so that no DeprecationWarning is encountered.

Lastly, I thought about also doing this to bytes, but I ran into some issues with some invalid escapes such as \u, and _codecs.escape_decode would trigger the warning when passed br"\8" (for example). Ultimately, I decided to leave bytes alone for now, since it's mostly on the lower-level side of things. If there's interest I can add it back.
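The behavior described above can be observed from Python itself by compiling a snippet with warnings recorded. This is only an illustrative sketch, not part of the attached patch: the helper name `has_invalid_escape` is hypothetical, and it accepts either DeprecationWarning or SyntaxWarning because the category used depends on the interpreter version.

```python
import warnings

def has_invalid_escape(source):
    """Return True if compiling `source` warns about an invalid escape."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make silent-by-default warnings visible
        compile(source, "<example>", "exec")
    return any(
        issubclass(w.category, (DeprecationWarning, SyntaxWarning))
        for w in caught
    )

# The sources are written as raw strings so this example stays warning-free.
print(has_invalid_escape(r'"hello \world"'))   # cooked literal containing \w
print(has_invalid_escape(r'r"hello \world"'))  # raw literal
print(has_invalid_escape(r'"hello \\world"'))  # backslash escaped explicitly
```

Only the first source should be flagged; the raw and explicitly escaped spellings produce the same string without a warning.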
msg269114 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-06-23 14:41
Have you searched the python-dev and python-ideas archives for the previous discussions of this issue?  I don't remember for sure, but I think Guido might have made a ruling (not that the discussion couldn't be reopened if he has, but, well...)
msg269119 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-23 15:26
Now I have! I found nothing on Python-Dev, but apparently it's been discussed on Python-ideas before: https://mail.python.org/pipermail/python-ideas/2015-August/035031.html Guido hasn't participated in that discussion, and most of it was "This will break people's code", with people both for and against the idea, without an apparent consensus.

Should I try a second round on Python-ideas, to try and get a consensus (or a BDFL ruling)?
msg269122 - (view) Author: Antti Haapala (ztane) * Date: 2016-06-23 15:59
it is handy to be able to use `\w` and `\d` in non-raw-string *regular expressions*, without too much backslashitis. Seems to be in use in Python standard library as well, for example in csv.py
msg269152 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-24 02:59
Yes, it's in use in an awful lot of places (see my patch). The proper fix is to use raw strings, or, if you need actual escapes in the same string, manually escape them. However, as you'll see by looking at the patch, the vast majority of cases are fixed by prepending a single 'r' to the front of the string. In fact, only csv.py and html/parser.py needed finer-grained escaping.

I think that the argument "It works in non-raw strings" is weak. I've always used raw strings for regular expressions, and this patch would simply move this from being a style issue to being a syntax one (and I think it's fine :).
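To make the two fixes concrete, here is a small generic sketch (the patterns are made up for illustration and are not taken from csv.py or html/parser.py). It shows that a raw string and a manually escaped string spell the same pattern, and one way to combine regex escapes with a real escape by relying on adjacent-literal concatenation.

```python
import re

raw = r"\d+\.\d+"
escaped = "\\d+\\.\\d+"
assert raw == escaped  # both literals produce the same pattern text

# A real escape (newline) next to regex escapes: adjacent literals are
# concatenated, so only the part that needs cooking is a non-raw string.
line_pattern = r"(\w+)=(\d+)" "\n"

print(re.findall(raw, "pi is 3.14, e is 2.71"))
print(re.match(line_pattern, "width=80\n").groups())
```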
msg269155 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-24 04:22
There was a long discussion on Python-Dev. [1]  Guido took part in it.

[1] http://comments.gmane.org/gmane.comp.python.devel/151612
msg269156 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-24 04:43
Thanks, didn't find that one. Apparently Guido's stance is "Make this a silent warning, then we can discuss preventing it later", which happens to be what I'm doing here.
msg269158 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-24 05:46
I found the cause of the failed assertion, an invalid escape sequence slipped through in a file. Patch attached (also with Serhiy's comments).

It worries me a little though that pure Python code can cause a hard crash. Ok, it worries me a lot. Please don't merge this until it's fixed. I'm guessing this is a combination of unittest catching warnings and compiling the faulty source file. As to why a malformed node (i.e. one that raised a DeprecationWarning) managed to pass through unharmed is beyond me.
msg269322 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-06-26 22:19
I am okay with making it a silent warning.

Can we do it in two stages though? It doesn't have to be two releases, I just mean two separate commits: (1) fix all places in the stdlib that violate this principle; (2) separately commit the code that causes the silent deprecation (and tests for it).

What exactly was the hard crash you got? Do you think it was a bug in your own C code or in existing C code?
msg269323 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-26 23:04
I originally considered making two different patches, so there you go. deprecate_invalid_escapes_only_1.patch has the deprecation plus a test, and invalid_stdlib_escapes_1.patch fixes all invalid escapes in the stdlib.

My code was the cause, although not directly; it was 'assert(!PyErr_Occurred())' at the beginning of PyObject_Call in Objects/abstract.c which failed.

This happened when I ran the whole test suite (although just running test_ast was fine to reproduce it) with the '-W error' command line switch. One stdlib module (I don't remember which one) had one single invalid escape sequence in it, and then test_ast.ASTValidatorTests.test_stdlib_validates triggered the failed assertion. Fixing the invalid escape removes the failure and all tests pass.

One can reliably reproduce the crash with the patch by adding a string with an invalid escape in any of the stdlib files (and running with '-W error'):

No invalid sequence:

>>> import unittest, test.test_ast
>>> unittest.main(test.test_ast)
..............................................................................
----------------------------------------------------------------------
Ran 78 tests in 5.538s

OK

With an invalid sequence in a file:

>>> import unittest, test.test_ast
>>> unittest.main(test.test_ast)
............................................Fatal Python error: a function returned a result with an error set
DeprecationWarning: invalid escape sequence 'w'

During handling of the above exception, another exception occurred:

SystemError: <built-in function compile> returned a result with an error set

Current thread 0x00001ba0 (most recent call first):
  File "E:\GitHub\cpython\lib\ast.py", line 35 in parse
  File "E:\GitHub\cpython\lib\test\test_ast.py", line 944 in test_stdlib_validates
  File "E:\GitHub\cpython\lib\unittest\case.py", line 600 in run
  File "E:\GitHub\cpython\lib\unittest\case.py", line 648 in __call__
  File "E:\GitHub\cpython\lib\unittest\suite.py", line 122 in run
  File "E:\GitHub\cpython\lib\unittest\suite.py", line 84 in __call__
  File "E:\GitHub\cpython\lib\unittest\suite.py", line 122 in run
  File "E:\GitHub\cpython\lib\unittest\suite.py", line 84 in __call__
  File "E:\GitHub\cpython\lib\unittest\runner.py", line 176 in run
  File "E:\GitHub\cpython\lib\unittest\main.py", line 255 in runTests
  File "E:\GitHub\cpython\lib\unittest\main.py", line 94 in __init__
  File "<stdin>", line 1 in <module>

Then I get the usual "Python has stopped working" Windows prompt (strangely enough, before I'd get a prompt saying "Assertion failed" with the line, but not this time).

I'm not sure where the error lies exactly. Should I open another issue for that?
msg269326 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-06-27 00:05
Hm, if you manage to trigger an assert() in the C code by writing some evil
Python code, the C code is considered broken (unless it was using ctypes or
one or two other explicit "void-the-warranty" exceptions).

Maybe someone who has worked more with the C code recently could help you
dig into this more; my memory is unreliable when it comes to these details.
Maybe assert() calls are disabled by default? In general the error "...
returned a result with an error set" means there's a problem at the C level
where a function should have either returned an object or returned NULL
with the per-thread exception state set, but it was found to return an
object *and* set the exception state. IIRC only debug mode checks for that,
so such a bug occasionally creeps into the code. But you shouldn't assume
everything is fine until you've tracked down the cause.
msg269329 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-27 00:26
Ah right, assert() is only enabled in debug mode, I forgot that. My (very uneducated) guess is that compile() got the error (which was a warning) but then decided to return a value anyway, and the next thing that tries to call anything crashes Python. I opened #27394 to get some experts' advice.
msg269332 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-27 01:40
Aaand I feel pretty stupid; I didn't check the return value of PyErr_WarnFormat, so it was my mistake. Attached new patch, actually done right this time.
msg269333 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-27 01:42
Hello Emanuel, I think I have fixed your problem with -Werror, by handling the exception returned by PyErr_WarnFormat() (see my patch). Thanks for separating the actual change from the escape violation fixes; it made it easier to spot the real problem :)

Also, I like the general idea of the change. It would be good to update the documentation as well (e.g. What’s New, and <https://docs.python.org/3.6/reference/lexical_analysis.html#string-and-bytes-literals>).

It would be good to do the same for byte string literals, at least to keep things consistent. What did you try so far? Do you have a partial patch for it?
msg269334 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-27 01:43
Hah, we posted the same fix almost at the same time :)
msg269335 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-27 01:53
Indeed, we did, thanks for letting me know my mistake :) I didn't get very far into making bytes literals disallow invalid sequences, as I ran into issues with _codecs.escape_decode throwing the warning even when the literal was fine. I think I stopped there and figured I'd at least post that patch and see if people are interested in extending that modification to bytes (turns out so).

I forgot about docs, will do so soon, but I'll try to extend the patch for bytes first. I'll see if I can make literals warn but not e.g. _codecs.escape_decode (or anything else, really).

Thanks!
msg269340 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-27 02:59
Code samples in the documentation should also be fixed, like at <https://docs.python.org/3.6/library/re.html#re.split>. I think you can run “make -C Doc doctest” or something similar, which may help find some of these.

Also, playing with your current patch, it seems to affect the “unicode-escape” codec. Not sure if that is a problem, but it probably deserves also documenting the change.
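A quick way to look at the codec side of this is to decode a few byte strings with "unicode_escape" while recording warnings. This is a hedged sketch: the helper name `decode_unicode_escape` is hypothetical, and whether a warning is recorded for the second call depends on the interpreter having this change, so only the decoded text should be relied on.

```python
import codecs
import warnings

def decode_unicode_escape(data):
    """Decode `data` with unicode_escape; return (text, warning category names)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        text = codecs.decode(data, "unicode_escape")
    return text, [w.category.__name__ for w in caught]

print(decode_unicode_escape(rb"tab:\t"))  # valid escape, decoded to a tab
print(decode_unicode_escape(rb"[\8]"))    # invalid escape, backslash is kept
```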
msg269358 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-27 08:20
Guido: "I am okay with making it a silent warning."

The current patch raises a DeprecationWarning which is silent by default, but seen using python3 -Wd. What is the "long term" plan: always raise an *exception* in Python 3.7? Which exception?

Another option is to always emit a SyntaxWarning, but don't raise an exception in long term. It is possible to get an exception using python3 -Werror.

There is also FutureWarning: "Base class for warnings about constructs that will change semantically in the future" or RuntimeWarning "Base class for warnings about dubious runtime behavior".
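For reference, the effect of -Werror can be approximated in-process with an "error" filter. The sketch below is an assumption-laden illustration (the helper name `compiles_cleanly` is hypothetical): depending on the interpreter version the failure may surface as the warning class itself or as a SyntaxError, so it catches both.

```python
import warnings

def compiles_cleanly(source):
    """Return False if compiling `source` fails once warnings are errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # roughly what -W error does globally
        try:
            compile(source, "<example>", "exec")
        except (Warning, SyntaxError):
            return False
    return True

print(compiles_cleanly(r'x = "\d"'))    # invalid escape in a cooked literal
print(compiles_cleanly(r'x = r"\d"'))   # raw literal
print(compiles_cleanly(r'x = "\\d"'))   # explicitly escaped backslash
```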
msg269368 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-06-27 11:00
DeprecationWarning is used when we want to remove a feature. It becomes an error in the future. FutureWarning is used when we want to change the meaning of a feature instead of removing it. For example re.split(':*', 'a:bc') emits a FutureWarning and returns ['a', 'bc'] because there is a plan to make it return ['', 'a', 'b', 'c', ''].

I think "a silent warning" means that it should emit a DeprecationWarning or a PendingDeprecationWarning. Since there is no haste, we should use a two-release deprecation period. After this, the deprecation can be changed to a SyntaxWarning in 3.8 and to a UnicodeDecodeError (for strings) and a ValueError (for bytes) in 4.0. The latter are converted to SyntaxError by the parser. In the end we should get the same behavior as for truncated \x and \u escapes.

>>> '\u'
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
>>> b'\x'
  File "<stdin>", line 1
SyntaxError: (value error) invalid \x escape at position 0

Maybe change a parser to convert warnings to a SyntaxWarning?
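The truncated-escape behavior shown in the transcript above can also be checked programmatically. This is a minimal sketch (the helper name `syntax_error_message` is hypothetical) that compiles a literal and reports the SyntaxError text, if any.

```python
def syntax_error_message(source):
    """Return the SyntaxError message from compiling `source`, or None."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError as exc:
        return exc.msg
    return None

print(syntax_error_message(r"'\u'"))   # truncated \uXXXX escape in a str literal
print(syntax_error_message(r"b'\x'"))  # truncated \x escape in a bytes literal
print(syntax_error_message(r"r'\u'"))  # raw literal: no escape processing, no error
```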
msg269372 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-27 11:30
I think ultimately a SyntaxError should be fine. I don't know *when* it becomes appropriate to change a warning into an error; I was thinking 3.7 but, as Serhiy said, there's no rush. I think waiting five release cycles is overkill though, that means the error won't be until 8 years from now (assuming release cycle periods don't change)! I think at most 3.8 should be fine for making this a full-on syntax error.
msg269373 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2016-06-27 12:45
@ebarry: To move faster, you should also work with linters (pylint, pychecker, pyflakes, pycodestyle, flake8, ...) to log a warning that helps projects prepare for this change. Linters are used on Python 2-only projects, so it will help them prepare for the final Python 3.<n> which will raise an exception.
msg269376 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-06-27 13:28
Yes, this change is likely to break a lot of code, so an extended deprecation period (certainly longer than 3.7, which Guido has already mandated) is the minimum.  Guido hasn't agreed to making it an error yet, as far as I can see ;)
msg269382 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-06-27 15:00
I think ultimately it has to become an error (otherwise I wouldn't
have agreed to the warning, silent or not). But because there's so
much 3rd party code that depends on it we indeed need to take
"several" releases before we go there.

Contacting the PyCQA folks would also be a great idea -- can anyone
volunteer to do so?
msg269388 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-27 17:02
Easing transition is always a good idea. I'll contact the PyCQA people later today when I'm back home.

On second thought, it makes sense to wait more than two release cycles before making this an error. I don't really have a strong opinion on when exactly that should happen.
msg269413 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-06-28 01:15
Just brought this to the attention of the code-quality mailing list, so linter maintainers should (hopefully!) catch up soon.

Also new patch, I forgot to add '\c' in the tests.
msg269416 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-06-28 02:39
Forgot to say I reviewed invalid_stdlib_escapes_1.patch the other day and can’t see any problems.
msg270765 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-07-18 15:50
Here's a new patch which also deprecates invalid escape sequences in bytes. Tests included with test_codecs.

Patch includes and supersedes deprecate_invalid_escapes_only_3.patch, and I have not found a single instance of an invalid escape sequence other than in test_codecs, so this should be fine now.
msg272439 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-08-11 11:56
I am trying out your patch at the moment. There are plenty of test suite failures; I ran the test suite with approximately the following:

./python -bWerror -m test -Wr -j0 -u network -x test_{mailbox,shelve,faulthandler,multiprocessing_main_handling,venv,warnings}

Importing modules sometimes fails or generates the warning, but this goes away if the file is not out of date. E.g. run “touch Lib/test/test_codecs.py”, and then make sure you next import that module with -Wall or -Werror enabled.

374 tests OK.
10 tests failed:
    test___all__ test_ast test_codecs test_doctest test_fstring
    test_idle test_strlit test_trace test_unicode
    test_zipimport_support

I started pasting some of the failures here, but gave up as more and more failed. Let me know if you want the full details.

======================================================================
ERROR: test_coverage (test.test_trace.TestCoverage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/media/disk/home/proj/python/cpython/Lib/test/test_trace.py", line 312, in test_coverage
    self._coverage(tracer)
  File "/media/disk/home/proj/python/cpython/Lib/test/test_trace.py", line 307, in _coverage
    r.write_results(show_missing=True, summary=True, coverdir=TESTFN)
  File "/media/disk/home/proj/python/cpython/Lib/trace.py", line 284, in write_results
    lnotab = _find_executable_linenos(filename)
  File "/media/disk/home/proj/python/cpython/Lib/trace.py", line 403, in _find_executable_linenos
    code = compile(prog, filename, "exec")
DeprecationWarning: invalid escape sequence 'w'
**********************************************************************
File "/media/disk/home/proj/python/cpython/Lib/test/test_doctest.py", line 288, in test.test_doctest.test_DocTest
Failed example:
    docstring = '''
        >>> print(12)
        12

    Non-example text.

        >>> print('another\example')
        another
        example
    '''
Exception raised:
    Traceback (most recent call last):
      File "/media/disk/home/proj/python/cpython/Lib/doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
    DeprecationWarning: invalid escape sequence 'e'
**********************************************************************
[Many subsequent NameError exceptions from test_doctest]
**********************************************************************
File "/tmp/tmphzbypj98/test_zip.zip/test_zipped_doctest.py", line 288, in test_zipped_doctest.test_DocTest
Failed example:
    docstring = '''
        >>> print(12)
        12

    Non-example text.

        >>> print('another\example')
        another
        example
    '''
Exception raised:
    Traceback (most recent call last):
      File "/media/disk/home/proj/python/cpython/Lib/doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
    DeprecationWarning: invalid escape sequence 'e'
**********************************************************************
[More failures]

======================================================================
FAIL: test_all (test.test___all__.AllTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/media/disk/home/proj/python/cpython/Lib/test/test___all__.py", line 105, in test_all
    self.check_all(modname)
  File "/media/disk/home/proj/python/cpython/Lib/test/test___all__.py", line 28, in check_all
    raise FailedImport(modname)
  File "/media/disk/home/proj/python/cpython/Lib/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/media/disk/home/proj/python/cpython/Lib/test/support/__init__.py", line 1130, in _filterwarnings
    raise AssertionError("unhandled warning %s" % reraise[0])
AssertionError: unhandled warning {message : DeprecationWarning("invalid escape sequence '('",), category : 'DeprecationWarning', filename : '/media/disk/home/proj/python/cpython/Lib/importlib/_bootstrap.py', lineno : 222, line : None}

======================================================================
ERROR: test_escape_order (test.test_fstring.TestCase) (str='f\'{"a"\\!r}\'')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/media/disk/home/proj/python/cpython/Lib/test/test_fstring.py", line 20, in assertAllRaise
    eval(str)
DeprecationWarning: invalid escape sequence '!'

======================================================================
ERROR: test_escape (test.test_codecs.EscapeDecodeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/media/disk/home/proj/python/cpython/Lib/test/test_codecs.py", line 1218, in test_escape
    decode(b"\\" + b)
OverflowError: character argument not in range(0x110000)

======================================================================
ERROR: test_escape_decode (test.test_codecs.UnicodeEscapeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/media/disk/home/proj/python/cpython/Lib/test/test_codecs.py", line 2467, in test_escape_decode
    check(br"[\8]", r"[\8]")
  File "/media/disk/home/proj/python/cpython/Lib/test/test_codecs.py", line 26, in check
    self.assertEqual(coder(input), (expect, len(input)))
DeprecationWarning: invalid escape sequence '8'

test test_unicode crashed -- Traceback (most recent call last):
  File "/media/disk/home/proj/python/cpython/Lib/test/libregrtest/runtest.py", line 167, in runtest_inner
    the_module = importlib.import_module(abstest)
  File "/media/disk/home/proj/python/cpython/Lib/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 996, in _gcd_import
  File "<frozen importlib._bootstrap>", line 979, in _find_and_load
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 663, in exec_module
  File "<frozen importlib._bootstrap_external>", line 770, in get_code
  File "<frozen importlib._bootstrap_external>", line 730, in source_to_code
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
DeprecationWarning: invalid escape sequence '?'
msg272441 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-08-11 12:15
Hmm, that's odd, I recall some of the failures from testing, and thought I fixed them. Some of these are brand new, though, so thanks! I'll run and fix the tests (and modules as well); should likely have a patch by the weekend :)
msg272696 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-08-14 21:16
Here's a new pair of patches for this. There are some small tweaks to the tests, and I properly fixed all instances of invalid escapes (I also made some strings into raw-strings at some places where it's not needed, solely for consistency with surrounding lines or functions). The patch that fixes the invalid escapes is four times larger than the previous one.

I would also advise adding to PEP 8 a bit recommending that strings used in regular expressions always be raw strings, even if there's no need to, as a lot (at least 70%) of the invalid escapes fixed were used in regexes.
msg274119 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-09-01 12:41
Ping. I'd like to get this merged in time for 3.6. Is there anything I can do to speed up the review?

Since the change itself is very straightforward, I think it would make sense to merge it now and then fix the invalid escapes that are found during the beta phase.
msg274120 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-01 13:01
I think "invalid escape sequence '\?'" would look cleaner than "invalid escape sequence '?'".
msg274126 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-09-01 13:19
Thanks Serhiy; it does look better to me too!
msg274332 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-09-04 03:29
Left some comments for invalid_stdlib_escapes_2.patch
msg274475 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-09-06 00:13
Updated and rebased patch. There's a few file tweaks here and there to stay up to date, otherwise it's mostly the same.

Martin, it may look like I've ignored your comments, but I'm trying to keep the patches as simple as possible, and so I don't want to go further than to make strings into raw strings (also the alignment issue you pointed out). I'd rather have the other issues addressed in another issue, as I want to get this merged in time for the feature freeze. The other issues (some which were already present) can be taken care of during the beta phase.
msg274806 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-09-07 12:45
Rebased patch after Victor's commit in #16334. Also regenerated invalid_stdlib_escapes_3 in the hopes that Rietveld picks it up.
msg274837 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-09-07 17:01
+1 on getting this in. Who can help reviewing and merging before beta 1?
msg274999 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-09-08 11:26
Thank you R. David for the review, here's a new patch with the one change.
msg275009 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-08 12:50
I suggest not changing fixcid.py. It is not correct, and there is a separate issue for this (issue27952).
msg275010 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-09-08 13:00
All right, since you'll work on it I'm leaving it out. Removed it and test_bytes (which you already fixed, thanks!) from new patch.
msg275084 - (view) Author: Roundup Robot (python-dev) Date: 2016-09-08 18:00
New changeset b4cc62473c13 by R David Murray in branch 'default':
#27364: fix "incorrect" uses of escape character in the stdlib.
https://hg.python.org/cpython/rev/b4cc62473c13
msg275111 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-09-08 18:46
Here's a copy of Emanuel's deprecation patch with a versionchanged note in the lexical docs and a whatsnew entry.
msg275123 - (view) Author: Roundup Robot (python-dev) Date: 2016-09-08 19:34
New changeset 38802c38cfe1 by R David Murray in branch 'default':
#27364: Deprecate invalid escape strings in str/byutes.
https://hg.python.org/cpython/rev/38802c38cfe1
msg275124 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-09-08 19:35
Thank you David for taking the time to review and commit this :)
msg275125 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2016-09-08 19:36
Thanks Emanuel.  No bets on how much hate mail we get for this :)
msg275219 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-09-08 23:51
Thank you all for persisting on this.  I have seen numerous beginners be puzzled why normal (cooked) strings using '\' for Windows paths sometimes work and sometimes 'mysteriously' do not, as in the initially referenced issue.  I also think it better to consistently use 'r' for REs with '\' intended to be passed through to re.  (And I pushed some of the IDLE code that was patched.)
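The "sometimes works, sometimes does not" effect with Windows paths comes down to which letters follow the backslashes. A small illustration with a made-up path: `\t` and `\n` are valid escapes, so a cooked literal silently turns them into a tab and a newline, while the raw literal keeps both backslashes.

```python
cooked = "C:\temp\new"   # \t and \n are interpreted as tab and newline
raw = r"C:\temp\new"     # backslashes are kept as written

print(repr(cooked), len(cooked))
print(repr(raw), len(raw))
print("\\" in cooked, raw.count("\\"))
```

A path whose components start with other letters (for example `\w`) would keep its backslashes in a cooked literal, which is why the mistake can go unnoticed until a directory is renamed.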
msg275237 - (view) Author: Roundup Robot (python-dev) Date: 2016-09-09 02:37
New changeset 60085c8f01fe by R David Murray in branch 'default':
#27364: Credit Emanuel Barry in NEWS item.
https://hg.python.org/cpython/rev/60085c8f01fe
msg275298 - (view) Author: Roundup Robot (python-dev) Date: 2016-09-09 09:55
New changeset 98a57845c8cc by Martin Panter in branch 'default':
Issue #27364: Raw strings to avoid deprecated escaping in com2ann.py
https://hg.python.org/cpython/rev/98a57845c8cc
msg275757 - (view) Author: Chi Hsuan Yen (Chi Hsuan Yen) * Date: 2016-09-11 09:35
Currently the deprecation message is not so useful when fixing lots of files in a large project. For example, I have two files foo.py and bar.py:

# foo.py
import bar

# bar.py
print('\d')

It gives:
$ python3.6 -W error foo.py
Traceback (most recent call last):
  File "foo.py", line 1, in <module>
    import bar
DeprecationWarning: invalid escape sequence '\d'

Things are worse when __import__, imp or importlib are involved. I have to add some code to show which module is imported.

It would be better to have at least filenames and line numbers:
$ ./python -W error foo.py
Traceback (most recent call last):
  File "foo.py", line 1, in <module>
    import bar
  File "/home/yen/Projects/cpython/build/bar.py", line 1
    print('\d')
         ^
SyntaxError: (deprecated usage) invalid escape sequence '\d'

I have a naive try that prints more information. Raising SyntaxError may not be a good idea, anyway.
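Until the message itself carries a location, one workaround is to compile each source file directly and read the line number off the recorded warning. The following is a rough sketch, not the attached verbose-deprecation.diff: the helper name `find_invalid_escapes` is hypothetical, and it accepts both DeprecationWarning and SyntaxWarning since the category depends on the Python version.

```python
import os
import tempfile
import warnings

def find_invalid_escapes(path):
    """Compile the file at `path`; return sorted (lineno, message) pairs."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, path, "exec")
    return sorted({
        (w.lineno, str(w.message))
        for w in caught
        if issubclass(w.category, (DeprecationWarning, SyntaxWarning))
    })

# Demo on a throwaway file modelled on the bar.py example above.
with tempfile.TemporaryDirectory() as tmp:
    bar = os.path.join(tmp, "bar.py")
    with open(bar, "w", encoding="utf-8") as f:
        f.write("import os\nprint('\\d')\n")
    for lineno, message in find_invalid_escapes(bar):
        print("{}:{}: {}".format(bar, lineno, message))
```

Because the source is compiled from the file contents rather than imported, cached bytecode does not hide the warning.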
msg276016 - (view) Author: Emanuel Barry (ebarry) * Date: 2016-09-12 10:28
Fair enough, but please open a new issue for that.

@Terry - you're welcome; that's exactly the reason I pushed for it :)
msg276287 - (view) Author: Chi Hsuan Yen (Chi Hsuan Yen) * Date: 2016-09-13 15:14
Opened a new issue at Issue28128.
msg298112 - (view) Author: Jason R. Coombs (jason.coombs) * (Python committer) Date: 2017-07-11 03:39
One consequence of this change is that now any string that has a backslash needs to be escaped or raw, leading to changes like this one (https://github.com/cherrypy/cherrypy/pull/1610/commits/1d8c03ea8c5fe90f29bbea267300b97c78391c24#diff-be33a4f55d59dfc70fc6452482f3a7a4) where the diagram in the docstring is the culprit. An escaped backslash is not viable in this case, so a raw string is required.

This particular example strikes me as counter-intuitive, though maybe I just need to adjust my intuition.

Was the intention for a docstring like above to use raw strings?
msg298114 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2017-07-11 04:14
Yes.
msg298115 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-07-11 04:15
Yes, this was the intention. One common error is using "\n" in non-raw docstrings. This change doesn't prevent this error, but increases the chances of catching it when there are other backslashes in the docstring.
msg298170 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-07-11 17:53
Also note that we have fixed a number of bugs in the stdlib code where a raw string was not used for a docstring when it should have been.  And when I say bugs, I mean both formatting problems in pydoc, and doctest bugs.  There may even have been a case where it produced a code bug, but I'm not sure I'm recalling that correctly :)

So yes, requiring that a docstring containing backslashes be marked as a raw string is very intentional.
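A minimal sketch of the "\n" pitfall mentioned above, using two made-up functions, plain and raw: in the non-raw docstring the sequence is a valid escape, so it silently turns into a real newline and triggers no warning, while the raw docstring keeps the backslash and the letter n as written.

```python
def plain():
    """Join the lines with "\n"."""


def raw():
    r"""Join the lines with "\n"."""


# The non-raw docstring now contains an actual line break inside the quotes.
print(repr(plain.__doc__))
# The raw docstring still shows the two characters backslash and n.
print(repr(raw.__doc__))
```

This is why pydoc output and doctests can go wrong when a docstring that mentions escape sequences is not marked raw.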
History
Date | User | Action | Args
2017-07-11 17:53:19 | r.david.murray | set | messages: + msg298170
2017-07-11 04:15:13 | serhiy.storchaka | set | messages: + msg298115
2017-07-11 04:14:52 | gvanrossum | set | messages: + msg298114
2017-07-11 03:39:49 | jason.coombs | set | nosy: + jason.coombs; messages: + msg298112
2016-09-13 15:14:02 | Chi Hsuan Yen | set | messages: + msg276287
2016-09-12 10:28:13 | ebarry | set | messages: + msg276016
2016-09-11 09:35:50 | Chi Hsuan Yen | set | files: + verbose-deprecation.diff; nosy: + Chi Hsuan Yen; messages: + msg275757
2016-09-09 09:55:37 | python-dev | set | messages: + msg275298
2016-09-09 02:37:46 | python-dev | set | messages: + msg275237
2016-09-08 23:51:40 | terry.reedy | set | nosy: + terry.reedy; messages: + msg275219
2016-09-08 19:36:22 | r.david.murray | set | messages: + msg275125
2016-09-08 19:35:58 | ebarry | set | status: open -> closed; resolution: fixed; messages: + msg275124; stage: patch review -> resolved
2016-09-08 19:34:36 | python-dev | set | messages: + msg275123
2016-09-08 18:46:32 | r.david.murray | set | files: + deprecate_invalid_escapes_both_5.patch; messages: + msg275111
2016-09-08 18:00:28 | python-dev | set | nosy: + python-dev; messages: + msg275084
2016-09-08 13:00:52 | ebarry | set | files: + invalid_stdlib_escapes_5.patch; messages: + msg275010
2016-09-08 12:50:10 | serhiy.storchaka | set | messages: + msg275009
2016-09-08 11:26:36 | ebarry | set | files: + invalid_stdlib_escapes_4.patch; messages: + msg274999
2016-09-08 01:58:16 | ebarry | set | files: + invalid_stdlib_escapes_3_rebased_2.patch
2016-09-07 17:01:25 | gvanrossum | set | messages: + msg274837
2016-09-07 12:49:20 | ebarry | set | files: + invalid_stdlib_escapes_3_regenerated.patch
2016-09-07 12:49:02 | ebarry | set | files: - invalid_stdlib_escapes_3_regen.patch
2016-09-07 12:45:18 | ebarry | set | files: + invalid_stdlib_escapes_3_regen.patch
2016-09-07 12:45:05 | ebarry | set | files: + deprecate_invalid_escapes_both_4.patch; messages: + msg274806
2016-09-06 00:13:39 | ebarry | set | files: + invalid_stdlib_escapes_3.patch; messages: + msg274475; title: Deprecate invalid unicode escape sequences -> Deprecate invalid escape sequences in str/bytes
2016-09-04 03:29:06 | martin.panter | set | messages: + msg274332
2016-09-01 13:19:49 | ebarry | set | files: + deprecate_invalid_escapes_both_3.patch; messages: + msg274126
2016-09-01 13:01:26 | serhiy.storchaka | set | messages: + msg274120
2016-09-01 12:41:46 | ebarry | set | messages: + msg274119
2016-08-23 07:20:44 | jayvdb | set | nosy: + jayvdb
2016-08-14 21:17:11 | ebarry | set | files: + deprecate_invalid_escapes_both_2.patch
2016-08-14 21:16:57 | ebarry | set | files: + invalid_stdlib_escapes_2.patch; messages: + msg272696
2016-08-11 12:15:41 | ebarry | set | messages: + msg272441
2016-08-11 11:56:28 | martin.panter | set | messages: + msg272439
2016-07-18 15:50:03 | ebarry | set | files: + deprecate_invalid_escapes_both_1.patch; messages: + msg270765
2016-06-28 02:39:31 | martin.panter | set | messages: + msg269416
2016-06-28 01:15:04 | ebarry | set | files: + deprecate_invalid_escapes_only_3.patch; messages: + msg269413
2016-06-27 17:02:24 | ebarry | set | messages: + msg269388
2016-06-27 15:00:28 | gvanrossum | set | messages: + msg269382
2016-06-27 13:28:36 | r.david.murray | set | messages: + msg269376
2016-06-27 12:45:51 | vstinner | set | messages: + msg269373
2016-06-27 11:30:10 | ebarry | set | messages: + msg269372
2016-06-27 11:00:51 | serhiy.storchaka | set | messages: + msg269368
2016-06-27 08:20:44 | vstinner | set | messages: + msg269358
2016-06-27 02:59:05 | martin.panter | set | messages: + msg269340
2016-06-27 01:53:54 | ebarry | set | messages: + msg269335
2016-06-27 01:43:55 | martin.panter | set | messages: + msg269334
2016-06-27 01:42:52 | martin.panter | set | files: + deprecate_invalid_escapes_only_2.patch; nosy: + martin.panter; messages: + msg269333
2016-06-27 01:40:30 | ebarry | set | files: + deprecate_invalid_escapes_only_2.patch; messages: + msg269332
2016-06-27 00:26:59 | ebarry | set | messages: + msg269329
2016-06-27 00:05:31 | gvanrossum | set | messages: + msg269326
2016-06-26 23:04:39 | ebarry | set | files: + invalid_stdlib_escapes_1.patch
2016-06-26 23:04:24 | ebarry | set | files: + deprecate_invalid_escapes_only_1.patch; messages: + msg269323
2016-06-26 22:19:53 | gvanrossum | set | messages: + msg269322
2016-06-25 05:34:47 | serhiy.storchaka | set | nosy: + gvanrossum
2016-06-24 05:47:03 | ebarry | set | files: + deprecate_invalid_unicode_escapes_2.patch; messages: + msg269158
2016-06-24 04:43:28 | ebarry | set | messages: + msg269156
2016-06-24 04:22:39 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg269155
2016-06-24 02:59:45 | ebarry | set | messages: + msg269152
2016-06-23 15:59:29 | ztane | set | nosy: + ztane; messages: + msg269122
2016-06-23 15:26:16 | ebarry | set | messages: + msg269119
2016-06-23 14:41:14 | r.david.murray | set | nosy: + r.david.murray; messages: + msg269114
2016-06-21 20:34:19 | ebarry | create |