classification
Title: raw strings cannot end with a backslash character r'\'
Type:
Stage: resolved
Components: Interpreter Core
Versions: Python 3.7
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: eric.smith, gregory.p.smith, gwideman, martin.panter, r.david.murray, tim.peters, vstinner
Priority: normal
Keywords:

Created on 2017-08-08 00:50 by gregory.p.smith, last changed 2019-02-21 02:29 by gwideman. This issue is now closed.

Messages (9)
msg299883 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-08-08 00:50
A raw string literal cannot end in a backslash.  There is no friendly way to write a string that ends in a backslash character.

In particular I want to put the following into a Python string: \\?\

'\\\\?\\' works but is escaping hell so I wanted to suggest to the author to use r'\\?\' but that leads to:
SyntaxError: EOL while scanning string literal

Tested in a random 3.7.0a0 build as well as older 2.7 and 3.x stable versions.

r'\' is the easiest way to reproduce this.  (which could be written using the same number of bytes as '\\'... the use case above where a string containing a lot of \s that also ends in a \ is where it matters more from a code beauty point of view)

Can we update the parser to allow this?
msg299885 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-08-08 01:06
A workaround that works on all Python versions is to let the compiler concatenate adjacent string literals: r"...." "...".

>>> print(r"a\n" "\\")
a\n\
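
As a rough sketch of how this workaround applies to the \\?\ string from the original report, the snippet below compares a few spellings. The variable names are only illustrative, and the slicing variant is a common idiom rather than something proposed in this thread.

```python
# Target text: \\?\  (backslash, backslash, question mark, backslash).

# Ordinary literal with every backslash doubled.
escaped = '\\\\?\\'

# Raw literal followed by an adjacent ordinary literal; the compiler
# joins the two pieces into a single string.
concatenated = r'\\?' '\\'

# Raw literal with a throwaway trailing space that is sliced off again.
sliced = r'\\?\ '[:-1]

assert escaped == concatenated == sliced
assert len(escaped) == 4
print(concatenated)
```

The concatenation form keeps most of the text readable as a raw literal and confines the escaping to the final backslash.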
msg299889 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2017-08-08 01:55
What would your proposal do where an embedded backslash is currently valid?

>>> print(r'Backslash apostrophe: \'.')
Backslash apostrophe: \'.
>>> r'\'  # Comment or string?'
"\\'  # Comment or string?"
msg299890 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-08-08 02:11
This may well be a "not a bug" resolution to preserve existing semantics that weren't what I expected.
msg299893 - Author: Tim Peters (tim.peters) * (Python committer) Date: 2017-08-08 02:28
Yes, I'm closing as not-a-bug.  It's been this way (and documented) forever.  More specifically, as the docs say, a raw string can't end with an _odd_ number of backslashes:

"""
String quotes can be escaped with a backslash, but the backslash remains in the string; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw string cannot end in a single backslash (since the backslash would escape the following quote character).
"""
msg299918 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-08-08 13:24
In fact, this is a FAQ:

https://docs.python.org/3/faq/design.html#why-can-t-raw-strings-r-strings-end-with-a-backslash
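
The odd-versus-even rule quoted from the docs can be checked with a small sketch like the one below. The helper name is_valid_literal is hypothetical; it only asks the built-in compile() whether a piece of source text is accepted as an expression.

```python
def is_valid_literal(source: str) -> bool:
    """Return True if `source` compiles as a Python expression."""
    try:
        compile(source, "<literal>", "eval")
    except SyntaxError:
        return False
    return True


BACKSLASH = chr(92)  # built with chr() so this example needs no escaping

# Build r'\', r'\\', r'\\\', r'\\\\' as source text and test each one.
for count in range(1, 5):
    source = "r'" + BACKSLASH * count + "'"
    print(count, source, is_valid_literal(source))
```

Under this check, the literals with an odd number of trailing backslashes should be rejected and the ones with an even number accepted.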
msg299934 - Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2017-08-08 15:39
If I could stop thinking of r as meaning "raw", as we document it, and instead think of it as "regular expression literal", I wouldn't make this mistake. :)

Thanks everyone!
msg336174 - Author: Graham Wideman (gwideman) Date: 2019-02-21 02:23
Let us be clear here that this is NOT a case where the backslash escapes the subsequent quote. If it WAS such a case, then the sequence \' would leave only the quote in the output string. But it doesn't; it leaves the complete 2-character \' in the output string.
So essentially this is a case of the character sequence \' being given a special status that causes that character pair to have a special meaning in preference to the meaning of the individual characters.
So this IS a bug -- it may be "as designed", but that produces the bug in the name of this feature, "raw string", which is patently misleading and in violation of the principle of least surprise. This is a feature (as the FAQ explains) provided explicitly for developers of regular expression parsers. So at best, these r-strings should be called "regex-oriented" string literals, which can be used elsewhere at the risk of running into this gotcha.
msg336175 - Author: Graham Wideman (gwideman) Date: 2019-02-21 02:29
Demonstration:
print("x" + r' \' ' + "x")   produces
x \' x
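
To make the same point with lengths rather than printed output, here is a small sketch comparing the raw literal above with its non-raw counterpart. The names raw and cooked are illustrative only.

```python
BACKSLASH = chr(92)

raw = r' \' '     # raw literal: the backslash is kept in the value
cooked = ' \' '   # ordinary literal: the backslash is consumed as an escape

# In both cases the backslash stops the quote from ending the literal;
# only the raw form also keeps the backslash itself.
assert raw == " " + BACKSLASH + "'" + " "
assert cooked == " ' "

print(repr(raw), len(raw))
print(repr(cooked), len(cooked))
```

The raw value is one character longer than the ordinary one, which is the "special status" of the backslash-quote pair described in the previous message.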
Where is this behavior _ever_ useful? 
Or if there is some use case for this, how frequent is it compared to the frequency of users expecting either that backslash does nothing special, or that it would behave like an escape, and not appear in the output? 

I'm not here to suggest there's some easy fix for this. I just don't want this issue to be closed as "not a bug" without registering that this design is flawed.
History
Date                 User             Action  Args
2019-02-21 02:29:54  gwideman         set     messages: + msg336175
2019-02-21 02:23:45  gwideman         set     nosy: + gwideman; messages: + msg336174
2017-08-08 15:39:57  gregory.p.smith  set     messages: + msg299934
2017-08-08 13:24:31  r.david.murray   set     nosy: + r.david.murray; messages: + msg299918
2017-08-08 02:28:28  tim.peters       set     status: open -> closed; nosy: + tim.peters; messages: + msg299893; resolution: not a bug; stage: resolved
2017-08-08 02:11:03  gregory.p.smith  set     messages: + msg299890
2017-08-08 01:55:38  martin.panter    set     nosy: + martin.panter; messages: + msg299889
2017-08-08 01:06:18  vstinner         set     nosy: + vstinner; messages: + msg299885
2017-08-08 01:02:08  eric.smith       set     nosy: + eric.smith
2017-08-08 00:50:49  gregory.p.smith  create