classification
Title: Unrecognized string literal escape sequences give SyntaxErrors
Type: enhancement Stage: resolved
Components: Documentation, Unicode Versions: Python 3.3, Python 3.4, Python 2.7
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: docs@python Nosy List: docs@python, ezio.melotti, markeganfuller, r.david.murray, reynir, tim.golden
Priority: normal Keywords: easy

Created on 2013-04-17 15:37 by reynir, last changed 2013-09-24 10:01 by tim.golden. This issue is now closed.

Messages (5)
msg187173 - Author: Reynir Reynisson (reynir) Date: 2013-04-17 15:37
Strings like "\u" trigger a SyntaxError. According to the language reference "all unrecognized escape sequences are left in the string unchanged"[0]. The string "\u" clearly doesn't match any of the escape sequences (in particular \uxxxx).

This may be intentional, but it is not clear from the language reference that this is the case. If it is intentional, it should probably be stated more explicitly in the language reference.

I think this may be confusing for new users, since the syntax error may lead them to believe the interpreter will raise a syntax error for all unrecognized escape sequences.

[0]: http://docs.python.org/3/reference/lexical_analysis.html#literals
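
The difference described above can be reproduced without typing the literals directly. The following is a minimal sketch for Python 3; the helper name `parse_literal` is hypothetical, and warnings are silenced on the assumption that newer interpreters warn about unrecognized escapes such as "\q".

```python
import ast
import warnings


def parse_literal(source):
    """Return the value of a string literal given as source text,
    or the SyntaxError raised while parsing it."""
    with warnings.catch_warnings():
        # Newer interpreters warn about unrecognized escapes; silence that here.
        warnings.simplefilter("ignore")
        try:
            return ast.literal_eval(source)
        except SyntaxError as exc:
            return exc


# "\q" is not an escape sequence at all, so the backslash stays in the string.
unknown = parse_literal(r'"\q"')
print(repr(unknown))

# "\u" starts the \uxxxx escape but supplies no hex digits, so it is rejected.
truncated = parse_literal(r'"\u"')
print(type(truncated).__name__)
```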
msg187175 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2013-04-17 15:54
It is a recognized escape sequence, but the syntax of the escape sequence is wrong, thus the syntax error.  An "escape sequence" is a backslash character followed by a letter.  Perhaps that is the bit that needs to be clarified in the docs?
msg187201 - Author: Reynir Reynisson (reynir) Date: 2013-04-17 20:00
Thank you for the quick reply. Yes, something along those lines would help. Maybe adding "The escape sequence \x expects exactly two hex digits" would make it even clearer.
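
As an illustration of the "exactly two hex digits" wording suggested above, here is a small sketch for Python 3. The helper `try_literal` is a hypothetical name, not part of any library; it returns `None` when the literal does not parse.

```python
import ast


def try_literal(source):
    """Parse a string literal from source text; None means SyntaxError."""
    try:
        return ast.literal_eval(source)
    except SyntaxError:
        return None


# Exactly two hex digits are consumed; anything after them is ordinary text.
two_digits = try_literal(r'"\x41"')     # the single character "A"
extra_digit = try_literal(r'"\x414"')   # "A" followed by a literal "4"

# Fewer than two hex digits is a malformed escape, not an unrecognized one.
one_digit = try_literal(r'"\x4"')
no_digits = try_literal(r'"\x"')

print(two_digits, extra_digit, one_digit, no_digits)
```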
msg198319 - Author: Mark Egan-Fuller (markeganfuller) Date: 2013-09-23 10:28
Python correctly throws a unicode error here, directing the user towards the fact that this is an issue specifically with the unicode escaping.

>>> "\u"
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape


The documentation also states that exactly four hex digits are required for \uxxxx, and for \Uxxxxxxxx that "Any Unicode character can be encoded this way. Exactly eight hex digits are required."[0].

Propose closing this as Won't Fix.

[0]: http://docs.python.org/3/reference/lexical_analysis.html#literals
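
For reference, the digit counts for the two Unicode escapes can be checked with a short sketch like the one below (Python 3 only). The variable names are illustrative, and the exact error wording may vary between interpreter versions.

```python
import ast

# \u takes exactly four hex digits, \U takes exactly eight.
four = ast.literal_eval(r'"\u00e9"')        # U+00E9
eight = ast.literal_eval(r'"\U0001F600"')   # U+1F600, outside the BMP

# One digit short in either form is reported as a truncated escape.
messages = {}
for source in (r'"\u00e"', r'"\U0001F60"'):
    try:
        ast.literal_eval(source)
    except SyntaxError as exc:
        messages[source] = str(exc)

for source, message in messages.items():
    print(source, "->", message)
```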
msg198353 - Author: Tim Golden (tim.golden) * (Python committer) Date: 2013-09-24 10:01
Closing as "Works for me" in the absence of any clear proposal for docs improvement.
History
Date                 User            Action  Args
2013-09-24 10:01:14  tim.golden      set     status: open -> closed
                                             resolution: works for me
                                             messages: + msg198353
                                             stage: needs patch -> resolved
2013-09-23 10:28:02  markeganfuller  set     nosy: + tim.golden, markeganfuller
                                             messages: + msg198319
2013-04-19 02:40:54  ezio.melotti    set     keywords: + easy
                                             stage: needs patch
                                             type: behavior -> enhancement
                                             versions: + Python 2.7, Python 3.4
2013-04-17 20:00:30  reynir          set     messages: + msg187201
2013-04-17 15:54:45  r.david.murray  set     nosy: + r.david.murray
                                             messages: + msg187175
2013-04-17 15:37:35  reynir          create