Title: \u and \U in raw strings have regressed in 3.0a4
Type: behavior Stage:
Components: Interpreter Core Versions: Python 3.0
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, gvanrossum, lemburg
Priority: release blocker Keywords:

Created on 2008-04-05 14:45 by gvanrossum, last changed 2008-04-07 18:07 by gvanrossum. This issue is now closed.

Messages (4)
msg64977 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-04-05 14:45
In 2.x, \uDDDD and \UDDDDDDDD are interpreted as Unicode escapes in
raw Unicode strings. That was a mistake, but we can't fix it (except
when using "from __future__ import unicode_literals"). In 3.0, \u or
\U in a raw string should have no special meaning -- it's just a
backslash followed by 'u' or 'U'.
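
As a minimal sketch of the intended 3.0 behavior described above (the variable names `raw` and `cooked` are illustrative, not from the report), a raw literal keeps `\u` as two ordinary characters, while a non-raw literal decodes the escape:

```python
# Intended Python 3 semantics: \u has no special meaning in a raw string.
raw = r'\u1234'      # six characters: backslash, 'u', '1', '2', '3', '4'
cooked = '\u1234'    # one character: U+1234

print(len(raw))          # 6
print(raw[:2] == '\\u')  # True
print(len(cooked))       # 1
print(hex(ord(cooked)))  # 0x1234
```

Under the regressed 3.0a4 behavior reported here, `raw` would presumably have decoded to a single character instead.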

This was fixed in 3.0a3. It seems to have reverted to the old (2.x)
behavior in 3.0a4.

msg64979 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-04-05 14:55
Fixed in r62165.
msg64987 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2008-04-05 16:51
That was not a mistake, it was done on purpose since there would
otherwise have been no way to add non-ASCII Unicode code points to a raw
Unicode literal, rendering raw Unicode literals pretty useless.

Even if you use UTF-8 as source code encoding, there's no way to add
half a surrogate to a raw Unicode literal without the Unicode escapes.

If you need to write a Unicode literal escape using the raw Unicode
escape encoding, you can use '\x1234'.
msg65087 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2008-04-07 18:07
We went over this before.  *I* consider the 2.x behavior a mistake, and
a decision was made to change it in 3.0.  It got much worse in 3.0 because
all literals are Unicode (except byte literals).

To add a Unicode value to a raw string, just concatenate a raw string
and a non-raw string, e.g.

r'whatever' '\u1234' r'whatever'

I don't understand what you meant by '\x1234' -- the \x escape only
accepts 2 hex digits.
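
A short sketch of both points, assuming Python 3 semantics (the names `mixed` and `hex_escape` and the sample pattern text are made up for illustration): adjacent raw and non-raw literals are concatenated at compile time, and `\x` consumes exactly two hex digits, so `'\x1234'` is three characters rather than one.

```python
# Adjacent string literals are joined, so raw and non-raw parts can be mixed.
mixed = r'\d+' '\u1234' r'\w*'
print(len(mixed))            # 7: three raw chars + U+1234 + three raw chars
print(mixed[3] == '\u1234')  # True

# \x takes exactly two hex digits; the trailing '34' is literal text.
hex_escape = '\x1234'
print(len(hex_escape))                 # 3
print(hex_escape == '\x12' + '34')     # True
```
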
Date                 User               Action  Args
2008-04-07 18:07:50  gvanrossum         set     messages: + msg65087
2008-04-05 16:51:57  lemburg            set     nosy: + lemburg; messages: + msg64987
2008-04-05 14:55:15  benjamin.peterson  set     status: open -> closed; resolution: fixed; messages: + msg64979; nosy: + benjamin.peterson
2008-04-05 14:53:12  gvanrossum         set     components: + Interpreter Core
2008-04-05 14:53:00  gvanrossum         set     title: \u and \U in raw strings have reverted -> \u and \U in raw strings have regressed in 3.0a4
2008-04-05 14:45:35  gvanrossum         create