
classification
Title: raw unicode strings interpret \u and \U (but not \n, \xHH, ...)
Type:
Stage:
Components:
Versions: Python 2.7, Python 2.6, Python 2.5
process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, vstinner
Priority: normal
Keywords:

Created on 2011-02-16 23:29 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (3)
msg128701 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-02-16 23:29
len(ur'\u0000') == len(u'\u0000') == 1
len(ur'\U0010FFFF') == len(u'\U0010FFFF') == 1

but

>>> len(ur'\n'), len(u'\n')
(2, 1)
>>> len(ur'\x00'), len(u'\x00')
(4, 1)

\u and \U should not be interpreted in raw Unicode strings.
msg128703 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2011-02-16 23:55
This has changed in Python 3, and the change is even documented: http://docs.python.org/dev/py3k/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit (6th bullet)

Python 2.x could not be changed, for compatibility reasons.
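
For comparison, here is a minimal sketch of the Python 3 behaviour referred to above. It assumes a Python 3 interpreter (the ur'' prefix from the report is a syntax error there), and the variable names are only illustrative. In Python 3 a raw literal keeps \u and \U as literal characters, just like \n and \xHH:

```python
# Python 3 only: raw literals leave every backslash sequence untouched.
raw_u = r'\u0000'        # backslash, 'u', and four digits
raw_U = r'\U0010FFFF'    # backslash, 'U', and eight hex digits
raw_n = r'\n'            # backslash and 'n'
raw_x = r'\x00'          # backslash, 'x', and two digits

print(len(raw_u), len(raw_U), len(raw_n), len(raw_x))   # 6 10 2 4

# Non-raw literals still interpret all four escapes as one character each.
print(len('\u0000'), len('\U0010FFFF'), len('\n'), len('\x00'))   # 1 1 1 1
```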
msg128704 - Author: STINNER Victor (vstinner) (Python committer) Date: 2011-02-16 23:57
> Python 2.x could not be changed, for compatibility reasons.

Well, it is not a bug because it is documented!

<< When an 'r' or 'R' prefix is used in conjunction with a 'u' or 'U' prefix, then the \uXXXX and \UXXXXXXXX escape sequences are processed while all other backslashes are left in the string. >>

I agree that Python 2 cannot be changed, but this behaviour is a little surprising :-) Let's move to Python 3!
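
As an illustration of the documented Python 2 rule quoted above, the sketch below approximates it in Python 3 with the standard raw_unicode_escape codec, which decodes \uXXXX and \UXXXXXXXX and leaves other backslashes alone. The helper name py2_style_raw is hypothetical, and the sketch assumes the input contains only Latin-1 characters; it is not how the Python 2 parser itself is implemented.

```python
def py2_style_raw(text):
    # Hypothetical helper: mimic how Python 2 treated ur'...' literals.
    # Assumes `text` is Latin-1 encodable; other input raises UnicodeEncodeError.
    # raw_unicode_escape decodes only \uXXXX and \UXXXXXXXX sequences and
    # keeps every other backslash sequence as literal characters.
    return text.encode('latin-1').decode('raw_unicode_escape')


# Same four cases as in the original report.
print(len(py2_style_raw(r'\u0000')))      # 1
print(len(py2_style_raw(r'\U0010FFFF')))  # 1
print(len(py2_style_raw(r'\n')))          # 2
print(len(py2_style_raw(r'\x00')))        # 4
```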
History
Date | User | Action | Args
2022-04-11 14:57:13 | admin | set | github: 55437
2011-02-16 23:57:01 | vstinner | set | status: open -> closed; messages: + msg128704; resolution: not a bug
2011-02-16 23:55:18 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: + msg128703
2011-02-16 23:29:25 | vstinner | create |