Author zuo
Recipients ezio.melotti, vstinner, zuo
Date 2013-11-10.02:51:42
It seems that the 'raw_unicode_escape' codec:

1) produces data that could be suitable for Python 2.x raw unicode string literals (ur"...") but not for Python 3.x raw string literals, where \u... escapes are also treated literally;

2) seems to be buggy anyway: characters in the range 128-255 (U+0080-U+00FF) are encoded as single 'latin-1' bytes (in Python 3.x this is definitely a bug; even in Python 2.x the feature is dubious, although at least Python 2's eval() and compile() functions officially accept 'latin-1'-encoded byte strings...).

Python 3.3:

>>> b = "zażółć".encode('raw_unicode_escape')
>>> literal = b'r"' + b + b'"'
>>> literal
b'r"za\\u017c\xf3\\u0142\\u0107"'
>>> eval(literal)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xf3 in position 8: invalid continuation byte
>>> b'\xf3'.decode('latin-1')
'ó'
>>> b = "zaż".encode('raw_unicode_escape')
>>> literal = b'r"' + b + b'"'
>>> literal
b'r"za\\u017c"'
>>> eval(literal)
'za\\u017c'
>>> print(eval(literal))
za\u017c
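
The session above can be broken down per character. The following is only a small sketch (the variable names are illustrative, not part of the report): it encodes three characters separately to show which ones become \uXXXX escapes and which become a single latin-1 byte, then checks that the combined result is not valid UTF-8.

```python
text = "żół"  # U+017C, U+00F3, U+0142

# Encode each character on its own to see how the codec treats it.
per_char = {ch: ch.encode("raw_unicode_escape") for ch in text}

# Characters above U+00FF become an ASCII \uXXXX escape; U+00F3 becomes
# the single latin-1 byte 0xf3.
encoded = text.encode("raw_unicode_escape")

# The lone 0xf3 byte makes the result invalid as UTF-8, which is what
# Python 3 assumes by default when it compiles bytes source.
try:
    encoded.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# In Python 3 a raw literal keeps a \u escape as six literal characters.
raw_literal = r"\u017c"

print(per_char, utf8_ok, len(raw_literal))
```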

I believe that the 'raw_unicode_escape' codec should either be deprecated and later removed, or be modified to accept only printable ASCII characters.
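
One possible reading of the second option is an encoder whose output is always pure ASCII. The sketch below is an assumption about what that could look like, not an existing codec: ascii_only_raw_escape is a hypothetical helper that escapes every character outside printable ASCII, including U+0080-U+00FF. It does not address the separate question of literal backslashes in the input.

```python
def ascii_only_raw_escape(text: str) -> bytes:
    """Hypothetical helper: escape every character outside printable ASCII."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp <= 0x7E:
            parts.append(ch)                 # printable ASCII is kept as-is
        elif cp <= 0xFFFF:
            parts.append("\\u%04x" % cp)     # BMP characters: \uXXXX
        else:
            parts.append("\\U%08x" % cp)     # astral characters: \UXXXXXXXX
    return "".join(parts).encode("ascii")


encoded = ascii_only_raw_escape("zażółć")

# The existing codec can still decode this ASCII-only form.
decoded = encoded.decode("raw_unicode_escape")

print(encoded, decoded)
```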

PS. Also, as a side note: neither 'raw_unicode_escape' nor 'unicode_escape' escapes quotes (see issue #7615) -- shouldn't that at least be documented explicitly?
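
The side note about quotes can be checked with a short sketch like the one below. The sample string and the literal_evaluates helper are illustrative assumptions; the point is only that both codecs pass a double quote through unchanged, so wrapping the encoded bytes in quotes does not yield a valid literal.

```python
sample = 'a"b'

# Encode the same text with both codecs; neither touches the quote.
encoded = {
    codec: sample.encode(codec)
    for codec in ("unicode_escape", "raw_unicode_escape")
}


def literal_evaluates(payload: bytes) -> bool:
    """Illustrative helper: wrap payload in double quotes and try to eval it."""
    source = b'"' + payload + b'"'
    try:
        eval(source)
    except SyntaxError:
        return False
    return True


results = {codec: literal_evaluates(data) for codec, data in encoded.items()}
print(encoded, results)
```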
Date                 User  Action  Args
2013-11-10 02:51:46  zuo   set     recipients: + zuo, vstinner, ezio.melotti
2013-11-10 02:51:45  zuo   set     messageid: <>
2013-11-10 02:51:45  zuo   link    issue19539 messages
2013-11-10 02:51:42  zuo   create