
Author r.david.murray
Recipients Sworddragon, ezio.melotti, lemburg, ncoghlan, r.david.murray, vstinner
Date 2014-04-23.22:17:11
Message-id <1398291431.63.0.582489831312.issue21331@psf.upfronthosting.co.za>
In-reply-to
Content
To understand why, understand that a byte string has no inherent encoding.  So when you call b'utf8string'.decode('unicode_escape'), Python has no way to know how to interpret the non-ASCII characters in that byte string.  If you want the unicode_escape representation of something, you want to do 'string'.encode('unicode_escape').  If you then want that as a Python string, you can do:

    'mystring'.encode('unicode_escape').decode('ascii')
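As a rough illustration of the point above, here is a small sketch using 'ä' as the sample text. The variable names are made up for the example. It assumes Python 3, where decoding with unicode_escape treats any non-ASCII bytes as Latin-1, which is why UTF-8 input comes out mangled:

```python
text = 'ä'

# Encoding with unicode_escape yields pure-ASCII bytes, so decoding
# them as ASCII is always safe and gives the escaped form as a str.
escaped_bytes = text.encode('unicode_escape')
escaped_str = escaped_bytes.decode('ascii')
print(escaped_bytes)   # b'\\xe4'
print(escaped_str)     # \xe4  (four characters: backslash, x, e, 4)

# Going the other way on UTF-8 bytes does not do what you might hope:
# the two UTF-8 bytes of 'ä' are read as two separate Latin-1 characters.
utf8_bytes = text.encode('utf-8')
mangled = utf8_bytes.decode('unicode_escape')
print(mangled == text)  # False
```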

In theory there ought to be a way to use the codecs module to go directly from a unicode string to a unicode-escaped string, but I don't know how to do it, since the proposal for the 'transform' method was rejected :)

Just to bend your brain a bit further, note that this does work:

>>> import codecs
>>> codecs.decode(codecs.encode('ä', 'unicode-escape').decode('ascii'), 'unicode-escape')
'ä'
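If you want to keep the str/bytes steps explicit instead of relying on codecs.decode accepting a str, the round trip can be sketched with two small helpers. The names escape_text and unescape_text are hypothetical, not part of the standard library, and unescape_text assumes its input is already pure ASCII (it raises UnicodeEncodeError otherwise):

```python
def escape_text(s):
    """Return the unicode_escape form of s as a str (hypothetical helper)."""
    # unicode_escape output is always ASCII, so this decode cannot fail.
    return s.encode('unicode_escape').decode('ascii')


def unescape_text(s):
    """Reverse escape_text; s is assumed to contain only ASCII characters."""
    return s.encode('ascii').decode('unicode_escape')


samples = ['ä', '€', 'line1\nline2', 'back\\slash']
for sample in samples:
    escaped = escape_text(sample)
    # Every sample should survive the round trip unchanged.
    print(repr(escaped), unescape_text(escaped) == sample)
```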