Message 217095 - Python tracker


Author	r.david.murray
Recipients	Sworddragon, ezio.melotti, lemburg, ncoghlan, r.david.murray, vstinner
Date	2014-04-23.22:17:11
Message-id	<1398291431.63.0.582489831312.issue21331@psf.upfronthosting.co.za>

Content

To understand why, note that a byte string has no inherent encoding.  So when you call b'utf8string'.decode('unicode_escape'), Python has no way to know how to interpret the non-ASCII characters in that byte string.  If you want the unicode_escape representation of something, you want to do 'string'.encode('unicode_escape').  If you then want that as a Python string, you can do:

    'mystring'.encode('unicode_escape').decode('ascii')
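
To make the difference concrete, here is a minimal sketch of both directions. The sample text 'ä' and the variable names are only illustrative assumptions. It relies on the documented behaviour that the unicode_escape codec decodes from Latin-1, so each non-ASCII byte of UTF-8 input becomes its own character:

```python
# Illustrative sample text (an assumption); any non-ASCII string would do.
text = 'ä'

# Wrong direction: the UTF-8 bytes carry no encoding information, and the
# unicode_escape codec reads each non-ASCII byte as a Latin-1 character.
utf8_bytes = text.encode('utf-8')              # b'\xc3\xa4'
garbled = utf8_bytes.decode('unicode_escape')  # two characters, not one

# Right direction: str -> escaped ASCII bytes -> str.
escaped = text.encode('unicode_escape').decode('ascii')

print(repr(garbled))
print(repr(escaped))
```

The second result is a plain ASCII string containing a backslash escape, which is usually what people are after when they reach for unicode_escape.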

In theory there ought to be a way to use the codecs module to go directly from a Unicode string to a unicode-escaped string, but I don't know how to do it, since the proposal for the 'transform' method was rejected :)

Just to bend your brain a bit further, note that this does work:

>>> import codecs
>>> codecs.decode(codecs.encode('ä', 'unicode-escape').decode('ascii'), 'unicode-escape')
'ä'
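
The same round trip can be wrapped in two small helpers. This is only a sketch: the names to_escaped and from_escaped are hypothetical, not part of the codecs module, and from_escaped assumes its input is pure ASCII (as to_escaped's output is), so it goes through bytes explicitly instead of passing a str to codecs.decode:

```python
import codecs

def to_escaped(text: str) -> str:
    """Return text with non-ASCII and control characters backslash-escaped."""
    # codecs.encode with 'unicode_escape' yields ASCII-only bytes.
    return codecs.encode(text, 'unicode_escape').decode('ascii')

def from_escaped(escaped: str) -> str:
    """Reverse to_escaped; assumes the input contains only ASCII characters."""
    # Encoding to ASCII first raises UnicodeEncodeError on non-ASCII input
    # rather than silently reinterpreting it.
    return escaped.encode('ascii').decode('unicode_escape')

sample = 'ä € \n'
escaped = to_escaped(sample)
restored = from_escaped(escaped)

print(repr(escaped))
print(restored == sample)
```

Going through ASCII bytes in from_escaped is a design choice: it fails loudly if someone hands it text that was never escaped.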
