Author mdartiailh
Recipients docs@python, mdartiailh, serhiy.storchaka
Date 2017-06-07.15:36:21
Message-id <361c3150-e362-a6ed-0083-e8f40fc27806@gmail.com>
In-reply-to <1496848934.86.0.0773547758796.issue30588@psf.upfronthosting.co.za>
Content
The issue is that unicode_escape does not properly handle strings that
mix non-ASCII Unicode characters with escape sequences, because it
assumes Latin-1 compatible characters only. For example, the literal
string r'Δ\nΔ' cannot be encoded using latin-1, and encoding it using
utf-8 and then decoding it with unicode_escape produces the wrong
output: 'Î\x94\nÎ\x94'. However, using
codecs.escape_decode(r'Δ\nΔ'.encode('utf-8'))[0].decode('utf-8') gives
the proper output. Internally the Python parser handles this case, but I
was unable to find where, and this is the closest solution I found. I
guess it may be possible using error handlers, but that seems much more
cumbersome.
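
The behaviour described above can be reproduced with the short sketch
below. It assumes Python 3 on CPython; codecs.escape_decode is an
undocumented function, so its availability and behaviour are not
guaranteed across versions. The variable names are illustrative only.

```python
import codecs

# Raw string: 'Δ', a backslash, the letter 'n', then 'Δ' (no real newline).
raw = r'Δ\nΔ'

# Encoding to latin-1 fails because 'Δ' (U+0394) is outside Latin-1.
try:
    raw.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# unicode_escape treats each non-escape byte as a Latin-1 character,
# so the two UTF-8 bytes of 'Δ' (0xCE 0x94) become two separate characters.
wrong = raw.encode('utf-8').decode('unicode_escape')

# Workaround from this message: resolve the escapes at the bytes level
# first (undocumented codecs.escape_decode), then decode the UTF-8 bytes.
right = codecs.escape_decode(raw.encode('utf-8'))[0].decode('utf-8')

print(latin1_ok)    # False
print(repr(wrong))  # 'Î\x94\nÎ\x94'
print(repr(right))  # 'Δ\nΔ'
```

In the last line the '\n' shown by repr() is a real newline character,
which is the output I expect.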

Best regards

Matthieu
History
Date User Action Args
2017-06-07 15:36:21 mdartiailh set recipients: + mdartiailh, docs@python, serhiy.storchaka
2017-06-07 15:36:21 mdartiailh link issue30588 messages
2017-06-07 15:36:21 mdartiailh create