Author vstinner
Recipients Arfrever, a.badger, abadger1999, benjamin.peterson, ezio.melotti, lemburg, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner
Date 2013-08-22.13:18:23
Message-id <1377177503.29.0.600213251678.issue18713@psf.upfronthosting.co.za>
In-reply-to
Content
> The surrogateescape error handler is dangerous with utf-16/32. It can produce globally invalid output.

I don't understand; can you give an example? The surrogateescape error handler can generate an invalid encoded string with any encoding, not just UTF-16/32. Example with UTF-8:

>>> b"a\xffb".decode("utf-8", "surrogateescape")
'a\udcffb'

>>> 'a\udcffb'.encode("utf-8", "surrogateescape")
b'a\xffb'

>>> b'a\xffb'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte

So str.encode("utf-8", "surrogateescape") can produce an invalid UTF-8 sequence too: the bytes it returns here cannot be decoded in strict mode.
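
The same behaviour can be checked in a small script. This is only a sketch: the helper names roundtrip() and is_strictly_valid() are made up for illustration and are not part of the codecs API. It assumes Python 3, and uses the ASCII codec as a second case to show that the effect is not specific to UTF-8.

```python
def roundtrip(raw: bytes, encoding: str) -> bytes:
    """Decode and re-encode raw with the surrogateescape error handler."""
    text = raw.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape")


def is_strictly_valid(raw: bytes, encoding: str) -> bool:
    """Return True if raw decodes with the default (strict) error handler."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


data = b"a\xffb"

# The undecodable byte 0xff is mapped to the lone surrogate U+DCFF.
escaped = data.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes, which are
# still not valid in the target encoding.
utf8_output = roundtrip(data, "utf-8")
ascii_output = roundtrip(data, "ascii")

# Without surrogateescape, the lone surrogate cannot be encoded at all.
try:
    escaped.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

In both cases the encoder returns the original bytes unchanged, so whether the output is valid depends only on whether the input was.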