Message 195886
> The surrogateescape error handler is dangerous with utf-16/32. It can produce globally invalid output.
I don't understand; can you give an example? surrogateescape generates invalid encoded strings with any encoding. Example with UTF-8:
>>> b"a\xffb".decode("utf-8", "surrogateescape")
'a\udcffb'
>>> 'a\udcffb'.encode("utf-8", "surrogateescape")
b'a\xffb'
>>> b'a\xffb'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
So str.encode("utf-8", "surrogateescape") produces an invalid UTF-8 sequence.
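The same round trip can be checked for more than one codec. The sketch below is a minimal illustration, assuming Python 3 and the PEP 383 mapping in which an undecodable byte 0xXX (0x80-0xFF) is decoded to the lone surrogate U+DCXX. The helper names escaped_bytes() and is_valid() are hypothetical, introduced only for this example. It shows that the original bytes come back unchanged, and that they are still rejected by a strict decode with the same codec (here UTF-8 and ASCII).

```python
# Sketch only: escaped_bytes() and is_valid() are hypothetical helpers,
# not part of the standard library.


def escaped_bytes(s: str) -> bytes:
    """Return the raw bytes smuggled into `s` by the surrogateescape handler.

    Per PEP 383, an undecodable byte 0xXX (0x80-0xFF) is decoded to the lone
    surrogate U+DCXX, so collect code points in U+DC80..U+DCFF.
    """
    return bytes(ord(ch) - 0xDC00 for ch in s if 0xDC80 <= ord(ch) <= 0xDCFF)


def is_valid(data: bytes, encoding: str) -> bool:
    """True if `data` decodes cleanly with the default (strict) error handler."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


raw = b"a\xffb"
for enc in ("utf-8", "ascii"):
    text = raw.decode(enc, "surrogateescape")   # 0xff -> U+DCFF
    back = text.encode(enc, "surrogateescape")  # U+DCFF -> 0xff again
    # The round trip is lossless, but the result is not valid `enc` data.
    print(enc, ascii(text), back, escaped_bytes(text), is_valid(back, enc))
```

Encoding the same string with the default strict handler (for example text.encode("utf-8")) raises UnicodeEncodeError instead, which is one way to refuse such strings before they are written out.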
Date | User | Action | Args
2013-08-22 13:18:23 | vstinner | set | recipients: + vstinner, lemburg, ncoghlan, pitrou, abadger1999, benjamin.peterson, ezio.melotti, a.badger, Arfrever, r.david.murray, serhiy.storchaka
2013-08-22 13:18:23 | vstinner | set | messageid: <1377177503.29.0.600213251678.issue18713@psf.upfronthosting.co.za>
2013-08-22 13:18:23 | vstinner | link | issue18713 messages
2013-08-22 13:18:23 | vstinner | create |