Message 238182
The proposed preliminary patch adds three functions to the codecs module:
convert_surrogates(data, errors) -- handle lone surrogates with the specified error handler.
>>> codecs.convert_surrogates('a\u20ac\udca4', 'backslashreplace')
'a€\\udca4'
convert_surrogateescape(data, errors) -- handle surrogate-escaped bytes (lone surrogates produced by the surrogateescape error handler) with the specified error handler.
>>> codecs.convert_surrogateescape('a\u20ac\udca4', 'backslashreplace')
'a€\\xa4'
convert_astrals(data, errors) -- handle astral (non-BMP) characters with the specified error handler.
>>> codecs.convert_astrals('a\u20ac\U000e007f', 'backslashreplace')
'a€\\U000e007f'
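For readers without the patch applied, the three behaviours shown above can be approximated in pure Python. This is only a rough sketch, not the patch itself: the `sketch_*` names are hypothetical, and only the 'backslashreplace' case from the examples is emulated rather than an arbitrary error handler.

```python
import re

# Hypothetical stand-ins for the proposed functions; they hard-code the
# 'backslashreplace' behaviour and do not accept an error handler name.

_SURROGATE = re.compile('[\ud800-\udfff]')        # any lone surrogate code point
_ESCAPED_BYTE = re.compile('[\udc80-\udcff]')     # surrogates made by surrogateescape
_ASTRAL = re.compile('[\U00010000-\U0010ffff]')   # non-BMP characters


def sketch_convert_surrogates(data):
    """Replace each surrogate code point with a \\uXXXX escape."""
    return _SURROGATE.sub(lambda m: '\\u%04x' % ord(m.group()), data)


def sketch_convert_surrogateescape(data):
    """Replace each surrogate-escaped byte with a \\xXX escape.

    surrogateescape maps undecodable byte 0xNN (0x80..0xFF) to U+DCNN,
    so subtracting 0xDC00 recovers the original byte value.
    """
    return _ESCAPED_BYTE.sub(
        lambda m: '\\x%02x' % (ord(m.group()) - 0xdc00), data)


def sketch_convert_astrals(data):
    """Replace each non-BMP character with a \\UXXXXXXXX escape."""
    return _ASTRAL.sub(lambda m: '\\U%08x' % ord(m.group()), data)
```

Applied to the inputs from the examples above, these sketches return the same strings as the doctest output; BMP characters such as U+20AC pass through unchanged.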
The names are open to discussion.
I am also thinking about adding two functions or error handlers (that can be used with convert_surrogates and convert_astrals) for composing astral characters from surrogate pairs and vice versa.
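The pair composition and its inverse could look roughly like the following. This is a sketch under assumptions: the function names `compose_surrogate_pairs` and `decompose_astrals` are hypothetical and not part of the patch, and they are written as plain functions rather than error handlers. The arithmetic is the standard UTF-16 surrogate pair mapping.

```python
import re

# A high surrogate immediately followed by a low surrogate.
_PAIR = re.compile('([\ud800-\udbff])([\udc00-\udfff])')
# Any character outside the Basic Multilingual Plane.
_ASTRAL = re.compile('[\U00010000-\U0010ffff]')


def compose_surrogate_pairs(data):
    """Join each high/low surrogate pair into one astral character."""
    def join(m):
        high = ord(m.group(1)) - 0xd800   # upper 10 bits of the offset
        low = ord(m.group(2)) - 0xdc00    # lower 10 bits of the offset
        return chr(0x10000 + (high << 10) + low)
    return _PAIR.sub(join, data)


def decompose_astrals(data):
    """Split each astral character into a high/low surrogate pair."""
    def split(m):
        offset = ord(m.group()) - 0x10000
        return chr(0xd800 + (offset >> 10)) + chr(0xdc00 + (offset & 0x3ff))
    return _ASTRAL.sub(split, data)
```

For example, `decompose_astrals('\U000e007f')` gives `'\udb40\udc7f'`, and `compose_surrogate_pairs` maps that pair back to `'\U000e007f'`. Unpaired surrogates such as `'\udca4'` are left untouched by both functions.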
Date | User | Action | Args
2015-03-16 07:12:52 | serhiy.storchaka | set | recipients: + serhiy.storchaka, lemburg, ncoghlan, pitrou, vstinner, ezio.melotti, Arfrever, r.david.murray
2015-03-16 07:12:52 | serhiy.storchaka | set | messageid: <1426489972.28.0.338956167091.issue18814@psf.upfronthosting.co.za>
2015-03-16 07:12:52 | serhiy.storchaka | link | issue18814 messages
2015-03-16 07:12:51 | serhiy.storchaka | create |