
Author serhiy.storchaka
Recipients Arfrever, ezio.melotti, lemburg, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner
Date 2015-03-16.07:12:51
Message-id <1426489972.28.0.338956167091.issue18814@psf.upfronthosting.co.za>
Content
The proposed preliminary patch adds three functions to the codecs module:

convert_surrogates(data, errors) -- handle lone surrogates with the specified error handler.

>>> codecs.convert_surrogates('a\u20ac\udca4', 'backslashreplace')
'a€\\udca4'

convert_surrogateescape(data, errors) -- handle surrogate-escaped bytes (lone surrogates produced by the surrogateescape error handler) with the specified error handler.

>>> codecs.convert_surrogateescape('a\u20ac\udca4', 'backslashreplace')
'a€\\xa4'

convert_astrals(data, errors) -- handle astral (non-BMP) characters with the specified error handler.

>>> codecs.convert_astrals('a\u20ac\U000e007f', 'backslashreplace')
'a€\\U000e007f'
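
For illustration only (this is not the patch): a rough pure-Python sketch of how the three functions could behave, built on `re` and `codecs.lookup_error()`. The helper `_apply_handler` and the three regex names are hypothetical. The sketch assumes that every surrogate code point in a str counts as "lone", that handlers returning str (such as 'strict', 'ignore', 'replace', 'backslashreplace') are the only ones used, and that Python 3.5+ is available so 'backslashreplace' also accepts decode errors.

```python
import codecs
import re

# Hypothetical names; the real patch may be structured quite differently.
_SURROGATES = re.compile('[\ud800-\udfff]+')
_ESCAPED_BYTES = re.compile('[\udc80-\udcff]+')
_ASTRALS = re.compile('[\U00010000-\U0010ffff]+')


def _apply_handler(pattern, data, errors, as_bytes=False):
    """Pass every match of `pattern` in `data` to the named error handler."""
    handler = codecs.lookup_error(errors)

    def repl(match):
        if as_bytes:
            # Recover the bytes that surrogateescape mapped to U+DC80..U+DCFF
            # and report them to the handler as undecodable bytes.
            raw = bytes(ord(ch) - 0xdc00 for ch in match.group())
            exc = UnicodeDecodeError('utf-8', raw, 0, len(raw),
                                     'invalid byte')
        else:
            exc = UnicodeEncodeError('utf-8', data, match.start(),
                                     match.end(), 'character not allowed')
        # Simplification: the resume position returned by the handler is
        # ignored, and handlers that return bytes are not supported here.
        replacement, _ = handler(exc)
        return replacement

    return pattern.sub(repl, data)


def convert_surrogates(data, errors):
    return _apply_handler(_SURROGATES, data, errors)


def convert_surrogateescape(data, errors):
    return _apply_handler(_ESCAPED_BYTES, data, errors, as_bytes=True)


def convert_astrals(data, errors):
    return _apply_handler(_ASTRALS, data, errors)
```

Under these assumptions, calling the three sketch functions with 'backslashreplace' on the inputs above gives the same strings as the examples; with 'strict' the handler raises the exception instead.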

The names are open to discussion.

I am also thinking about adding two functions or error handlers (that can be used with convert_surrogates and convert_astrals) for composing astral characters from surrogate pairs and vice versa.
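
As a rough idea of what such a conversion involves (the names `compose_surrogate_pairs` and `decompose_astrals` are hypothetical and written here as plain functions, not as error handlers), a minimal sketch using the standard UTF-16 surrogate arithmetic might look like this:

```python
import re

# A high surrogate immediately followed by a low surrogate.
_PAIR = re.compile('[\ud800-\udbff][\udc00-\udfff]')
# Any single non-BMP character.
_ASTRAL = re.compile('[\U00010000-\U0010ffff]')


def compose_surrogate_pairs(data):
    """Replace each surrogate pair with the astral character it encodes."""
    def repl(match):
        high, low = match.group()
        return chr(0x10000
                   + ((ord(high) - 0xd800) << 10)
                   + (ord(low) - 0xdc00))
    return _PAIR.sub(repl, data)


def decompose_astrals(data):
    """Replace each astral character with its surrogate pair."""
    def repl(match):
        offset = ord(match.group()) - 0x10000
        return chr(0xd800 + (offset >> 10)) + chr(0xdc00 + (offset & 0x3ff))
    return _ASTRAL.sub(repl, data)
```

Surrogates that are not part of a pair are left untouched by this sketch, so they could still be passed on to a function such as convert_surrogates afterwards.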
History
Date User Action Args
2015-03-16 07:12:52 serhiy.storchaka set recipients: + serhiy.storchaka, lemburg, ncoghlan, pitrou, vstinner, ezio.melotti, Arfrever, r.david.murray
2015-03-16 07:12:52 serhiy.storchaka set messageid: <1426489972.28.0.338956167091.issue18814@psf.upfronthosting.co.za>
2015-03-16 07:12:52 serhiy.storchaka link issue18814 messages
2015-03-16 07:12:51 serhiy.storchaka create