
Author ezio.melotti
Recipients Arfrever, ezio.melotti, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner
Date 2014-08-24.07:58:20
Message-id <1408867101.32.0.177426890365.issue18814@psf.upfronthosting.co.za>
Content
I think similar functions should be added to the unicodedata module rather than to the string module or as str methods.  If I'm not mistaken, this was already proposed in another issue.
In C we already added macros like IS_{HIGH|LOW|}_SURROGATE (and possibly others) to help deal with surrogates, but AFAIK there's no Python equivalent yet.
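To illustrate what a Python-level counterpart might look like, here is a minimal sketch.  The function names are hypothetical (nothing like this exists in unicodedata today); the code only assumes the standard surrogate ranges, U+D800..U+DBFF for high and U+DC00..U+DFFF for low surrogates.

```python
# Hypothetical helpers; these names are illustrative only and are
# not part of the unicodedata module or any other stdlib module.

def is_high_surrogate(ch):
    """Return True if ch is a high (lead) surrogate, U+D800..U+DBFF."""
    return 0xD800 <= ord(ch) <= 0xDBFF

def is_low_surrogate(ch):
    """Return True if ch is a low (trail) surrogate, U+DC00..U+DFFF."""
    return 0xDC00 <= ord(ch) <= 0xDFFF

def is_surrogate(ch):
    """Return True if ch is any surrogate code point, U+D800..U+DFFF."""
    return 0xD800 <= ord(ch) <= 0xDFFF
```

Each helper takes a single-character string, so a whole string can be scanned with something like any(map(is_surrogate, s)).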
As for the specific constants/functions/methods you propose, IMHO the name escaped_surrogates is not very clear.  If it's a string of lone surrogates, I would just call it unicodedata.surrogates (and .high_surrogates/.low_surrogates).  These could also be used to build one-liners that check whether a string contains surrogates and/or remove them.
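As a rough sketch of those one-liners: the constants below are local stand-ins for the suggested unicodedata.surrogates, .high_surrogates and .low_surrogates, which do not exist in the module; they are built here by hand purely for illustration.

```python
# Local stand-ins for the proposed (not yet existing) constants
# unicodedata.high_surrogates / .low_surrogates / .surrogates.
HIGH_SURROGATES = "".join(map(chr, range(0xD800, 0xDC00)))
LOW_SURROGATES = "".join(map(chr, range(0xDC00, 0xE000)))
SURROGATES = HIGH_SURROGATES + LOW_SURROGATES

def contains_surrogates(s):
    # One-liner check: is any character of s a surrogate?
    return any(ch in SURROGATES for ch in s)

def remove_surrogates(s):
    # One-liner removal: map every surrogate code point to None,
    # which str.translate() interprets as "delete this character".
    return s.translate(dict.fromkeys(map(ord, SURROGATES)))

# A string with a lone surrogate, as produced by surrogateescape
# when decoding an undecodable byte.
escaped = b"caf\xff".decode("utf-8", errors="surrogateescape")
has_any = contains_surrogates(escaped)   # True
cleaned = remove_surrogates(escaped)     # "caf"
```

Note that removal silently drops data; whether that is acceptable depends on the application.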
clean has a very generic name with no hints about surrogates, and its purpose is quite specific.
I'm also not a big fan of redecode.  The equivalent calls to encode/decode are not much longer and are more explicit.  Also, having to redecode often indicates an earlier bug that should be fixed instead (if possible).
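For comparison, here is what the explicit encode/decode spelling could look like.  This assumes redecode is meant to undo a decode done with the wrong codec and the surrogateescape error handler; the exact signature of the proposed function is an assumption on my part, and the sample data is made up.

```python
# UTF-8 bytes for "café" that were mistakenly decoded as ASCII.
raw = "caf\u00e9".encode("utf-8")

# surrogateescape maps each undecodable byte to a lone low surrogate,
# so no information is lost by the wrong decode.
wrong = raw.decode("ascii", errors="surrogateescape")

# Explicit equivalent of a hypothetical redecode(wrong, "ascii", "utf-8"):
# encode back to the original bytes, then decode with the right codec.
fixed = wrong.encode("ascii", errors="surrogateescape").decode("utf-8")
```

The round trip works only because the same codec and error handler are used for the encode step as for the original decode.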
History
Date                 User          Action  Args
2014-08-24 07:58:21  ezio.melotti  set     recipients: + ezio.melotti, ncoghlan, pitrou, vstinner, Arfrever, r.david.murray, serhiy.storchaka
2014-08-24 07:58:21  ezio.melotti  set     messageid: <1408867101.32.0.177426890365.issue18814@psf.upfronthosting.co.za>
2014-08-24 07:58:21  ezio.melotti  link    issue18814 messages
2014-08-24 07:58:20  ezio.melotti  create