
Author ncoghlan
Recipients Arfrever, ezio.melotti, lemburg, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner
Date 2015-03-17.12:18:58
Message-id <1426594738.45.0.314347087639.issue18814@psf.upfronthosting.co.za>
Content
I'd wondered about that with respect to rehandle_surrogatepass.

The current implementation looks like it processes *all* surrogates (even valid surrogate pairs), so "handle_surrogates" might be a suitable name.

If the intent is for it to be "handle_lone_surrogates", I'm not sure the current implementation achieves that, as a valid surrogate pair will match re.compile('[\ud800-\uefff]+').
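
To illustrate the concern, here is a minimal sketch, not the code from the patch. It assumes the surrogate block U+D800 to U+DFFF, and the names ANY_SURROGATE and LONE_SURROGATE are hypothetical. A plain character-class pattern matches both halves of a well-formed pair, while a pattern with lookarounds matches only unpaired surrogates:

```python
import re

# Matches any run of surrogate code points, paired or not.
ANY_SURROGATE = re.compile('[\ud800-\udfff]+')

# Hypothetical stricter pattern: a high surrogate not followed by a low
# surrogate, or a low surrogate not preceded by a high surrogate.
LONE_SURROGATE = re.compile(
    '[\ud800-\udbff](?![\udc00-\udfff])'
    '|(?<![\ud800-\udbff])[\udc00-\udfff]'
)

pair = '\ud83d\ude00'  # two surrogate code points forming a well-formed pair
lone = 'a\udc80b'      # a single low surrogate, as surrogateescape can produce

pair_hit_by_any = ANY_SURROGATE.search(pair) is not None    # the pair matches
pair_hit_by_lone = LONE_SURROGATE.search(pair) is not None  # the pair does not
lone_hits = LONE_SURROGATE.findall(lone)                    # only the lone one
```

Under this reading, a function named "handle_lone_surrogates" would need something like the second pattern, while the first pattern fits the broader "handle_surrogates" name.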

The rest looks OK to me, including the decompose_astrals() and compose_surrogate_pairs() functions. Regardless of any practical utility, those two seem useful for *educational* purposes when it comes to Unicode, since they make it clear how to switch between the single code point and dual code point representations of astral characters.
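
For readers unfamiliar with the two representations, the conversion can be sketched as follows. This is an illustrative assumption of what such functions might do, based on the standard UTF-16 surrogate arithmetic, and not the implementation from the patch under review:

```python
def decompose_astrals(text):
    """Replace each code point above U+FFFF with a surrogate pair."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high (lead) surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low (trail) surrogate
        else:
            out.append(ch)
    return ''.join(out)


def compose_surrogate_pairs(text):
    """Replace each high+low surrogate pair with one astral code point."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ''
        if '\ud800' <= ch <= '\udbff' and nxt and '\udc00' <= nxt <= '\udfff':
            hi = ord(ch) - 0xD800
            lo = ord(nxt) - 0xDC00
            out.append(chr(0x10000 + (hi << 10) + lo))
            i += 2
        else:
            out.append(ch)  # lone surrogates are left untouched
            i += 1
    return ''.join(out)


astral = '\U0001F600'                   # one code point
as_pair = decompose_astrals(astral)     # two code points: U+D83D, U+DE00
round_trip = compose_surrogate_pairs(as_pair)
```

In this sketch a lone surrogate passes through compose_surrogate_pairs() unchanged, which keeps the two functions inverses of each other on well-formed input.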