Message238285
I'd wondered about that with respect to rehandle_surrogatepass.
The current implementation looks like it processes *all* surrogates (even valid surrogate pairs), so "handle_surrogates" might be a suitable name.
If the intent is for it to be "handle_lone_surrogates", I'm not sure the current implementation achieves that, as a valid surrogate pair will match re.compile('[\ud800-\uefff]+').
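To illustrate the distinction, here is a minimal sketch and not code from the patch. It uses the Unicode surrogate block U+D800 to U+DFFF as the character class, and the names ANY_SURROGATE and LONE_SURROGATE are hypothetical. A plain character class matches paired surrogates as readily as lone ones, while a pattern with lookarounds skips well-formed pairs.

```python
import re

# Matches any run of surrogate code points, paired or not.
ANY_SURROGATE = re.compile('[\ud800-\udfff]+')

# Hypothetical alternative: a high surrogate not followed by a low one,
# or a low surrogate not preceded by a high one.
LONE_SURROGATE = re.compile(
    '[\ud800-\udbff](?![\udc00-\udfff])'
    '|(?<![\ud800-\udbff])[\udc00-\udfff]'
)

lone = 'abc\udc80def'        # lone low surrogate, as surrogateescape produces
pair = 'abc\ud83d\ude00def'  # well-formed high + low pair

# The plain character class matches both strings.
print(ANY_SURROGATE.search(lone) is not None)
print(ANY_SURROGATE.search(pair) is not None)

# The lookaround version matches only the lone surrogate.
print(LONE_SURROGATE.search(lone) is not None)
print(LONE_SURROGATE.search(pair) is None)
```

Whether pairs should be left alone depends on the intended semantics of the function, which is the open question here.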
The rest looks OK to me, including the decompose_astrals() and compose_surrogate_pairs() functions. Regardless of any practical utility, the latter two seem useful for *educational* purposes when it comes to Unicode, by making it clear how to switch between the single code point and dual code point representations of the astrals.
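As a rough sketch of the conversion those two functions make visible: the function names are borrowed from the patch under discussion, but the bodies below are an assumption based on the standard UTF-16 surrogate arithmetic, not the patch's actual implementation.

```python
def decompose_astrals(text):
    """Replace each code point above U+FFFF with its surrogate pair."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high surrogate: top 10 bits
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate: bottom 10 bits
        else:
            out.append(ch)
    return ''.join(out)


def compose_surrogate_pairs(text):
    """Merge each high + low surrogate pair into one astral code point."""
    out = []
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if ('\ud800' <= ch <= '\udbff' and i + 1 < n
                and '\udc00' <= text[i + 1] <= '\udfff'):
            hi = ord(ch) - 0xD800
            lo = ord(text[i + 1]) - 0xDC00
            out.append(chr(0x10000 + (hi << 10) + lo))
            i += 2
        else:
            out.append(ch)  # ordinary characters and lone surrogates pass through
            i += 1
    return ''.join(out)


astral = 'a\U0001F600b'
decomposed = decompose_astrals(astral)
print(len(astral), len(decomposed))
print(compose_surrogate_pairs(decomposed) == astral)
```

In this sketch lone surrogates are left untouched by compose_surrogate_pairs(), so the round trip is lossless for any string that contains no surrogates to begin with.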
Date | User | Action | Args
2015-03-17 12:18:58 | ncoghlan | set | recipients: + ncoghlan, lemburg, pitrou, vstinner, ezio.melotti, Arfrever, r.david.murray, serhiy.storchaka
2015-03-17 12:18:58 | ncoghlan | set | messageid: <1426594738.45.0.314347087639.issue18814@psf.upfronthosting.co.za>
2015-03-17 12:18:58 | ncoghlan | link | issue18814 messages
2015-03-17 12:18:58 | ncoghlan | create |