Author mrabarnett
Recipients akitada, amaury.forgeotdarc, collinwinter, ezio.melotti, georg.brandl, giampaolo.rodola, gregory.p.smith, jaylogan, jhalcrow, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr
Date 2010-09-21.11:41:33
Message-id <1285069295.64.0.118863478716.issue2636@psf.upfronthosting.co.za>
Content
I use Python 3, where len("\U00010337") == 2 on a narrow build.

Yes, non-BMP characters (code points above U+FFFF) on a narrow build are a problem:

>>> regex.findall("\\U00010337", "a\U00010337bc")
[]
>>> regex.findall("(?i)\\U00010337", "a\U00010337bc")
[]

I'm not sure how (or whether!) to handle surrogate pairs. It _would_ make things more complicated.
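
For anyone unfamiliar with surrogate pairs, here is a rough sketch of what a narrow build stores for U+10337. The helper name utf16_code_units is made up for illustration and is not part of the regex module. It goes through the UTF-16 codec, so it should give the same answer on any Python 3 build, narrow or wide.

```python
import struct

def utf16_code_units(text):
    """Return the UTF-16 code units for text (hypothetical helper)."""
    data = text.encode("utf-16-be")  # big-endian, no BOM
    # Each UTF-16 code unit is one unsigned 16-bit integer.
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

units = utf16_code_units("\U00010337")
print([hex(u) for u in units])  # ['0xd800', '0xdf37']

# Recombine the high and low surrogates into the original code point.
high, low = units
codepoint = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
print(hex(codepoint))  # 0x10337
```

A narrow build keeps those two code units as separate string elements, which is why len() reports 2. A matcher that compares one element at a time against the single code point U+10337 would then find nothing to match, which is presumably what happens in the findall calls above.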

I suppose the moral is that if you want to use non-BMP characters then you really should use a wide build.