Message117046
I use Python 3, where len("\U00010337") == 2 on a narrow build.
Yes, wide Unicode on a narrow build is a problem:
>>> regex.findall("\\U00010337", "a\U00010337bc")
[]
>>> regex.findall("(?i)\\U00010337", "a\U00010337bc")
[]
I'm not sure how (or whether!) to handle surrogate pairs. It _would_ make things more complicated.
I suppose the moral is that if you want to use wide Unicode then you really should use a wide build.
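For anyone wondering why len() gives 2 there, here is a rough sketch of how a code point above U+FFFF is split into a UTF-16 surrogate pair, which is how a narrow build stores it. The helper names (to_surrogate_pair, utf16_length) are made up for illustration and are not part of the regex module. The sketch counts UTF-16 code units by encoding, so it behaves the same whichever build runs it.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_length(text):
    """Count UTF-16 code units, i.e. what len() reports on a narrow build."""
    return len(text.encode("utf-16-le")) // 2


high, low = to_surrogate_pair(0x10337)
print(hex(high), hex(low))            # the two code units a narrow build stores
print(utf16_length("\U00010337"))     # code units for the single character
print(utf16_length("a\U00010337bc"))  # code units for the test string above
```

A pattern compiled from the single code point U+10337 has nothing to match against when the subject string holds those two separate code units, which is presumably why both findall calls come back empty.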
Date | User | Action | Args
2010-09-21 11:41:36 | mrabarnett | set | recipients: + mrabarnett, loewis, georg.brandl, collinwinter, gregory.p.smith, jimjjewett, sjmachin, amaury.forgeotdarc, pitrou, nneonneo, giampaolo.rodola, rsc, timehorse, mark, vbr, ezio.melotti, jaylogan, akitada, moreati, r.david.murray, jhalcrow
2010-09-21 11:41:35 | mrabarnett | set | messageid: <1285069295.64.0.118863478716.issue2636@psf.upfronthosting.co.za>
2010-09-21 11:41:33 | mrabarnett | link | issue2636 messages
2010-09-21 11:41:33 | mrabarnett | create |