
Author mrabarnett
Recipients akitada, amaury.forgeotdarc, collinwinter, ezio.melotti, georg.brandl, giampaolo.rodola, gregory.p.smith, jaylogan, jhalcrow, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr
Date 2010-09-21.11:41:33
Message-id <1285069295.64.0.118863478716.issue2636@psf.upfronthosting.co.za>
Content
I use Python 3, where len("\U00010337") == 2 on a narrow build, because the character is stored as a UTF-16 surrogate pair.
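A narrow build stored str as UTF-16 code units, so any character outside the Basic Multilingual Plane took two of them. Narrow builds were dropped in Python 3.3 (PEP 393), so the sketch below only emulates the narrow-build length by encoding to UTF-16; it is an illustration, not how the interpreter counted internally.

```python
# Python 3.3+ strings are always "wide": one astral character has len() 1.
# Encoding to UTF-16 shows the two code units a narrow build stored instead.
ch = "\U00010337"

code_units = len(ch.encode("utf-16-le")) // 2  # each UTF-16 code unit is 2 bytes
print(len(ch))                       # 1 on Python 3.3+ (or an old wide build)
print(code_units)                    # 2, the length a narrow build reported
print(ch.encode("utf-16-be").hex())  # d800df37: high surrogate, then low surrogate
```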

Yes, wide (non-BMP) Unicode on a narrow build is a problem; neither of these finds a match:

>>> regex.findall("\\U00010337", "a\U00010337bc")
[]
>>> regex.findall("(?i)\\U00010337", "a\U00010337bc")
[]

I'm not sure how (or whether!) to handle surrogate pairs. It _would_ make things more complicated.
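One conceivable approach for literal characters is to expand each non-BMP code point in the pattern into its surrogate pair, so it lines up with the two code units in the text. The sketch below emulates narrow-build text on a modern Python using the standard `re` module; `to_surrogates` is a hypothetical helper, and this is not how the regex module actually works.

```python
import re

def to_surrogates(s: str) -> str:
    """Hypothetical helper: replace each non-BMP character with its
    UTF-16 surrogate pair, i.e. the way a narrow build stored it."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate
        else:
            out.append(ch)
    return "".join(out)

# Emulated narrow-build text: 'a', a surrogate pair, 'b', 'c'.
text = to_surrogates("a\U00010337bc")

# One code point in the pattern versus two code units in the text: no match.
print(re.findall("\U00010337", text))  # []

# Expanding the literal into its surrogate pair lets the match succeed.
pattern = re.escape(to_surrogates("\U00010337"))
print(re.findall(pattern, text))  # ['\ud800\udf37']
```

This only covers literals. A character class spanning non-BMP ranges, or `.` (which would match half a pair), cannot be rewritten this simply, which is where the extra complexity comes in.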

I suppose the moral is that if you want to use wide Unicode then you really should use a wide build.
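To check which kind of build is running, `sys.maxunicode` is the usual test: it was 0xFFFF on narrow builds and is 0x10FFFF on wide builds (and on every build since Python 3.3). A minimal sketch, where `is_wide_build` is a hypothetical name:

```python
import sys

def is_wide_build() -> bool:
    # Narrow builds reported sys.maxunicode == 0xFFFF; wide builds,
    # and all builds since Python 3.3, report 0x10FFFF.
    return sys.maxunicode > 0xFFFF

if not is_wide_build():
    print("Narrow build: non-BMP characters are stored as surrogate pairs.")
```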