
Author mrabarnett
Recipients David MacIver, mrabarnett, tomviner
Date 2017-08-14.17:57:36
Message-id <1502733456.99.0.422932492964.issue31193@psf.upfronthosting.co.za>
Content
The re module works with codepoints; it doesn't understand canonical equivalence.

For example, it doesn't recognise that "\N{LATIN CAPITAL LETTER E}\N{COMBINING ACUTE ACCENT}" is equivalent to "\N{LATIN CAPITAL LETTER E WITH ACUTE}".
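To make that concrete, here is a minimal sketch using only the standard library. The helper name `nfc` is made up for illustration; normalising both the pattern and the subject with `unicodedata.normalize` is one possible workaround, not something the re module does for you.

```python
import re
import unicodedata

decomposed = "\N{LATIN CAPITAL LETTER E}\N{COMBINING ACUTE ACCENT}"
precomposed = "\N{LATIN CAPITAL LETTER E WITH ACUTE}"

# The two strings are different codepoint sequences, so neither plain
# comparison nor re treats them as the same text.
print(decomposed == precomposed)              # False
print(re.fullmatch(precomposed, decomposed))  # None


def nfc(text):
    """Hypothetical helper: normalise text to NFC before matching."""
    return unicodedata.normalize("NFC", text)


# After normalising both sides to the same form, the match succeeds.
print(re.fullmatch(nfc(precomposed), nfc(decomposed)) is not None)  # True
```

Normalising a pattern this way is only safe for literal text; a pattern containing character classes or escapes would need more care.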

This is true for Python in general, except for identifiers, which are normalised (to NFKC) when the source is parsed:

>>> "\N{LATIN CAPITAL LETTER E}\N{COMBINING ACUTE ACCENT}"
'É'
>>> É = 0
>>> "\N{LATIN CAPITAL LETTER E WITH ACUTE}"
'É'
>>> É
0

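This also means that, say, '.' will match only one _codepoint_: against the decomposed string above it matches just the base letter, not the combining accent that follows it.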
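A small sketch of that behaviour, again using only the standard library (the variable names are illustrative):

```python
import re
import unicodedata

decomposed = "\N{LATIN CAPITAL LETTER E}\N{COMBINING ACUTE ACCENT}"

# The decomposed form is two codepoints, and '.' consumes only the first.
print(len(decomposed))                    # 2
matched = re.match(".", decomposed).group()
print(matched == "E")                     # True: the accent is left behind

# A single '.' therefore cannot match the whole decomposed string...
print(re.fullmatch(".", decomposed))      # None

# ...but it can once the string is composed into one codepoint.
composed = unicodedata.normalize("NFC", decomposed)
print(re.fullmatch(".", composed) is not None)  # True
```

Composing to NFC only helps when a precomposed codepoint exists for the sequence; other base-plus-combining sequences stay longer than one codepoint even after normalisation.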