Message300257
The re module works with codepoints; it doesn't understand canonical equivalence.
For example, it doesn't recognise that "\N{LATIN CAPITAL LETTER E}\N{COMBINING ACUTE ACCENT}" is equivalent to "\N{LATIN CAPITAL LETTER E WITH ACUTE}".
This is true for Python in general, except for identifiers, which are normalised:
>>> "\N{LATIN CAPITAL LETTER E}\N{COMBINING ACUTE ACCENT}"
'É'
>>> É = 0
>>> "\N{LATIN CAPITAL LETTER E WITH ACUTE}"
'É'
>>> É
0
This also means that, say, '.' will match only one _codepoint_.
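The behaviour described above can be checked with a short script. This is only a sketch: normalising both the pattern and the subject with unicodedata.normalize is one possible workaround, not something the re module does for you, and the helper name nfc is made up for illustration.

```python
import re
import unicodedata

composed = "\N{LATIN CAPITAL LETTER E WITH ACUTE}"
decomposed = "\N{LATIN CAPITAL LETTER E}\N{COMBINING ACUTE ACCENT}"

# The two strings render the same but are different codepoint sequences.
lengths = (len(composed), len(decomposed))

# re compares codepoints, so the precomposed pattern does not match
# the decomposed subject.
direct_match = re.fullmatch(composed, decomposed)

# '.' consumes exactly one codepoint: it cannot cover the whole decomposed
# string, and on its own it matches only the base letter.
dot_full = re.fullmatch(".", decomposed)
dot_first = re.match(".", decomposed).group()

def nfc(text):
    """Hypothetical helper: bring text into NFC before matching."""
    return unicodedata.normalize("NFC", text)

# Assumed workaround: normalise pattern and subject to the same form first.
normalised_match = re.fullmatch(nfc(composed), nfc(decomposed))
dot_after_nfc = re.fullmatch(".", nfc(decomposed))
```

Normalising to NFC only helps when a precomposed codepoint exists; a base letter with a combining mark that has no precomposed form is still more than one codepoint, so '.' still won't cover it.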
Date | User | Action | Args
2017-08-14 17:57:37 | mrabarnett | set | recipients: + mrabarnett, David MacIver, tomviner
2017-08-14 17:57:36 | mrabarnett | set | messageid: <1502733456.99.0.422932492964.issue31193@psf.upfronthosting.co.za>
2017-08-14 17:57:36 | mrabarnett | link | issue31193 messages
2017-08-14 17:57:36 | mrabarnett | create |