This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Author revo
Recipients ezio.melotti, mrabarnett, revo
Date 2016-08-27.14:36:09
Message-id <1472308569.99.0.0993827433917.issue27878@psf.upfronthosting.co.za>
In-reply-to
Content
According to [UAX #29](http://unicode.org/reports/tr29) (Unicode word boundaries, rule WB5a), an apostrophe includes both U+0027 ( ' ) APOSTROPHE and U+2019 ( ’ ) RIGHT SINGLE QUOTATION MARK (curly apostrophe).

However, the regex module only checks for U+0027; the second form (U+2019) is not handled:

/* Break between apostrophe and vowels (French, Italian). */
/* WB5a */
if (pos_m1 >= 0 && char_at(state->text, pos_m1) == '\'' &&
    is_unicode_vowel(char_at(state->text, text_pos)))
        return TRUE;


[Source code](https://bitbucket.org/mrabarnett/mrab-regex/src/f21447bf288780d8dd9b1633820480484ce8f677/regex_3/regex/_regex.c?at=default&fileviewer=file-view-default#_regex.c-1657)
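One way the check could accept both apostrophe forms is sketched below. This is only an illustration under stated assumptions, not the module's actual code: `is_apostrophe`, `is_simple_vowel`, and `break_after_apostrophe` are hypothetical names, the text is assumed to be an array of code points, and the vowel test is a simplified stand-in for the module's `is_unicode_vowel`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true for either apostrophe form named in the report. */
static bool is_apostrophe(uint32_t ch) {
    return ch == 0x0027 || ch == 0x2019;
}

/* Simplified stand-in for is_unicode_vowel(): unaccented Latin vowels only. */
static bool is_simple_vowel(uint32_t ch) {
    switch (ch) {
    case 'a': case 'e': case 'i': case 'o': case 'u':
    case 'A': case 'E': case 'I': case 'O': case 'U':
        return true;
    default:
        return false;
    }
}

/* WB5a-style check: report a break at pos when the previous code point is
 * an apostrophe (straight or curly) and the current one is a vowel. */
static bool break_after_apostrophe(const uint32_t *text, size_t len, size_t pos) {
    assert(text != NULL);
    if (pos == 0 || pos >= len)
        return false;
    return is_apostrophe(text[pos - 1]) && is_simple_vowel(text[pos]);
}
```

With this sketch, the position before the "a" in both "l'ami" and "l’ami" would be reported as a break, whereas a comparison against `'\''` alone only covers the first spelling.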
History
Date | User | Action | Args
2016-08-27 14:36:10 | revo | set | recipients: + revo, ezio.melotti, mrabarnett
2016-08-27 14:36:09 | revo | set | messageid: <1472308569.99.0.0993827433917.issue27878@psf.upfronthosting.co.za>
2016-08-27 14:36:09 | revo | link | issue27878 messages
2016-08-27 14:36:09 | revo | create |