Issue 27878: Unicode word boundaries

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/72065

Status:	closed
Resolution:	not a bug
Priority:	normal
Nosy List:	SilentGhost, ezio.melotti, mrabarnett, revo

Created on 2016-08-27 14:36 by revo, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (2)
msg273782	Author: mohammad (revo)	Date: 2016-08-27 14:36
According to [UAX #29](http://unicode.org/reports/tr29), Unicode word boundaries (rule WB5a), an apostrophe includes both U+0027 ( ' ) APOSTROPHE and U+2019 ( ’ ) RIGHT SINGLE QUOTATION MARK (curly apostrophe). However, the regex module only checks for U+0027; the second kind (U+2019) is missing:

    /* Break between apostrophe and vowels (French, Italian). */
    /* WB5a */
    if (pos_m1 >= 0 && char_at(state->text, pos_m1) == '\'' &&
      is_unicode_vowel(char_at(state->text, text_pos)))
        return TRUE;

[Source code](https://bitbucket.org/mrabarnett/mrab-regex/src/f21447bf288780d8dd9b1633820480484ce8f677/regex_3/regex/_regex.c?at=default&fileviewer=file-view-default#_regex.c-1657)
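To make the requested behaviour concrete, here is a rough Python sketch of the same check with both apostrophe characters accepted. It is only an illustration of the condition quoted above, not the regex module's code: `APOSTROPHES`, `VOWELS`, and `breaks_after_apostrophe` are hypothetical names, and the vowel set is a deliberately small stand-in for whatever `is_unicode_vowel` actually tests.

```python
# Hypothetical stand-ins for illustration; not taken from the regex module.
APOSTROPHES = {"\u0027", "\u2019"}  # APOSTROPHE, RIGHT SINGLE QUOTATION MARK
VOWELS = set("aeiouAEIOU")  # assumption: simplified; is_unicode_vowel may differ


def breaks_after_apostrophe(text, pos):
    """Return True if text[pos - 1] is an apostrophe and text[pos] is a vowel.

    Mirrors the shape of the quoted C condition, with pos playing the role
    of text_pos and pos - 1 the role of pos_m1.
    """
    if pos < 1 or pos >= len(text):
        return False
    return text[pos - 1] in APOSTROPHES and text[pos] in VOWELS


print(breaks_after_apostrophe("l'ami", 2))       # straight apostrophe
print(breaks_after_apostrophe("l\u2019ami", 2))  # curly apostrophe
```

With only U+0027 in the set, the second call would report no break, which is the gap described in this message.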
msg273783	Author: SilentGhost (SilentGhost)	Date: 2016-08-27 14:56
The regex module is not part of the standard library; on the latest 3.6 branch, the re module breaks on the curly apostrophe just fine. Perhaps try reporting this issue on the Bitbucket tracker?
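The standard-library behaviour mentioned here can be checked with a short sketch like the following. Note that `re` does not implement UAX #29 rules; it simply treats neither apostrophe as a word character, so `\w+` stops at both. The helper name `word_runs` is made up for this example.

```python
import re


def word_runs(text):
    """Return the maximal runs of word characters found by the re module."""
    return re.findall(r"\w+", text)


print(word_runs("l'ami"))       # ['l', 'ami']
print(word_runs("l\u2019ami"))  # ['l', 'ami']
```

Both inputs split at the apostrophe, so the difference reported above is specific to the third-party regex module's word-boundary handling.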

History
Date	User	Action	Args
2022-04-11 14:58:35	admin	set	github: 72065
2016-08-27 14:56:48	SilentGhost	set	status: open -> closed versions: - Python 2.7, Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6 nosy: + SilentGhost messages: + msg273783 resolution: not a bug stage: resolved
2016-08-27 14:36:09	revo	create