Message 298322 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	Guillaume Sanchez
Recipients	Arfrever, Guillaume Sanchez, Nicholas.Cole, benjamin.peterson, eric.araujo, ezio.melotti, inigoserna, lemburg, loewis, poq, r.david.murray, serhiy.storchaka, tchrist, terry.reedy, vstinner, zeha
Date	2017-07-13.23:47:06
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1499989626.46.0.535744055477.issue12568@psf.upfronthosting.co.za>
In-reply-to

Content

Hello,

I'm coming from bugs.python.org/issue30717. I have a pending PR that needs review (https://github.com/python/cpython/pull/2673). It adds a function that breaks Unicode strings into grapheme clusters (roughly what one would intuitively call "a character"). It's based on the grapheme cluster breaking algorithm from TR29 (Unicode Standard Annex #29, "Unicode Text Segmentation").

Let me know if this is of any relevance.

Quick demo:
>>> import unicodedata
>>> a = unicodedata.break_graphemes("lol")
>>> list(a)
['l', 'o', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309l"))
['l', 'ỏ', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309\u0301l"))
['l', 'ỏ́', 'l']
>>> list(unicodedata.break_graphemes("lo\u0301l"))
['l', 'ó', 'l']
>>> list(unicodedata.break_graphemes(""))
[]
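For anyone who wants to try the idea without applying the patch, here is a minimal stdlib-only sketch. `naive_graphemes` is a hypothetical helper written for illustration. It is neither the proposed `break_graphemes` nor a full TR29 implementation. It only attaches combining marks (general categories Mn, Mc, Me) to the preceding code point. It ignores the other TR29 rules: CR LF, Hangul syllable sequences, regional-indicator pairs, emoji ZWJ sequences, prepend characters, and so on.

```python
import unicodedata
from typing import Iterator

# General categories treated as "attach to the previous character".
# Approximation only: TR29 uses its own Grapheme_Cluster_Break property,
# which mostly (but not exactly) maps Mn/Me to Extend and Mc to SpacingMark.
_MARK_CATEGORIES = {"Mn", "Mc", "Me"}


def naive_graphemes(text: str) -> Iterator[str]:
    """Yield approximate grapheme clusters from *text* (hypothetical helper)."""
    cluster = ""
    for ch in text:
        if cluster and unicodedata.category(ch) in _MARK_CATEGORIES:
            # A combining mark extends the cluster that is being built.
            cluster += ch
        else:
            # Any other code point (or a mark at the very start) begins a
            # new cluster, so emit the previous one first.
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster
```

On the demo strings above this yields the same splits. For example, `list(naive_graphemes("lo\u0309\u0301l"))` gives three clusters, `['l', 'o\u0309\u0301', 'l']`, even though `len()` of that string is 5 code points.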

History
Date	User	Action	Args
2017-07-13 23:47:06	Guillaume Sanchez	set	recipients: + Guillaume Sanchez, lemburg, loewis, terry.reedy, vstinner, benjamin.peterson, ezio.melotti, eric.araujo, Arfrever, r.david.murray, inigoserna, zeha, poq, Nicholas.Cole, tchrist, serhiy.storchaka
2017-07-13 23:47:06	Guillaume Sanchez	set	messageid: <1499989626.46.0.535744055477.issue12568@psf.upfronthosting.co.za>
2017-07-13 23:47:06	Guillaume Sanchez	link	issue12568 messages
2017-07-13 23:47:06	Guillaume Sanchez	create