Message298322
Hello,
I'm coming from bugs.python.org/issue30717. I have a PR awaiting review (https://github.com/python/cpython/pull/2673) that adds a function to break Unicode strings into grapheme clusters (roughly what one would intuitively call "a character"). It is based on the grapheme cluster breaking algorithm from TR29.
Let me know if this is of any relevance.
Quick demo:
>>> import unicodedata  # break_graphemes exists only with the PR applied
>>> a = unicodedata.break_graphemes("lol")
>>> list(a)
['l', 'o', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309l"))
['l', 'ỏ', 'l']
>>> list(unicodedata.break_graphemes("lo\u0309\u0301l"))
['l', 'ỏ́', 'l']
>>> list(unicodedata.break_graphemes("lo\u0301l"))
['l', 'ó', 'l']
>>> list(unicodedata.break_graphemes(""))
[]
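For anyone who wants to try the demo without building the patched interpreter, the outputs above can be approximated in pure Python. This is only a rough sketch and not the code from the PR: it attaches characters with a nonzero canonical combining class to the preceding character, so it ignores most of the TR29 rules (Hangul syllables, CR LF, emoji sequences, and so on). The name `approx_break_graphemes` is hypothetical and chosen for illustration.

```python
import unicodedata


def approx_break_graphemes(text):
    """Yield approximate grapheme clusters.

    Simplification (assumption, not the PR's algorithm): a cluster is a
    base character followed by any characters whose canonical combining
    class is nonzero. The full TR29 rules are NOT implemented here.
    """
    cluster = ""
    for ch in text:
        # A nonzero combining class marks a combining character, which
        # extends the current cluster; anything else starts a new one.
        if cluster and unicodedata.combining(ch) != 0:
            cluster += ch
        else:
            if cluster:
                yield cluster
            cluster = ch
    # Emit the final cluster, if any (nothing is yielded for "").
    if cluster:
        yield cluster


if __name__ == "__main__":
    for sample in ["lol", "lo\u0309l", "lo\u0309\u0301l", "lo\u0301l", ""]:
        print(list(approx_break_graphemes(sample)))
```

On the five strings from the demo this groups each combining mark with the preceding "o", which should match the clusters shown above. It will likely diverge from a real TR29 implementation on input outside that simple base-plus-marks pattern.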
Date | User | Action | Args
2017-07-13 23:47:06 | Guillaume Sanchez | set | recipients: + Guillaume Sanchez, lemburg, loewis, terry.reedy, vstinner, benjamin.peterson, ezio.melotti, eric.araujo, Arfrever, r.david.murray, inigoserna, zeha, poq, Nicholas.Cole, tchrist, serhiy.storchaka
2017-07-13 23:47:06 | Guillaume Sanchez | set | messageid: <1499989626.46.0.535744055477.issue12568@psf.upfronthosting.co.za>
2017-07-13 23:47:06 | Guillaume Sanchez | link | issue12568 messages
2017-07-13 23:47:06 | Guillaume Sanchez | create |