Author steven.daprano
Recipients Guillaume Sanchez, steven.daprano
Date 2017-06-21.01:34:07
Message-id <1498008848.16.0.236080187603.issue30717@psf.upfronthosting.co.za>
Content
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries

talks about *grapheme clusters*, not "graphemes" alone, and it seems clear to me that grapheme clusters are language-dependent. For example, it says:

"The Unicode Standard provides default algorithms for determining grapheme cluster boundaries, with two variants: legacy grapheme clusters and extended grapheme clusters. The most appropriate variant depends on the language and operation involved. ... These algorithms can be adapted to produce tailored grapheme clusters for specific locales..."


Nevertheless, even just a basic API to either the *legacy grapheme cluster* or the *extended grapheme cluster* algorithms would be a good start.

Can I suggest that the unicodedata module might be the right place for it?
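
To make the idea concrete, here is a rough sketch of what a very basic splitter could look like using only what unicodedata offers today. The name simple_clusters is hypothetical, not an existing or proposed function, and the rule it applies is a deliberate simplification rather than the tr29 algorithm: it keeps CR LF together and attaches characters of general category Mn or Me to the preceding character. It ignores Hangul syllables, emoji sequences, regional indicators, and the rules about control characters.

```python
import unicodedata


def simple_clusters(text):
    """Split text into approximate grapheme clusters.

    Assumption of this sketch: a cluster is one character followed by
    any number of nonspacing (Mn) or enclosing (Me) marks, and CR LF
    counts as a single cluster. This is NOT the full tr29 algorithm.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Me")
        is_lf_after_cr = ch == "\n" and bool(clusters) and clusters[-1] == "\r"
        if clusters and (is_mark or is_lf_after_cr):
            # Extend the current cluster instead of starting a new one.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "e" followed by U+0301 COMBINING ACUTE ACCENT is two code points
# but should be treated as one user-perceived character.
sample = "e\u0301a"
print(len(sample), len(simple_clusters(sample)))
```

With this, len() on the sample string counts three code points, while the sketch reports two clusters. A real implementation would need the Grapheme_Cluster_Break property data, which unicodedata does not currently expose.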

And thank you for volunteering to do the work on this!
History
Date User Action Args
2017-06-21 01:34:08 steven.daprano set recipients: + steven.daprano, Guillaume Sanchez
2017-06-21 01:34:08 steven.daprano set messageid: <1498008848.16.0.236080187603.issue30717@psf.upfronthosting.co.za>
2017-06-21 01:34:08 steven.daprano link issue30717 messages
2017-06-21 01:34:07 steven.daprano create