Message76556
Vowel 'marks' are condensed vowel characters. They are very much part of
words and do not separate them. Python 3 properly includes Mn and Mc
characters in identifiers:
http://docs.python.org/dev/3.0/reference/lexical_analysis.html#identifiers-and-keywords
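A quick check of this rule, assuming a Python 3 interpreter: `str.isidentifier()` applies the same lexical definition, so the word below (whose vowel signs are Mc and whose virama is Mn) should be accepted. It also works as an actual variable name. This is a minimal illustration, not part of the original report.

```python
# Python 3 identifiers may contain Mn and Mc characters after the first one,
# so a Devanagari word containing vowel signs and a virama is a valid name.
word = "हिन्दी"
print(word.isidentifier())  # True

# The same word can be bound as a real variable name.
namespace = {}
exec("हिन्दी = 1", namespace)
print(namespace["हिन्दी"])  # 1
```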
For instance, the word 'hindi' has 3 consonants 'h', 'n', 'd', 2 vowels
'i' and 'ii' (long i) following 'h' and 'd', and a null vowel (virama)
after 'n'. [The null vowel is needed because no vowel mark indicates the
default vowel short a. So without it, the word would be hinadii.]
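The breakdown of 'hindi' above can be reproduced with a small sketch that attaches every combining mark (Mn or Mc) to the letter before it. The helper `syllables` is hypothetical and only approximates the syllable rule: it treats the virama as the "null vowel" mark and lets a bare consonant stand alone with its default short a.

```python
import unicodedata

def syllables(word: str) -> list[str]:
    """Hypothetical helper: group each base character with the
    combining marks (general category M*) that follow it."""
    groups: list[str] = []
    for ch in word:
        if unicodedata.category(ch).startswith("M") and groups:
            groups[-1] += ch  # vowel sign or virama: extend the current syllable
        else:
            groups.append(ch)  # letter: start a new syllable
    return groups

virama = unicodedata.lookup("DEVANAGARI SIGN VIRAMA")
print(syllables("हिन्दी"))                      # ['हि', 'न्', 'दी']
# Without the virama, NA stands alone and carries the default short a: hi-na-dii
print(syllables("हिन्दी".replace(virama, "")))  # ['हि', 'न', 'दी']
```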
The difference between the Devanagari vowel characters, used at the
beginning of words, and the vowel marks, used thereafter, is purely
graphical, not phonological. In short, in the Sanskrit family:
word = syllable+
syllable = vowel | consonant + vowel mark
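The letter/sign pairing is visible in the Unicode database itself. This sketch uses only the standard `unicodedata` module to look up two independent vowels and their matching vowel signs by name: the same sound is category Lo when written word-initially and Mc when written after a consonant.

```python
import unicodedata

# Independent vowel letters (word-initial) vs. dependent vowel signs
# (after a consonant): same sound, different general category.
pairs = {}
for vowel in ("I", "II"):
    letter = unicodedata.lookup(f"DEVANAGARI LETTER {vowel}")
    sign = unicodedata.lookup(f"DEVANAGARI VOWEL SIGN {vowel}")
    pairs[vowel] = (unicodedata.category(letter), unicodedata.category(sign))
    print(vowel, letter, pairs[vowel][0], sign, pairs[vowel][1])
```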
From a c.l.p. (comp.lang.python) post asking why re does not see Hindi as a word:
हिन्दी
ह DEVANAGARI LETTER HA (Lo)
ि DEVANAGARI VOWEL SIGN I (Mc)
न DEVANAGARI LETTER NA (Lo)
् DEVANAGARI SIGN VIRAMA (Mn)
द DEVANAGARI LETTER DA (Lo)
ी DEVANAGARI VOWEL SIGN II (Mc)
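The listing above can be regenerated with the standard `unicodedata` module; a minimal sketch:

```python
import unicodedata

word = "हिन्दी"
# One row per code point: character, Unicode name, general category.
rows = [(ch, unicodedata.name(ch), unicodedata.category(ch)) for ch in word]
for ch, name, cat in rows:
    print(ch, name, f"({cat})")
```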
str.isalpha() and possibly other Unicode string methods need fixing as well:
>>> 'हिन्दी'.isalpha()  # 2.x and 3.0
False
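`str.isalpha()` counts only the Letter categories (Lm, Lt, Lu, Ll, Lo), so the Mc vowel signs and the Mn virama make it fail. Until that changes, one workaround is to test categories directly. Both helpers below, `is_word` and `words`, are hypothetical: a sketch that treats any run of letters (L*) and combining marks (M*) as a word, not a proposed fix for `str` or `re`.

```python
import unicodedata

def is_word(s: str) -> bool:
    """Hypothetical helper: True if s is non-empty and every character
    is a letter (L*) or a combining mark (M*)."""
    return bool(s) and all(unicodedata.category(ch)[0] in "LM" for ch in s)

def words(text: str) -> list[str]:
    """Hypothetical tokenizer: split text into maximal runs of L*/M* characters."""
    out: list[str] = []
    current = ""
    for ch in text:
        if unicodedata.category(ch)[0] in "LM":
            current += ch  # letters and marks both continue a word
        elif current:
            out.append(current)  # anything else ends the current word
            current = ""
    if current:
        out.append(current)
    return out

word = "हिन्दी"
print(word.isalpha())            # False: Mc/Mn characters are not Letters
print(is_word(word))             # True
print(words("hindi = हिन्दी!"))  # ['hindi', 'हिन्दी']
```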
Date | User | Action | Args
2008-11-28 21:14:56 | terry.reedy | set | recipients: + terry.reedy, lemburg, nathanlmiles, rsc, timehorse
2008-11-28 21:14:56 | terry.reedy | set | messageid: <1227906896.37.0.936030582141.issue1693050@psf.upfronthosting.co.za>
2008-11-28 21:14:55 | terry.reedy | link | issue1693050 messages
2008-11-28 21:14:54 | terry.reedy | create |