
Author terry.reedy
Recipients lemburg, nathanlmiles, rsc, terry.reedy, timehorse
Date 2008-11-28.21:14:53
Message-id <1227906896.37.0.936030582141.issue1693050@psf.upfronthosting.co.za>
In-reply-to
Content
Vowel 'marks' are condensed vowel characters; they are very much part of
words and do not separate them.  Python 3 properly includes the Mn and Mc
categories as identifier characters.

http://docs.python.org/dev/3.0/reference/lexical_analysis.html#identifiers-and-keywords
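The identifier claim can be checked directly. A minimal sketch, assuming Python 3 and using only str.isidentifier(); the word is written with escapes so the code points are unambiguous:

```python
# The six code points of the Devanagari word 'hindi':
# HA, VOWEL SIGN I, NA, VIRAMA, DA, VOWEL SIGN II
hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"

# Identifiers may contain Mn and Mc characters after the first
# character, so the whole word is a valid identifier.
print(hindi.isidentifier())  # True
```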

For instance, the word 'hindi' has 3 consonants, 'h', 'n', 'd'; 2 vowels,
'i' and 'ii' (long i), following 'h' and 'd'; and a null vowel (virama)
after 'n'. [The null vowel is needed because the absence of a vowel mark
indicates the default vowel, short a.  So without it, the word would be
hinadii.]
The difference between the Devanagari vowel characters, used at the
beginning of words, and the vowel marks, used thereafter, is purely
graphical and not phonological.  In short, in the Sanskrit family,
word = syllable+
syllable = vowel | consonant + vowel mark

From a comp.lang.python (clp) post asking why re does not see 'hindi' as a word:

हिन्दी
     ह DEVANAGARI LETTER HA (Lo)
     ि DEVANAGARI VOWEL SIGN I (Mc)
     न DEVANAGARI LETTER NA (Lo)
     ् DEVANAGARI SIGN VIRAMA (Mn)
     द DEVANAGARI LETTER DA (Lo)
     ी DEVANAGARI VOWEL SIGN II (Mc)
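
The breakdown above can be reproduced with the standard unicodedata module. This is only a sketch; the variable names are illustrative:

```python
import unicodedata

# The word 'hindi' in Devanagari, spelled out as code points.
hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"

# Collect (character, Unicode name, general category) for each code point.
breakdown = [(ch, unicodedata.name(ch), unicodedata.category(ch)) for ch in hindi]

for ch, name, category in breakdown:
    print(f"U+{ord(ch):04X} {name} ({category})")
```

Only the three consonants are in category Lo; the two vowel signs are Mc and the virama is Mn, which is why letter-only tests split the word.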

.isalpha() and possibly other unicode methods need fixing also:
>>> 'हिन्दी'.isalpha()  # 2.x and 3.0
False
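
For comparison, here is a sketch of a mark-aware check that treats Mn and Mc characters as part of a word, in the spirit of the identifier rules. The helper name is_word is hypothetical and not part of any library; it is an assumption about what a fixed test might accept, not a proposed patch:

```python
import unicodedata

def is_word(text):
    """Hypothetical helper: True if text starts with a letter and every
    later character is a letter (L*) or a combining mark (Mn or Mc)."""
    if not text:
        return False
    categories = [unicodedata.category(ch) for ch in text]
    if not categories[0].startswith("L"):
        return False
    return all(c.startswith("L") or c in ("Mn", "Mc") for c in categories[1:])

hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"
print(hindi.isalpha())  # False: the marks are not counted as alphabetic
print(is_word(hindi))   # True: the marks are accepted as part of the word
```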
History
Date                 User         Action  Args
2008-11-28 21:14:56  terry.reedy  set     recipients: + terry.reedy, lemburg, nathanlmiles, rsc, timehorse
2008-11-28 21:14:56  terry.reedy  set     messageid: <1227906896.37.0.936030582141.issue1693050@psf.upfronthosting.co.za>
2008-11-28 21:14:55  terry.reedy  link    issue1693050 messages
2008-11-28 21:14:54  terry.reedy  create