Message 76556 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	terry.reedy
Recipients	lemburg, nathanlmiles, rsc, terry.reedy, timehorse
Date	2008-11-28.21:14:53
SpamBayes Score	3.44359e-06
Marked as misclassified	No
Message-id	<1227906896.37.0.936030582141.issue1693050@psf.upfronthosting.co.za>
In-reply-to

Content

Vowel 'marks' are condensed vowel characters and are very much part of
words and do not separate words.  Python 3 properly includes Mn and Mc
(nonspacing and spacing combining marks) as identifier characters.
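A quick check of the identifier claim, as a minimal sketch for a current Python 3 interpreter. The word is written with escape sequences so the code points are unambiguous, and the `exec` call is purely illustrative. The whole word is accepted as a name even though three of its six characters are combining marks.

```python
word = "\u0939\u093f\u0928\u094d\u0926\u0940"  # हिन्दी

# The first character is a letter (Lo, in XID_Start); the vowel signs (Mc)
# and the virama (Mn) are in XID_Continue, so the whole word is a valid name.
print(word.isidentifier())  # → True

# Illustrative only: bind the word as a variable in a scratch namespace.
namespace = {}
exec(f"{word} = 42", namespace)
print(namespace[word])  # → 42
```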

http://docs.python.org/dev/3.0/reference/lexical_analysis.html#identifiers-and-keywords

For instance, the word 'hindi' has three consonants, 'h', 'n', and 'd';
two vowels, 'i' and 'ii' (long i), following 'h' and 'd'; and a null
vowel (virama) after 'n'. [The null vowel is needed because the absence
of a vowel mark indicates the default vowel, short a.  So without it,
the word would be hinadii.]
The difference between the Devanagari vowel characters, used at the
beginning of words, and the vowel marks, used thereafter, is purely
graphical and not phonological.  In short, in the Sanskrit family,
word = syllable+
syllable = vowel | consonant + vowel mark
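The grammar above can be approximated with the standard `unicodedata` module. This is a minimal sketch under a simplifying assumption: the hypothetical `split_syllables` helper (not a library API) starts a syllable at every letter and attaches any following combining marks, so a consonant carrying a virama (as in न्) stays its own unit rather than being merged into a conjunct.

```python
import unicodedata

def split_syllables(word):
    """Hypothetical helper, not a library API.

    Every letter (category L*) starts a new syllable, and each following
    combining mark (category M*) is attached to it. Conjuncts are not merged.
    """
    syllables = []
    for ch in word:
        if unicodedata.category(ch).startswith("M") and syllables:
            syllables[-1] += ch  # a vowel sign or virama extends the syllable
        else:
            syllables.append(ch)  # a consonant or independent vowel starts one
    return syllables

hindi = "\u0939\u093f\u0928\u094d\u0926\u0940"  # हिन्दी
print(split_syllables(hindi))  # → ['हि', 'न्', 'दी']

# An independent vowel such as U+0908 DEVANAGARI LETTER II (Lo) starts its
# own syllable, whereas the vowel sign U+0940 (Mc) attaches to a consonant.
print(split_syllables("\u0908\u0926"))  # → ['ई', 'द']
```

Note that the independent vowel and the vowel sign fall into different general categories (Lo versus Mc) even though, as described above, the difference between them is only graphical.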

From a comp.lang.python (c.l.p) post asking why re does not see Hindi as a word:

हिन्दी
     ह DEVANAGARI LETTER HA (Lo)
     ि DEVANAGARI VOWEL SIGN I (Mc)
     न DEVANAGARI LETTER NA (Lo)
     ् DEVANAGARI SIGN VIRAMA (Mn)
     द DEVANAGARI LETTER DA (Lo)
     ी DEVANAGARI VOWEL SIGN II (Mc)
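The listing above can be regenerated with the standard `unicodedata` module. This sketch writes the word with escape sequences so the exact code points are unambiguous, then prints each character's code point, Unicode name, and general category.

```python
import unicodedata

word = "\u0939\u093f\u0928\u094d\u0926\u0940"  # हिन्दी
for ch in word:
    # Code point, Unicode character name, and general category.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")

# U+0939 DEVANAGARI LETTER HA (Lo)
# U+093F DEVANAGARI VOWEL SIGN I (Mc)
# U+0928 DEVANAGARI LETTER NA (Lo)
# U+094D DEVANAGARI SIGN VIRAMA (Mn)
# U+0926 DEVANAGARI LETTER DA (Lo)
# U+0940 DEVANAGARI VOWEL SIGN II (Mc)
```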

str.isalpha() and possibly other Unicode methods also need fixing:
>>> 'हिन्दी'.isalpha()  # 2.x and 3.0
False
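Until the string methods change, a caller can approximate "is this a word?" by consulting general categories directly. The `is_word` helper below is hypothetical (not a `str` method or library API): it accepts a string that starts with a letter (L*) and continues with letters or combining marks (M*). That matches the syllable structure described above but deliberately ignores digits, underscores, and other word-internal characters.

```python
import unicodedata

def is_word(s):
    """Hypothetical helper: True if s starts with a letter (L*) and every
    character is a letter (L*) or a combining mark (M*)."""
    if not s or not unicodedata.category(s[0]).startswith("L"):
        return False
    return all(unicodedata.category(ch)[0] in "LM" for ch in s)

word = "\u0939\u093f\u0928\u094d\u0926\u0940"  # हिन्दी
print(word.isalpha())  # False: the vowel signs and virama are not letters
print(is_word(word))   # → True
```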

History
Date	User	Action	Args
2008-11-28 21:14:56	terry.reedy	set	recipients: + terry.reedy, lemburg, nathanlmiles, rsc, timehorse
2008-11-28 21:14:56	terry.reedy	set	messageid: <1227906896.37.0.936030582141.issue1693050@psf.upfronthosting.co.za>
2008-11-28 21:14:55	terry.reedy	link	issue1693050 messages
2008-11-28 21:14:54	terry.reedy	create