Message 190323 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	timehorse
Recipients	BreamoreBoy, ezio.melotti, l0nwlf, lemburg, loewis, mrabarnett, nathanlmiles, rsc, terry.reedy, timehorse, vstinner
Date	2013-05-29.18:23:51
Message-id	<1369851832.22.0.476256805938.issue1693050@psf.upfronthosting.co.za>

Content
Thanks, Matthew, and sorry to put you through more work. I just wanted to check exactly which Unicode code points (UTF-16, I take it) were being used, so I could verify whether the Unicode standard expects them to be treated as separate words or as single letters within a word. Sanskrit is written in an alphabetic script, not an ideographic one, so each symbol is considered a letter. So I believe your implementation is correct and, yes, you are right: re is at fault. That sequence contains only letters and accenting (combining) characters, so it should be interpreted as a single word of 6 letters, as you determined, and not as a word consisting of only the first letter. Mind you, I misread msg190100: I thought you were using findall, in which case the answer should be 1, but as for the length of the extracted match, yes, 6, I totally agree. Sorry for the misunderstanding. http://www.unicode.org/charts/PDF/U0900.pdf contains the Devanagari code chart, which covers Hindi.
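
For anyone following along, here is a rough sketch of the distinction I mean. The sample string is my own assumption (six Devanagari code points, not necessarily the exact sequence from msg190100), and is_word_char and leading_word are hypothetical helpers for illustration, not anything taken from re or regex. The sketch treats letters and combining marks alike as word characters, and it leaves the result of re's own \w visible for comparison without asserting what it will be on any given Python version.

```python
import re
import unicodedata

# Assumed sample: six Devanagari code points (letters plus vowel signs and a virama).
sample = "\u0939\u093f\u0928\u094d\u0926\u0940"


def is_word_char(ch):
    """Hypothetical rule: letters (L*), marks (M*), decimal digits (Nd) and '_' count."""
    cat = unicodedata.category(ch)
    return cat[0] in "LM" or cat == "Nd" or ch == "_"


def leading_word(text):
    """Return the longest prefix of text made only of word characters."""
    end = 0
    while end < len(text) and is_word_char(text[end]):
        end += 1
    return text[:end]


# General category of each code point, e.g. a letter or a combining mark.
categories = [unicodedata.category(ch) for ch in sample]

# Under the rule above, the whole sequence is one word.
word = leading_word(sample)

# What the standard re module extracts, kept only for side-by-side comparison.
re_match = re.match(r"\w+", sample)
re_length = len(re_match.group()) if re_match else 0

print(categories)
print(len(word), re_length)
```

Counting it either way, findall over that one word should give a single match, while the length of the match itself is the number of code points in it.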


History
Date	User	Action	Args
2013-05-29 18:23:52	timehorse	set	recipients: + timehorse, lemburg, loewis, terry.reedy, vstinner, nathanlmiles, rsc, ezio.melotti, mrabarnett, l0nwlf, BreamoreBoy
2013-05-29 18:23:52	timehorse	set	messageid: <1369851832.22.0.476256805938.issue1693050@psf.upfronthosting.co.za>
2013-05-29 18:23:52	timehorse	link	issue1693050 messages
2013-05-29 18:23:51	timehorse	create