Message 124173 - Python tracker

Author	belopolsky
Recipients	Arfrever, barry, belopolsky, ezio.melotti, jhalcrow, lemburg, loewis, pitrou, valhallasw, vstinner
Date	2010-12-17.01:34:49
Message-id	<1292549692.05.0.442901398789.issue10254@psf.upfronthosting.co.za>

Content
The logic suggested by Martin in msg120018 looks right to me, but the code as a whole seems unnecessarily complex. (And comb1==comb may need to be changed to comb1>=comb.) I don't understand why a linear search through the "skipped" array is needed. At the very least, instead of adding their positions to the "skipped" list, combining characters that have already been used could be replaced by a non-character and skipped later. A better algorithm should be able to avoid the whole issue of "skipping" by properly computing the length of the decomposed character. See internalCompose() at http://www.unicode.org/reports/tr15/Normalizer.java.

I'll try to come up with a patch.
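
To illustrate the kind of loop meant here, below is a rough Python sketch of composition that compacts the buffer in place, so no "skipped" list is needed. It is only loosely modeled on the structure of internalCompose() and is not the code in unicodedata.c or a proposed patch. The names build_pair_table, PAIRS and compose are made up for this example, the pair table is derived from unicodedata rather than from the real composition data, and Hangul composition is deliberately left out.

```python
import sys
import unicodedata


def build_pair_table():
    """Hypothetical helper: map (first, second) -> primary composite.

    Assumption: a character is a primary composite if it has a two-element
    canonical decomposition and NFC leaves it unchanged (this filters out
    composition exclusions). Hangul syllables are not covered.
    """
    table = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        if not decomp or decomp.startswith("<"):
            continue  # no decomposition, or a compatibility one
        parts = decomp.split()
        if len(parts) != 2:
            continue  # singleton decomposition
        if unicodedata.normalize("NFC", ch) != ch:
            continue  # excluded from composition
        table[(chr(int(parts[0], 16)), chr(int(parts[1], 16)))] = ch
    return table


PAIRS = build_pair_table()


def compose(s):
    """Compose a string by writing results back into the same buffer."""
    buf = list(unicodedata.normalize("NFD", s))
    if not buf:
        return ""
    starter_pos = 0
    starter = buf[0]
    last_class = unicodedata.combining(starter)
    if last_class != 0:
        last_class = 256  # string starts with a combining mark
    comp_pos = 1  # next write position in the compacted output
    for decomp_pos in range(1, len(buf)):
        ch = buf[decomp_pos]
        ch_class = unicodedata.combining(ch)
        composite = PAIRS.get((starter, ch))
        # ch is blocked if the last kept character has a combining class
        # greater than or equal to its own (hence ">=" rather than "==").
        if composite is not None and (last_class < ch_class or last_class == 0):
            buf[starter_pos] = composite  # ch is consumed, nothing to skip
            starter = composite
        else:
            if ch_class == 0:
                starter_pos = comp_pos
                starter = ch
            last_class = ch_class
            buf[comp_pos] = ch
            comp_pos += 1
    return "".join(buf[:comp_pos])
```

Because consumed combining characters are simply never copied forward, the output length falls out of comp_pos and there is nothing to search or skip afterwards.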

History
Date	User	Action	Args
2010-12-17 01:34:52	belopolsky	set	recipients: + belopolsky, lemburg, loewis, barry, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw
2010-12-17 01:34:52	belopolsky	set	messageid: <1292549692.05.0.442901398789.issue10254@psf.upfronthosting.co.za>
2010-12-17 01:34:49	belopolsky	link	issue10254 messages
2010-12-17 01:34:49	belopolsky	create