Author belopolsky
Recipients Arfrever, barry, belopolsky, ezio.melotti, jhalcrow, lemburg, loewis, pitrou, valhallasw, vstinner
Date 2010-12-17.01:34:49
Message-id <1292549692.05.0.442901398789.issue10254@psf.upfronthosting.co.za>
Content
The logic suggested by Martin in msg120018 looks right to me, but the code as a whole seems unnecessarily complex. (And comb1==comb may need to be changed to comb1>=comb.) I don't understand why a linear search through the "skipped" array is needed. At the very least, instead of adding their positions to the "skipped" list, the combining characters that have already been used could be replaced by a non-character, to be skipped later. A better algorithm should be able to avoid the whole issue of "skipping" by properly computing the length of the decomposed character. See internalCompose() at http://www.unicode.org/reports/tr15/Normalizer.java.
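
For illustration only, here is a minimal Python sketch of a single-pass composition step in the spirit of internalCompose(): it keeps separate read and write positions, so a combining character that has been absorbed into its starter is simply never copied forward and no "skipped" list is needed. This is not the C code in unicodedata.c and not a proposed patch. The names build_pair_table, PAIRS and compose are hypothetical, the input is assumed to be already in NFD (canonically ordered), and Hangul syllables, which are composed algorithmically, are not handled.

```python
import sys
import unicodedata


def build_pair_table():
    """Map (first, second) -> composite, derived from unicodedata.

    Assumption of this sketch: a character is treated as a primary composite
    when it has a two-character canonical decomposition and is itself
    NFC-stable, which filters out the composition exclusions.
    """
    table = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        if not decomp or decomp.startswith("<"):
            continue  # no decomposition, or a compatibility one
        parts = decomp.split()
        if len(parts) == 2 and unicodedata.normalize("NFC", ch) == ch:
            first, second = (chr(int(p, 16)) for p in parts)
            table[first, second] = ch
    return table


PAIRS = build_pair_table()  # hypothetical module-level cache


def compose(decomposed):
    """Compose an NFD string in one pass, without a "skipped" list."""
    out = list(decomposed)
    if not out:
        return ""
    starter_pos = 0          # index in `out` of the current starter
    starter = out[0]
    last_class = unicodedata.combining(starter)
    if last_class != 0:
        last_class = 256     # leading combining mark: block composition
    write = 1                # next free slot in the output prefix
    for read in range(1, len(out)):
        ch = out[read]
        ch_class = unicodedata.combining(ch)
        composite = PAIRS.get((starter, ch))
        # ch is unblocked if it directly follows the starter (last_class == 0)
        # or if the last kept mark has a strictly lower combining class.
        if composite is not None and (last_class < ch_class or last_class == 0):
            out[starter_pos] = composite   # absorb ch into the starter
            starter = composite
        else:
            if ch_class == 0:              # ch becomes the new starter
                starter_pos = write
                starter = ch
            last_class = ch_class
            out[write] = ch                # keep ch in the output
            write += 1
    return "".join(out[:write])            # final length falls out of `write`
```

With this sketch, compose("a\u0316\u0301") returns "\u00e1\u0316": the acute accent (class 230) is not blocked by the grave accent below (class 220), so it combines with "a" and is dropped from the tail. By contrast, compose("a\u0305\u0301") returns its input unchanged, because the overline has the same combining class as the acute accent and blocks it.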

I'll try to come up with a patch.
History
Date | User | Action | Args
2010-12-17 01:34:52 | belopolsky | set | recipients: + belopolsky, lemburg, loewis, barry, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw
2010-12-17 01:34:52 | belopolsky | set | messageid: <1292549692.05.0.442901398789.issue10254@psf.upfronthosting.co.za>
2010-12-17 01:34:49 | belopolsky | link | issue10254 messages
2010-12-17 01:34:49 | belopolsky | create |