Message124173
The logic suggested by Martin in msg120018 looks right to me, but the code as a whole seems unnecessarily complex. (Also, comb1==comb may need to be changed to comb1>=comb.) I don't understand why a linear search through the "skipped" array is needed. At the very least, instead of adding their positions to the "skipped" list, the combining characters that have already been used could be replaced by a non-character and skipped later. A better algorithm should be able to avoid the whole issue of "skipping" by properly computing the length of the decomposed character. See internalCompose() at http://www.unicode.org/reports/tr15/Normalizer.java.
I'll try to come up with a patch.
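
To make the idea concrete, here is a rough Python sketch of an in-place composition pass modeled on the structure of internalCompose(). It is not the proposed patch and not the code in unicodedata.c. The names composition_table() and compose() are hypothetical, the pair table is rebuilt from the unicodedata module under the assumption that a single character left unchanged by NFC is not a composition exclusion, and Hangul composition is omitted. The point is only that tracking the last starter and the last combining class removes any need for a "skipped" list.

```python
import sys
import unicodedata
from functools import lru_cache


@lru_cache(maxsize=None)
def composition_table():
    """Map (first, second) -> primary composite (hypothetical helper)."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        if not decomp or decomp.startswith("<"):
            continue  # no decomposition, or a compatibility one
        parts = decomp.split()
        if len(parts) != 2:
            continue  # singleton decompositions never recompose
        # Assumption: a single character that NFC leaves unchanged is
        # not a composition exclusion.
        if unicodedata.normalize("NFC", ch) != ch:
            continue
        table[chr(int(parts[0], 16)), chr(int(parts[1], 16))] = ch
    return table


def compose(decomposed):
    """Compose a canonically decomposed string (Hangul not handled)."""
    if not decomposed:
        return decomposed
    table = composition_table()
    out = list(decomposed)
    starter_pos = 0   # index in out of the last starter
    comp_pos = 1      # length of the already-composed prefix of out
    last_class = unicodedata.combining(out[0])
    if last_class != 0:
        last_class = 256  # leading combining mark: nothing attaches to it
    for ch in decomposed[1:]:
        ch_class = unicodedata.combining(ch)
        composite = table.get((out[starter_pos], ch))
        # ch is blocked when an uncomposed mark with class >= ch_class
        # sits between the starter and ch.
        if composite is not None and (last_class < ch_class or last_class == 0):
            out[starter_pos] = composite  # absorbed, nothing to skip later
        else:
            if ch_class == 0:
                starter_pos = comp_pos
            last_class = ch_class
            out[comp_pos] = ch
            comp_pos += 1
    return "".join(out[:comp_pos])
```

With this shape a combining character is either absorbed into the starter or copied forward exactly once, so the output length falls out of comp_pos. For example, compose("a\u0323\u0302") should collapse to the single character U+1EAD, while in "a\u0305\u0301" the acute accent stays uncomposed because U+0305 has the same combining class and blocks it.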
Date | User | Action | Args
2010-12-17 01:34:52 | belopolsky | set | recipients: + belopolsky, lemburg, loewis, barry, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw
2010-12-17 01:34:52 | belopolsky | set | messageid: <1292549692.05.0.442901398789.issue10254@psf.upfronthosting.co.za>
2010-12-17 01:34:49 | belopolsky | link | issue10254 messages
2010-12-17 01:34:49 | belopolsky | create |