Message302300
I looked at the Gutenberg samples. The first has a short intro with some English, then is pure Greek. The patch is clearly good for anyone using mostly a single-block alphabetic language.
The second is Chinese, not hieroglyphs (ancient Egyptian). A slowdown for ancient Egyptian is irrelevant; a slowdown for Chinese is undesirable. Japanese mostly uses about 2000 Chinese chars, and Chinese uses more. Even if the common chars are grouped together (I don't know), there are at least 10 possible chars for each 2-char slot. So I am not surprised at a net slowdown. I would also not be surprised if Japanese fared worse, as it uses at least 2 blocks for its kana and uses many Latin chars.
Unless we go beyond 2 x 256 slots, caching CJK is hopeless. Have you considered limiting the caching to the blocks before the CJK blocks, up to, say, U+31BF (see https://en.wikipedia.org/wiki/Unicode_block)? Both Japanese and Korean might then see an actual speedup.
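To make the collision argument concrete, here is a rough Python simulation, not the patch itself. It assumes a single direct-mapped table of 256 slots indexed by the low 8 bits of the code point; that indexing scheme, the name `simulate_cache`, and the `cutoff` parameter are all hypothetical and chosen only for illustration. Code points below 256 are skipped, on the assumption that Latin-1 is handled separately.

```python
CACHE_SLOTS = 256      # assumed size of one table (hypothetical)
CJK_CUTOFF = 0x31BF    # the cutoff suggested above


def simulate_cache(text, cutoff=None):
    """Return (hits, misses, bypassed) for a toy direct-mapped cache.

    The slot index is the low 8 bits of the code point. This is an
    assumption made for the sketch, not necessarily what the patch does.
    """
    slots = [None] * CACHE_SLOTS
    hits = misses = bypassed = 0
    for ch in text:
        cp = ord(ch)
        if cp < 256:
            continue  # Latin-1 range: assumed to be cached elsewhere
        if cutoff is not None and cp > cutoff:
            bypassed += 1  # beyond the cutoff: never touches the cache
            continue
        idx = cp & 0xFF
        if slots[idx] == cp:
            hits += 1
        else:
            slots[idx] = cp  # evict whatever was in the slot
            misses += 1
    return hits, misses, bypassed


if __name__ == "__main__":
    greek = "\u03b1\u03b2\u03b3" * 100   # three letters, three distinct slots
    clash = "\u4e00\u4f00" * 50          # two CJK chars sharing slot 0
    kana_kanji = "\u3042\u4e00" * 10     # hiragana A plus one CJK char
    print(simulate_cache(greek))
    print(simulate_cache(clash))
    print(simulate_cache(clash, cutoff=CJK_CUTOFF))
    print(simulate_cache(kana_kanji, cutoff=CJK_CUTOFF))
```

In this toy model the Greek text misses only on the first occurrence of each letter, while two CJK chars that map to the same slot evict each other on every lookup. With the cutoff, the CJK chars skip the cache entirely and the kana still hit.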
Date | User | Action | Args |
2017-09-15 20:53:37 | terry.reedy | set | recipients: + terry.reedy, lemburg, vstinner, benjamin.peterson, ezio.melotti, serhiy.storchaka |
2017-09-15 20:53:37 | terry.reedy | set | messageid: <1505508817.11.0.544062520532.issue31484@psf.upfronthosting.co.za> |
2017-09-15 20:53:37 | terry.reedy | link | issue31484 messages |
2017-09-15 20:53:37 | terry.reedy | create | |