
Author xiang.zhang
Recipients benjamin.peterson, ezio.melotti, lemburg, methane, serhiy.storchaka, terry.reedy, vstinner, xiang.zhang
Date 2017-09-17.16:10:50
I ran the patch against a toy NLP application that segments words in the text of Shui Hu Zhuan provided by Serhiy. The result is not bad: 6% faster. I also counted the hit rates: 90% hit cell 0, 4.5% hit cell 1, and 5.5% miss. I then increased the cache size to 1024 * 2. Although the rates improved to 95.4%, 2.1%, and 2.4% respectively, the speedup was still 6%.

So IMHO this patch is unlikely to affect real-world applications *much*, for better or worse. I don't clearly recall the unicode implementation, but why can't we reuse the latin1 cache when we use this BMP cache? As it is, to keep other chars' low bits from conflicting with ASCII chars' low bits, we have to introduce the mini-LRU-cache, which is not that easy to understand.
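
To make the hit-rate numbers and the low-bits conflict concrete, here is a rough Python model of a two-cell cache indexed by the low bits of the code point. It is only a sketch under my own assumptions, not the patch's actual C code: the names `TwoCellCache` and `hit_rates`, the move-to-front policy, and the eviction order are all hypothetical.

```python
from collections import Counter


class TwoCellCache:
    """Hypothetical model: buckets indexed by low bits, two cells per bucket."""

    def __init__(self, size=1024):
        # size must be a power of two so that the low bits form the index
        assert size > 0 and size & (size - 1) == 0
        self.mask = size - 1
        self.buckets = [[None, None] for _ in range(size)]
        self.stats = Counter()

    def get(self, code_point):
        bucket = self.buckets[code_point & self.mask]
        if bucket[0] is not None and bucket[0][0] == code_point:
            self.stats["cell 0"] += 1
            return bucket[0][1]
        if bucket[1] is not None and bucket[1][0] == code_point:
            self.stats["cell 1"] += 1
            # assumed policy: move the hit entry to the front cell
            bucket[0], bucket[1] = bucket[1], bucket[0]
            return bucket[0][1]
        self.stats["miss"] += 1
        # assumed policy: the old front entry drops to cell 1,
        # and whatever was in cell 1 is evicted
        bucket[1] = bucket[0]
        bucket[0] = (code_point, chr(code_point))
        return bucket[0][1]


def hit_rates(text, size=1024):
    """Feed every character of text through the cache and return the rates."""
    cache = TwoCellCache(size)
    for ch in text:
        cache.get(ord(ch))
    total = sum(cache.stats.values())
    return {key: cache.stats[key] / total for key in ("cell 0", "cell 1", "miss")}
```

With `size=256`, "a" (U+0061) and U+4E61 share the same low 8 bits and land in the same bucket. A single-cell cache would evict one for the other on every alternating access, while the second cell lets both stay resident after their first miss.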
Date User Action Args
2017-09-17 16:10:50 xiang.zhang set recipients: + xiang.zhang, lemburg, terry.reedy, vstinner, benjamin.peterson, ezio.melotti, methane, serhiy.storchaka
2017-09-17 16:10:50 xiang.zhang set messageid: <>
2017-09-17 16:10:50 xiang.zhang link issue31484 messages
2017-09-17 16:10:50 xiang.zhang create