Message 302371 - Python tracker

Author	xiang.zhang
Recipients	benjamin.peterson, ezio.melotti, lemburg, methane, serhiy.storchaka, terry.reedy, vstinner, xiang.zhang
Date	2017-09-17.16:10:50
Message-id	<1505664650.61.0.461555400098.issue31484@psf.upfronthosting.co.za>

Content
I ran the patch against a toy NLP application that cuts words from Shui Hu Zhuan, the text provided by Serhiy. The result is not bad: 6% faster. I also counted the hit rate: 90% hit cell 0, 4.5% hit cell 1, and 5.5% miss. I also increased the cache size to 1024 * 2. Although the hit rate then changes to 95.4%, 2.1%, and 2.4% respectively, the difference is still 6%.

So IMHO this patch is unlikely to affect real-world applications *much*, for better or worse. I can't clearly recall the implementation of unicode, but why can't we reuse the latin1 cache when we use this bmp cache? And then, to avoid the chars' low bits conflicting with ASCII chars' low bits, we have to introduce the mini-LRU-cache, which is not that easy to understand.
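
For readers who find the mini-LRU-cache hard to picture, here is a rough Python sketch of the idea and of how hit rates like the ones above could be counted. It is not the patch's C code: the class `TwoCellCache`, the function `hit_rates`, and the default size of 1024 are hypothetical names and values chosen only for illustration. The sketch assumes each bucket is selected by the low bits of the code point and holds two cells in most-recently-used order.

```python
from collections import Counter


class TwoCellCache:
    """Toy model of a two-cell mini-LRU cache (hypothetical, not the patch)."""

    def __init__(self, size=1024):
        # size must be a power of two so that masking selects the low bits
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a positive power of two")
        self.mask = size - 1
        # Each bucket is [cell 0, cell 1]; cell 0 is the most recently used.
        self.buckets = [[None, None] for _ in range(size)]
        self.stats = Counter()

    def lookup(self, ch):
        bucket = self.buckets[ord(ch) & self.mask]
        if bucket[0] == ch:
            self.stats["cell0"] += 1
        elif bucket[1] == ch:
            self.stats["cell1"] += 1
            # Promote the hit to cell 0 so it survives the next eviction.
            bucket[0], bucket[1] = bucket[1], bucket[0]
        else:
            self.stats["miss"] += 1
            # Evict cell 1, demote cell 0, and store the new char in cell 0.
            bucket[1] = bucket[0]
            bucket[0] = ch
        return ch


def hit_rates(text, size=1024):
    """Return the fraction of lookups that hit cell 0, hit cell 1, or miss."""
    cache = TwoCellCache(size)
    for ch in text:
        cache.lookup(ch)
    total = sum(cache.stats.values())
    if total == 0:
        return {"cell0": 0.0, "cell1": 0.0, "miss": 0.0}
    return {key: cache.stats[key] / total for key in ("cell0", "cell1", "miss")}
```

With a single cell per bucket, two characters that share their low bits (for example `"a"`, U+0061, and U+0461 at size 1024) would evict each other on every alternating lookup. In this model the second cell lets both stay cached, which is the conflict the mini-LRU-cache is meant to absorb.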

History
Date	User	Action	Args
2017-09-17 16:10:50	xiang.zhang	set	recipients: + xiang.zhang, lemburg, terry.reedy, vstinner, benjamin.peterson, ezio.melotti, methane, serhiy.storchaka
2017-09-17 16:10:50	xiang.zhang	set	messageid: <1505664650.61.0.461555400098.issue31484@psf.upfronthosting.co.za>
2017-09-17 16:10:50	xiang.zhang	link	issue31484 messages
2017-09-17 16:10:50	xiang.zhang	create