Message 302304 - Python tracker


Author	ezio.melotti
Recipients	benjamin.peterson, ezio.melotti, lemburg, serhiy.storchaka, terry.reedy, vstinner
Date	2017-09-15.23:05:27
Message-id	<1505516727.48.0.200440977744.issue31484@psf.upfronthosting.co.za>
In-reply-to

Content
The Greek sample includes 155 unique characters (counting whitespace, punctuation, and the English characters at the beginning), so they can all fit in the cache. The Chinese sample, however, includes 3695 unique characters (all within the BMP), which probably causes many more cache misses, and the overhead of those misses would explain the slowdown. The Chinese text you used for the test is also from some 700 years ago and uses traditional and vernacular Chinese, so the number of unique characters is higher than what you would normally encounter in modern Chinese.
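
For anyone who wants to reproduce these counts on their own samples, here is a rough sketch. The helper names are made up for illustration, and `simulated_hit_rate` models a hypothetical direct-mapped cache indexed by code point with an assumed `cache_size`; it is not the cache from the patch, only a way to see how the number of distinct characters affects the hit rate.

```python
def unique_char_stats(text):
    """Return (number of distinct characters, whether all are in the BMP)."""
    chars = set(text)
    all_bmp = all(ord(ch) <= 0xFFFF for ch in chars)
    return len(chars), all_bmp


def simulated_hit_rate(text, cache_size=256):
    """Hit rate of a hypothetical direct-mapped cache indexed by code point.

    cache_size is an assumed value for illustration, not taken from the patch.
    """
    slots = [None] * cache_size
    hits = 0
    for ch in text:
        index = ord(ch) % cache_size
        if slots[index] == ch:
            hits += 1
        else:
            slots[index] = ch  # miss: replace whatever occupied the slot
    return hits / len(text) if text else 0.0
```

Running both functions on the Greek and Chinese samples should show whether the difference in distinct characters translates into a lower hit rate under this simplified model.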


History
Date	User	Action	Args
2017-09-15 23:05:27	ezio.melotti	set	recipients: + ezio.melotti, lemburg, terry.reedy, vstinner, benjamin.peterson, serhiy.storchaka
2017-09-15 23:05:27	ezio.melotti	set	messageid: <1505516727.48.0.200440977744.issue31484@psf.upfronthosting.co.za>
2017-09-15 23:05:27	ezio.melotti	link	issue31484 messages
2017-09-15 23:05:27	ezio.melotti	create