
Author: ezio.melotti
Recipients: benjamin.peterson, ezio.melotti, lemburg, serhiy.storchaka, terry.reedy, vstinner
Date: 2017-09-15 23:05:27
The Greek sample includes 155 unique characters (including whitespace, punctuation, and the English characters at the beginning), so they can all fit in the cache.
The Chinese sample, however, includes 3695 unique characters (all within the BMP), probably causing many more cache misses and a slowdown due to the cache overhead.
The Chinese text you used for the test is also some 700 years old and uses traditional and vernacular Chinese, so the number of unique characters is higher than what you would normally encounter in modern Chinese.
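To illustrate why the number of unique characters matters, here is a minimal sketch that counts distinct code points in a sample and replays the text through a cache of single-character strings. The cache model is an assumption made purely for illustration: a hypothetical direct-mapped table of `slots` entries indexed by `ord(ch) % slots`, where a lookup is a hit only if the slot already holds that character. The helper names, the 256-slot size, and the synthetic samples are all hypothetical and are not taken from the patch under discussion.

```python
def unique_chars(text: str) -> int:
    """Number of distinct code points in text (whitespace and punctuation included)."""
    return len(set(text))


def simulate_cache(text: str, slots: int) -> tuple[int, int]:
    """Replay text through a HYPOTHETICAL direct-mapped cache of 1-char strings.

    Each character maps to slot ord(ch) % slots. A hit occurs when the slot
    already holds that same character; otherwise the slot is overwritten and
    the access counts as a miss. Illustrative model only, not CPython's code.
    """
    table = [None] * slots
    hits = misses = 0
    for ch in text:
        i = ord(ch) % slots
        if table[i] == ch:
            hits += 1
        else:
            table[i] = ch  # evict whatever was cached in this slot
            misses += 1
    return hits, misses


# Synthetic sample with few distinct characters: everything stays cached.
small = "αβγδ " * 1000
print(unique_chars(small), simulate_cache(small, 256))  # 5 (4995, 5)

# Synthetic sample with 3695 distinct CJK characters, read twice in order:
# far more characters than slots, so entries are evicted before reuse.
wide = "".join(chr(0x4E00 + i) for i in range(3695)) * 2
print(unique_chars(wide), simulate_cache(wide, 256))  # 3695 (0, 7390)
```

The second sample is a worst case: real text repeats common characters much more often than rare ones, so its hit rate would fall somewhere in between, but the trend is the same. Once the working set of distinct characters exceeds the cache, the lookup cost is paid without the benefit of reuse.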