Message202331
The Python parser (Parser/tokenizer.c) has a translate_into_utf8() function that decodes a string from the input encoding and re-encodes it to UTF-8.
This function is unnecessary if the input string is already encoded in UTF-8, which is common nowadays: Linux, Mac OS X and many other operating systems use UTF-8 as the default locale encoding, UTF-8 is the default encoding for Python scripts, and the compile(), eval() and exec() functions pass UTF-8 encoded strings to the parser.
The attached patch adds an input_is_utf8 flag to the tokenizer to skip translate_into_utf8() if the input string is already encoded in UTF-8.
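To illustrate the idea only (this is not the patch, which changes C code in Parser/tokenizer.c), here is a rough Python sketch of the same short-circuit. The prepare_source() helper and its signature are hypothetical; only the names translate_into_utf8 and input_is_utf8 come from the description above.

```python
def translate_into_utf8(data: bytes, encoding: str) -> bytes:
    """Decode from the input encoding, then re-encode as UTF-8."""
    return data.decode(encoding).encode("utf-8")


def prepare_source(data: bytes, encoding: str = "utf-8",
                   input_is_utf8: bool = False) -> bytes:
    """Hypothetical helper: return UTF-8 bytes ready for the tokenizer."""
    if input_is_utf8:
        # The caller guarantees UTF-8, so skip the decode/encode round trip.
        return data
    return translate_into_utf8(data, encoding)


latin1_source = "x = 'é'".encode("latin-1")
utf8_source = "x = 'é'".encode("utf-8")

# Non-UTF-8 input still goes through the translation step.
converted = prepare_source(latin1_source, "latin-1")

# UTF-8 input flagged by the caller is handed back untouched.
untouched = prepare_source(utf8_source, input_is_utf8=True)
```

Note that in this sketch the fast path performs no validation: bytes flagged as UTF-8 are trusted as-is, so the flag should only be set by callers that already know the encoding.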
Date | User | Action | Args
2013-11-07 12:40:55 | vstinner | set | recipients: + vstinner, benjamin.peterson, serhiy.storchaka
2013-11-07 12:40:55 | vstinner | set | messageid: <1383828055.35.0.303135915018.issue19519@psf.upfronthosting.co.za>
2013-11-07 12:40:55 | vstinner | link | issue19519 messages
2013-11-07 12:40:55 | vstinner | create |