Message257074
On 27.12.2015 02:05, Serhiy Storchaka wrote:
>
>> I wonder why this does not trigger the exception.
>
> Because in the case of utf-8 and iso-8859-1, the decoding and encoding steps are omitted.
>
> In the general case the input is decoded from the specified encoding and then encoded to UTF-8 for the parser. But for the utf-8 and iso-8859-1 encodings the parser gets the raw data.
Right, but since the tokenizer doesn't know about "utf8" it
should reach out to the codec registry to get a properly encoded
version of the source code (even though this is an unnecessary
round-trip).
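
To illustrate the round-trip meant here, a minimal Python-level sketch follows. It is not the tokenizer's actual C code; the helper name `recode_to_utf8` is hypothetical and only models "decode with the declared codec, then encode to UTF-8" using the codec registry.

```python
import codecs


def recode_to_utf8(raw: bytes, declared: str) -> bytes:
    """Hypothetical helper: decode `raw` with the declared source
    encoding (looked up in the codec registry), then re-encode it
    as UTF-8, which is what the parser is described as consuming."""
    # codecs.lookup() resolves aliases such as "utf8" or "latin1"
    # and raises LookupError for unknown encoding names.
    info = codecs.lookup(declared)
    # CodecInfo.decode returns a (text, bytes_consumed) tuple and
    # raises UnicodeDecodeError on invalid input by default.
    text, _ = info.decode(raw)
    return text.encode("utf-8")
```

With a round-trip like this, invalid bytes in a file declared as "utf8" would raise UnicodeDecodeError instead of being passed through as raw data.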
There are a few other aliases for UTF-8 which would likely trigger
the same problem:
# utf_8 codec
'u8' : 'utf_8',
'utf' : 'utf_8',
'utf8' : 'utf_8',
'utf8_ucs2' : 'utf_8',
'utf8_ucs4' : 'utf_8',
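
As a quick check that the codec registry treats these spellings as the same codec, something like the following sketch could be used. The variable names are only for illustration; `codecs.lookup()` is the standard registry entry point.

```python
import codecs

# The UTF-8 aliases listed above.
ALIASES = ["u8", "utf", "utf8", "utf8_ucs2", "utf8_ucs4"]

# Map each alias to the canonical name reported by the codec registry.
canonical = {alias: codecs.lookup(alias).name for alias in ALIASES}

for alias, name in canonical.items():
    print(f"{alias!r} resolves to {name!r}")
```

On CPython all of these should resolve to the same canonical name, "utf-8", which is why any code that compares the declared encoding only against the literal "utf-8" would miss them.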
Date | User | Action | Args
2015-12-27 12:33:05 | lemburg | set | recipients: + lemburg, doerwalter, terry.reedy, vstinner, serhiy.storchaka, 王杰
2015-12-27 12:33:05 | lemburg | link | issue25937 messages
2015-12-27 12:33:05 | lemburg | create |