Author lemburg
Recipients doerwalter, lemburg, serhiy.storchaka, terry.reedy, vstinner, 王杰
Date 2015-12-27.12:33:05
On 27.12.2015 02:05, Serhiy Storchaka wrote:
>> I wonder why this does not trigger the exception.
> Because in the case of utf-8 and iso-8859-1 the decoding and encoding steps are omitted.
> In the general case the input is decoded from the specified encoding and then encoded to UTF-8 for the parser. But for the utf-8 and iso-8859-1 encodings the parser gets the raw data.

Right, but since the tokenizer doesn't know about "utf8" it
should reach out to the codec registry to get a properly encoded
version of the source code (even though this is an unnecessary
round trip).

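To illustrate the round trip through the codec registry, here is a minimal
Python sketch. It is not the tokenizer's actual code: recode_for_parser is a
hypothetical helper, and cp1252 is just an arbitrary example of an encoding
for which the bytes really change.

```python
import codecs


def recode_for_parser(raw: bytes, declared: str) -> bytes:
    """Hypothetical helper: decode with the declared codec, re-encode as UTF-8."""
    # codecs.decode() resolves `declared` through the codec registry,
    # so alias spellings are accepted as well as canonical names.
    text = codecs.decode(raw, declared)
    return text.encode("utf-8")


source = "s = 'é'\n"
utf8_bytes = source.encode("utf-8")
cp1252_bytes = source.encode("cp1252")

# With a UTF-8 alias the round trip hands back the same bytes.
same = recode_for_parser(utf8_bytes, "utf8") == utf8_bytes

# With a different source encoding the bytes are actually converted.
converted = recode_for_parser(cp1252_bytes, "cp1252") == utf8_bytes
```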
There are a few other aliases for UTF-8 which would likely trigger
the same problem:

    # utf_8 codec
    'u8'                 : 'utf_8',
    'utf'                : 'utf_8',
    'utf8'               : 'utf_8',
    'utf8_ucs2'          : 'utf_8',
    'utf8_ucs4'          : 'utf_8',
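
As a quick check of how the codec registry treats the alias spellings listed
above, the following sketch looks each one up with codecs.lookup(). It only
shows what the registry reports; it says nothing about what the tokenizer
itself recognizes.

```python
import codecs

# Alias spellings taken from the list above.
aliases = ["u8", "utf", "utf8", "utf8_ucs2", "utf8_ucs4"]

# codecs.lookup() returns a CodecInfo whose .name is the canonical codec name.
resolved = {alias: codecs.lookup(alias).name for alias in aliases}

for alias, name in resolved.items():
    print(f"{alias!r} -> {name!r}")
```

If every alias resolves to the same canonical name, the registry regards them
all as the same codec, which is why a lookup there would sidestep the problem.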
Date                 User     Action         Args
2015-12-27 12:33:05  lemburg  setrecipients  + lemburg, doerwalter, terry.reedy, vstinner, serhiy.storchaka, 王杰
2015-12-27 12:33:05  lemburg  link           issue25937 messages
2015-12-27 12:33:05  lemburg  create