Author lemburg
Recipients doerwalter, lemburg, serhiy.storchaka, terry.reedy, vstinner, 王杰
Date 2015-12-27.12:33:05
Message-id <567FDA7E.5020405@egenix.com>
In-reply-to <1451178315.58.0.0417010168097.issue25937@psf.upfronthosting.co.za>
Content
On 27.12.2015 02:05, Serhiy Storchaka wrote:
> 
>> I wonder why this does not trigger the exception.
> 
> Because in the case of utf-8 and iso-8859-1 the decoding and encoding steps are omitted.
>
> In the general case the input is decoded from the specified encoding and then encoded to UTF-8 for the parser. But for utf-8 and iso-8859-1 encodings the parser gets the raw data.

Right, but since the tokenizer doesn't know about "utf8", it
should reach out to the codec registry to get a properly encoded
version of the source code (even though this is an unnecessary
round-trip).
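The round-trip described above can be sketched in Python. This is a minimal illustration under stated assumptions, not CPython's tokenizer (which is written in C). The helper name `source_to_utf8` is hypothetical, and validating the UTF-8 fast path is a choice made for this sketch only. The point it shows is that the declared name goes through `codecs.lookup()`, so every alias of a codec ends up on the same code path:

```python
import codecs

def source_to_utf8(raw: bytes, declared: str) -> bytes:
    """Hypothetical helper: return the source bytes as UTF-8.

    The declared encoding name is resolved through the codec registry,
    so aliases such as "utf8" or "u8" map to the same codec as "utf-8".
    """
    canonical = codecs.lookup(declared).name  # e.g. "utf8" -> "utf-8"
    if canonical == "utf-8":
        # Fast path (assumption of this sketch): skip the re-encode, but
        # still decode once so invalid input raises UnicodeDecodeError.
        raw.decode("utf-8")
        return raw
    # General case: decode from the declared encoding, then encode to UTF-8.
    return raw.decode(canonical).encode("utf-8")
```

With this approach, `# coding: utf8` and `# coding: utf-8` take the same path, because the comparison is made on the registry's canonical name rather than on the spelling found in the source file.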

There are a few other aliases for UTF-8 which would likely trigger
the same problem:

    # utf_8 codec
    'u8'                 : 'utf_8',
    'utf'                : 'utf_8',
    'utf8'               : 'utf_8',
    'utf8_ucs2'          : 'utf_8',
    'utf8_ucs4'          : 'utf_8',
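To check which spellings the codec registry treats as UTF-8, one can ask `codecs.lookup()` about each alias listed above. This is a sketch assuming a Python 3 interpreter whose `encodings.aliases` table still contains these entries; the `is_utf8` helper is hypothetical:

```python
import codecs

# The utf_8 aliases quoted in the message above.
UTF8_ALIASES = ["u8", "utf", "utf8", "utf8_ucs2", "utf8_ucs4"]

def is_utf8(name: str) -> bool:
    """Hypothetical helper: True if the registry maps `name` to UTF-8."""
    return codecs.lookup(name).name == "utf-8"

for alias in UTF8_ALIASES:
    # Each alias should resolve to the same canonical codec name as "utf-8".
    print(f"{alias:10} -> {codecs.lookup(alias).name}")
```

A tokenizer that compares the raw encoding string against "utf-8" would miss all of these, whereas comparing canonical names treats them uniformly.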
History
Date                 User     Action  Args
2015-12-27 12:33:05  lemburg  set     recipients: + lemburg, doerwalter, terry.reedy, vstinner, serhiy.storchaka, 王杰
2015-12-27 12:33:05  lemburg  link    issue25937 messages
2015-12-27 12:33:05  lemburg  create