Message 260078
Serhiy: Removing the shortcut would slow down the tokenizer a lot since UTF-8 encoded source code is the norm, not the exception.
The "problem" here is that the tokenizer trusts the source code in being in the correct encoding when you use one of utf-8 or iso-8859-1 and then skips the usual "decode into unicode, then encode to utf-8" step.
From a purist point of view, you are right: Python should always go through those steps to detect encoding errors. From a practical point of view, though, I think the optimization is fine.
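
To make the distinction concrete, here is a minimal Python sketch of the two paths described above. This is not CPython's actual tokenizer code (which is written in C); the function name prepare_source and its structure are hypothetical.

    def prepare_source(raw: bytes, declared: str) -> bytes:
        """Return the bytes handed to the tokenizer (illustrative only)."""
        if declared in ("utf-8", "iso-8859-1"):
            # Shortcut: hand the bytes through unchanged. For iso-8859-1
            # this is always safe, since every byte sequence decodes; for
            # utf-8 it means malformed sequences go undetected.
            return raw
        # Usual path: decode into unicode (raising UnicodeDecodeError on
        # invalid input), then encode back to utf-8 for the tokenizer.
        return raw.decode(declared).encode("utf-8")

For example, under this sketch a latin-1 byte in a file declared as utf-8 slips through the shortcut, while the skipped decode step would have caught it:

    bad = b"s = '\xe9'"           # valid iso-8859-1, malformed as utf-8
    prepare_source(bad, "utf-8")  # shortcut: passed through, no error raised
    bad.decode("utf-8")           # the skipped step: raises UnicodeDecodeError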