Message202340
> The parser should check that the input is actually valid UTF-8 data.
Ah yes, correct. It looks like the input is still checked for valid
UTF-8 data. I suppose that the byte strings should be decoded from
UTF-8 because Python 3 manipulates Unicode strings, not byte strings.
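
As a small illustration of the decoding point (a sketch only, not the tokenizer's code): in Python, decoding bytes with the strict UTF-8 codec is itself a validity check, because malformed input raises UnicodeDecodeError. The helper name is_valid_utf8 is hypothetical and does not exist in the parser.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        # errors="strict" is the default; spelled out to make the intent clear.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("caf\u00e9".encode("utf-8")))  # well-formed UTF-8 input
print(is_valid_utf8(b"caf\xe9"))                   # Latin-1 byte, not valid UTF-8
```
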
The patch only skips calls to translate_into_utf8(str, tok->encoding);
calls to translate_into_utf8(str, tok->enc) are unchanged (notice:
encoding != enc :-)).
But it looks like translate_into_utf8(str, tok->enc) is not called if
tok->enc is NULL.
If tok->encoding is "utf-8" and tok->enc is NULL, maybe the input
string is not decoded from UTF-8. But it sounds strange, because
Python uses Unicode strings.
Don't trust me; I would prefer an explanation from Benjamin, who knows
the parser internals better than I do :-)
Date | User | Action | Args
2013-11-07 14:03:11 | vstinner | set | recipients: + vstinner, benjamin.peterson, serhiy.storchaka
2013-11-07 14:03:11 | vstinner | link | issue19519 messages
2013-11-07 14:03:10 | vstinner | create |