Message 202340 - Python tracker

Author	vstinner
Recipients	benjamin.peterson, serhiy.storchaka, vstinner
Date	2013-11-07.14:03:10
Message-id	<CAMpsgwaTnjF-okm2twm7qyMMZpFgnvcXLk8XP2jNdqSMKaQaUQ@mail.gmail.com>
In-reply-to	<1383832111.38.0.889875651063.issue19519@psf.upfronthosting.co.za>

Content
> The parser should check that the input is actually valid UTF-8 data.

Ah yes, correct. It looks like the input data is still checked for
valid UTF-8. I suppose that the byte strings should be decoded from
UTF-8 because Python 3 manipulates Unicode strings, not byte strings.
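
To make the distinction concrete, here is a minimal Python sketch of
the two operations being discussed: checking that bytes are valid
UTF-8, and converting bytes from another encoding into UTF-8. Both
function names are hypothetical and only mirror the idea; this is not
the tokenizer's C code.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8_sketch(data: bytes, encoding: str) -> bytes:
    """Decode data from `encoding` and re-encode it as UTF-8.

    Illustrates the general idea of a "translate into UTF-8" step;
    it is an assumption, not the real translate_into_utf8().
    """
    return data.decode(encoding).encode("utf-8")


utf8_bytes = "é".encode("utf-8")   # two bytes: 0xC3 0xA9
latin1_bytes = b"\xe9"             # "é" in Latin-1, ill-formed as UTF-8

# Skipping the conversion is only safe if the bytes are still validated.
checks = (is_valid_utf8(utf8_bytes), is_valid_utf8(latin1_bytes))
converted = to_utf8_sketch(latin1_bytes, "latin-1")
```

Even when the declared encoding is already UTF-8 and no conversion is
needed, the validation step still has to run on the raw bytes.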

The patch only skips the calls to translate_into_utf8(str, tok->encoding);
the calls to translate_into_utf8(str, tok->enc) are unchanged (note:
encoding != enc :-)).

But it looks like translate_into_utf8(str, tok->enc) is not called if
tok->enc is NULL.

If tok->encoding is "utf-8" and tok->enc is NULL, maybe the input
string is not decoded from UTF-8 at all. But that would be strange,
because Python uses Unicode strings.
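
One way to probe this from the outside, without reading the C code, is
to compile byte strings and see whether ill-formed UTF-8 is rejected.
The sketch below uses only the built-in compile() and exec(); the
helper name try_compile is hypothetical, and the test shows observable
behaviour only, not which internal call performs the check.

```python
def try_compile(source: bytes):
    """Compile bytes source; return the code object, or None if rejected."""
    try:
        return compile(source, "<sketch>", "exec")
    except (SyntaxError, UnicodeDecodeError):
        # Ill-formed input is reported as an error, not silently accepted.
        return None


valid_utf8 = "x = 'é'\n".encode("utf-8")
invalid_utf8 = b"x = '\xe9'\n"   # lone 0xE9 byte, no coding cookie
latin1_declared = b"# -*- coding: latin-1 -*-\nx = '\xe9'\n"

accepted_valid = try_compile(valid_utf8) is not None
accepted_invalid = try_compile(invalid_utf8) is not None

# With an explicit coding cookie the same byte is decoded as Latin-1.
namespace = {}
exec(try_compile(latin1_declared), namespace)
```

If a patched interpreter started accepting the second input, that
would suggest the validation had been skipped along with the
conversion.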

Don't trust me: I would prefer an explanation from Benjamin, who knows
the parser internals better than I do :-)

History
Date	User	Action	Args
2013-11-07 14:03:11	vstinner	set	recipients: + vstinner, benjamin.peterson, serhiy.storchaka
2013-11-07 14:03:11	vstinner	link	issue19519 messages
2013-11-07 14:03:10	vstinner	create