Message 253114 - Python tracker


Author	terry.reedy
Recipients	Brian.Cain, terry.reedy
Date	2015-10-17.02:36:07
Message-id	<1445049368.55.0.412810876019.issue25388@psf.upfronthosting.co.za>

Content
According to https://docs.python.org/3/reference/lexical_analysis.html#lexical-analysis, the encoding of a source file (in Python 3) defaults to utf-8* and a decoding error is (should be) reported as a SyntaxError. Since b"\x7f\x00\x00\n''s\x01\xfd\n'S" is not valid as utf-8 (the byte \xfd cannot occur in utf-8), I expect a UnicodeDecodeError converted to SyntaxError.
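
A quick way to check the decoding behavior of that byte string is to decode it directly. This is only a sketch: the helper name utf8_error_offset is made up for illustration, and it exercises the utf-8 codec, not the tokenizer itself.

```python
# The byte string quoted in the message above.
data = b"\x7f\x00\x00\n''s\x01\xfd\n'S"


def utf8_error_offset(raw):
    """Return the offset of the first byte that fails utf-8 decoding, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


offset = utf8_error_offset(data)
print("first undecodable offset:", offset)
if offset is not None:
    print("offending byte:", hex(data[offset]))
```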

* compile(bytes, filename, mode) defaults to latin1 instead.  It has no decoding problem, but quits with "ValueError: source code string cannot contain null bytes".  On 2.7, I might expect that as the error.
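
The compile() behavior can be probed with something like the following. It is a sketch under assumptions: the filename "<fuzz>" and the helper try_compile are placeholders, and the exception type for null bytes may differ between Python versions, so both ValueError and SyntaxError are caught and the type is reported rather than assumed.

```python
# The byte string quoted in the message above.
data = b"\x7f\x00\x00\n''s\x01\xfd\n'S"


def try_compile(source):
    """Compile source and return (exception type name, message), or None on success."""
    try:
        compile(source, "<fuzz>", "exec")  # "<fuzz>" is a placeholder filename
    except (ValueError, SyntaxError) as exc:
        # UnicodeDecodeError is a ValueError subclass, so it is caught here too.
        return type(exc).__name__, str(exc)
    return None


result = try_compile(data)
print(result)
```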

I expect '''self.assertIn(b"Non-UTF-8", res.err)''' to always fail because error messages are strings, not bytes.  That aside, have you ever seen that particular text (as a string) in a SyntaxError message?
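
To illustrate the type mismatch, here is a small sketch of a containment check between bytes and str. It assumes, as stated above, that the error text is a str; err_text and check are hypothetical names, not part of the test in question.

```python
# Hypothetical error text, standing in for a str error message.
err_text = "SyntaxError: example message"


def check(needle, haystack):
    """Report the outcome of `needle in haystack`, including type mismatches."""
    try:
        return "found" if needle in haystack else "not found"
    except TypeError:
        # Python 3 refuses to search for bytes inside a str (and vice versa).
        return "type mismatch"


print(check(b"Non-UTF-8", err_text))  # bytes needle, str haystack
print(check("Non-UTF-8", err_text))   # str needle, str haystack
```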

Why do you think the crash is during the tokenizing phase?  I could not see anything in the AS report.
