Issue 22221: ast.literal_eval confused by coding declarations

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/66417

classification

Title:	ast.literal_eval confused by coding declarations
Type:	behavior	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 2.7

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	serhiy.storchaka	Nosy List:	davidhalter, jorgenschaefer, python-dev, serhiy.storchaka, terry.reedy
Priority:	normal	Keywords:	patch

Created on 2014-08-17 19:53 by jorgenschaefer, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
source_encoding_second_line-2.7.patch	serhiy.storchaka, 2014-08-23 07:44		review

Messages (8)
msg225464 - (view)	Author: Jorgen Schäfer (jorgenschaefer)	Date: 2014-08-17 19:53
The ast module seems to get confused for certain strings which contain coding declarations.

>>> import ast
>>> s = u'"""\\\n# -- coding: utf-8 --\n"""'
>>> print s
"""\
# -- coding: utf-8 --
"""
>>> ast.literal_eval(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/forcer/Programs/Python/python2.7/lib/python2.7/ast.py", line 49, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/home/forcer/Programs/Python/python2.7/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 0
SyntaxError: encoding declaration in Unicode string
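The report above can be turned into a small check that runs on any interpreter. This is only a sketch: the helper name `try_literal_eval` is hypothetical, and the code assumes nothing beyond the standard `ast` module. On an affected 2.7 build it should report the SyntaxError; on Python 3, which msg225469 says is not affected, it should return the string contents.

```python
import ast

# The literal from the report: a triple-quoted string whose body, after a
# backslash line continuation, happens to look like a coding declaration.
SOURCE = u'"""\\\n# -- coding: utf-8 --\n"""'


def try_literal_eval(source):
    """Return ('ok', value) or ('error', message) for ast.literal_eval(source).

    Hypothetical helper for illustration; it is not part of the ast module.
    """
    try:
        return ("ok", ast.literal_eval(source))
    except SyntaxError as exc:
        return ("error", str(exc))


status, detail = try_literal_eval(SOURCE)
print(status, repr(detail))
```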
msg225469 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-08-17 20:31
eval() is affected too. 3.x isn't affected.
msg225701 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2014-08-22 19:59
This issue is about the SyntaxError message for eval functions, not the ast module per se. My first response is that the reported message is not a bug and that this issue should be closed as 'not a bug'.

(General reason) Trying to eval an expression preceded by a comment on its own line or followed by a comment works.

>>> eval("#before\n'string'#after")
'string'

Trying to eval a bare comment is a syntax error.

>>> eval("#comment\n")
...
SyntaxError: unexpected EOF while parsing

So the issue as presented is the special-case message. However, messages are not part of the language specification and improving them is often/usually/always? treated as an enhancement. Changing them will break code and tests that depend on the exact wording. 2.7 does not get enhancements.

(Specific reason) In 2.x, the input to (literal-)eval is either latin-1 encoded bytes or unicode. 'Latin-1' input could potentially consist of an encoding declaration on one line followed on the next line by a literal string encoded as indicated.

>>> le("# -- coding: utf-8 --\n'string'")
'string'

Unicode input, the subject of this issue, is encoded to latin-1, which means that any literal string in the expression has to be latin-1 encoded. Therefore, a latin-1 encoding declaration is redundant and anything else is either redundant (if the original unicode only contains characters that encode the same in latin-1, as in the example above) or wrong, with hard to predict behavior. Someone thought it worthwhile to add the special case check. I think it should be left as is.

Jorgen, please either close this or explain why you think not, in light of the above.
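The comment and byte-input cases described in this message can be tried together with a sketch like the following. It is written for Python 3, where the rules differ from the 2.x behaviour discussed here: byte input is decoded according to its coding declaration, while text input is already decoded. The helper `eval_or_error` and the latin-1 example are assumptions added for illustration.

```python
def eval_or_error(source):
    """Evaluate source; return the value, or the SyntaxError class on failure.

    Hypothetical helper, used only to keep the examples below short.
    """
    try:
        return eval(source)
    except SyntaxError:
        return SyntaxError


# An expression surrounded by comments evaluates normally.
print(eval_or_error("#before\n'string'#after"))

# A bare comment contains no expression, so parsing fails.
print(eval_or_error("#comment\n"))

# Byte input with a coding declaration on the first line.
print(eval_or_error(b"# -- coding: utf-8 --\n'string'"))

# Byte input declared as latin-1: the byte 0xE9 is decoded as U+00E9.
print(repr(eval_or_error(b"# coding: latin-1\n'\xe9'")))
```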
msg225704 - (view)	Author: Jorgen Schäfer (jorgenschaefer)	Date: 2014-08-22 20:27
I do not understand how your comments apply to this bug. There is no comment anywhere. There is a single string literal whose contents look like a comment. The expression parses correctly without syntax error if you add a few newlines in front. Could you clarify your objection?
msg225716 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2014-08-22 22:48
[When responding, please do not quote more than a line or two. If responding by email, please delete the rest. Otherwise, the result is extra noise when viewing online.]

You are right, I missed the outer 's, though my examples are not completely irrelevant. Eval looks inside the inner quotes for a coding line in certain circumstances, or maybe it always looks and we do not notice when there is no problem. Here are some of my results on US Win 7, cp1252, 3.4.1, interactive prompt, idle

pass: eval(u'"""# -- coding: utf-8 --\na"""')
fail: eval(u'"""\n# -- coding: utf-8 --\na"""')
  since coding can be on line 1 or 2, these should be same
pass: eval(u'"""\n\n# -- coding: utf-8 --\na"""')
  coding on 3rd line should be ignored
fail: eval(u'"""\\\n# -- coding: utf-8 --\na"""')
  logically, this matches the first example; physically, the second
pass: eval(u'"""# -- coding: utf-8 --\na€"""')
  but € prints as \xc2\x80', its utf-8 encoding as pasted in

From file, saved from Idle editor as cp1252
pass: print(eval("# -- coding: utf-8 --\n'euro€'"))
  no u prefix, € prints as €
fail: print(eval(u"# -- coding: utf-8 --\n'euro€'"))

Save the following two lines in one file as utf-8
pass:
print(eval("# -- coding: utf-8 --\n'euro€'"))
print(eval(u"# -- coding: utf-8 --\n'euro∢'"))
  but € & ∢ print as 'â‚¬' & 'âˆ¢'
  adding # -- coding: utf-8 -- line makes no difference
  adding u prefix fails either way
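The pass/fail pattern in the first five results is consistent with a check that looks only at the first two physical lines of the source, without regard to whether those lines sit inside a string literal. The sketch below models that idea. It is a simplified illustration, not CPython's tokenizer code: the function `naive_cookie_line` is hypothetical, and the regular expression is the one given for encoding declarations in the Python 3 language reference.

```python
import re

# Pattern for an encoding declaration, as documented in the Python 3
# language reference (PEP 263): a comment line containing "coding:" or
# "coding=" followed by a codec name.
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def naive_cookie_line(source):
    """Return 1 or 2 if that physical line looks like a declaration, else None.

    Hypothetical model: it splits on newlines and ignores string literals,
    which is exactly the mistake the results above suggest.
    """
    for number, line in enumerate(source.split("\n")[:2], start=1):
        if COOKIE_RE.match(line):
            return number
    return None


CASES = [
    '"""# -- coding: utf-8 --\na"""',      # reported: pass
    '"""\n# -- coding: utf-8 --\na"""',    # reported: fail
    '"""\n\n# -- coding: utf-8 --\na"""',  # reported: pass
    '"""\\\n# -- coding: utf-8 --\na"""',  # reported: fail
]

for case in CASES:
    print(repr(case), "->", naive_cookie_line(case))
```

Under this model the two sources reported as failing are the ones where the cookie-like text starts the second physical line, which would explain why adding extra newlines in front makes the expression parse.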
msg225735 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-08-23 07:44
This is the same issue as issue18960. Here is a backported patch with an additional test.
msg226404 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-09-05 07:26
New changeset dd1e21f17b1c by Serhiy Storchaka in branch '2.7':
Issue #22221: Backported fixes from Python 3 (issue #18960).
http://hg.python.org/cpython/rev/dd1e21f17b1c
msg226407 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-09-05 08:11
New changeset 13cd8ea4cafe by Serhiy Storchaka in branch '3.4':
Issue #22221: Add tests for compile() with source encoding cookie.
http://hg.python.org/cpython/rev/13cd8ea4cafe

New changeset 9d335a54d728 by Serhiy Storchaka in branch 'default':
Issue #22221: Add tests for compile() with source encoding cookie.
http://hg.python.org/cpython/rev/9d335a54d728
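For readers who want a feel for what checks on compile() with a source encoding cookie can look like, here is a minimal sketch for Python 3. It is not the test code from the changesets above: the helper `compiles`, the file name `<sketch>`, and the sample sources are all assumptions chosen for illustration.

```python
def compiles(source, mode="exec"):
    """Return True if compile() accepts the source, False on SyntaxError.

    Hypothetical helper; "<sketch>" is a placeholder file name.
    """
    try:
        compile(source, "<sketch>", mode)
    except SyntaxError:
        return False
    return True


# Byte source: the cookie is honoured, so an unknown codec name is rejected.
BAD_BYTES = b"# -*- coding: badencoding -*-\npass\n"

# Text source: already decoded, so the same cookie is only a comment.
BAD_TEXT = "# -*- coding: badencoding -*-\npass\n"

# Text source where the cookie-like line sits inside a string literal on
# the second physical line, as in the original report.
IN_LITERAL = '"""\\\n# -*- coding: badencoding -*-\n"""\n'

print(compiles(BAD_BYTES))
print(compiles(BAD_TEXT))
print(compiles(IN_LITERAL, mode="eval"))
```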

History
Date	User	Action	Args
2022-04-11 14:58:07	admin	set	github: 66417
2014-09-05 08:28:34	serhiy.storchaka	set	status: open -> closed resolution: fixed stage: patch review -> resolved
2014-09-05 08:11:59	python-dev	set	messages: + msg226407
2014-09-05 07:26:24	python-dev	set	nosy: + python-dev messages: + msg226404
2014-08-23 07:44:39	serhiy.storchaka	set	files: + source_encoding_second_line-2.7.patch keywords: + patch messages: + msg225735 stage: needs patch -> patch review
2014-08-22 22:48:40	terry.reedy	set	messages: + msg225716
2014-08-22 20:27:04	jorgenschaefer	set	messages: + msg225704
2014-08-22 19:59:06	terry.reedy	set	nosy: + terry.reedy messages: + msg225701
2014-08-17 20:31:57	serhiy.storchaka	set	assignee: serhiy.storchaka type: behavior components: + Interpreter Core, - Library (Lib) nosy: + serhiy.storchaka messages: + msg225469 stage: needs patch
2014-08-17 20:11:27	davidhalter	set	nosy: + davidhalter
2014-08-17 19:53:49	jorgenschaefer	create