classification
Title: ast.literal_eval confused by coding declarations
Type: behavior
Stage: resolved
Components: Interpreter Core
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: davidhalter, jorgenschaefer, python-dev, serhiy.storchaka, terry.reedy
Priority: normal
Keywords: patch

Created on 2014-08-17 19:53 by jorgenschaefer, last changed 2014-09-05 08:28 by serhiy.storchaka. This issue is now closed.

Files
File name: source_encoding_second_line-2.7.patch
Uploaded: serhiy.storchaka, 2014-08-23 07:44
Messages (8)
msg225464 - (view) Author: Jorgen Schäfer (jorgenschaefer) Date: 2014-08-17 19:53
The ast module seems to get confused for certain strings which contain coding declarations.

>>> import ast

>>> s = u'"""\\\n# -*- coding: utf-8 -*-\n"""'
>>> print s
"""\
# -*- coding: utf-8 -*-
"""
>>> ast.literal_eval(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/forcer/Programs/Python/python2.7/lib/python2.7/ast.py", line 49, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/home/forcer/Programs/Python/python2.7/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 0
SyntaxError: encoding declaration in Unicode string
msg225469 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-17 20:31
eval() is affected too. 3.x isn't affected.
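
For comparison, a minimal sketch of the reporter's input under Python 3, where str source is not scanned for a coding declaration. It assumes a Python 3 interpreter; the variable names are illustrative only and are not taken from the patch.

```python
import ast

# The reporter's input: one triple-quoted string literal whose second
# physical line merely looks like a coding declaration.
s = '"""\\\n# -*- coding: utf-8 -*-\n"""'

# Both calls parse the text as a plain string literal. The backslash-newline
# inside the literal is a line continuation, so it adds no character.
literal_result = ast.literal_eval(s)
eval_result = eval(s)

print(repr(literal_result))
print(literal_result == eval_result)
```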
msg225701 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-08-22 19:59
This issue is about the SyntaxError message for eval functions, not the ast module per se. My first response is that the reported message is not a bug and that this issue should be closed as 'not a bug'.

(General reason) Trying to eval an expression preceded by a comment on its own line or followed by a comment works.

>>> eval("#before\n'string'#after")
'string'

Trying to eval a bare comment *is* a syntax error.

>>> eval("#comment\n")
...
SyntaxError: unexpected EOF while parsing

So the issue as presented is the special-case message.  However, messages are not part of the language specification and improving them is often/usually/always? treated as an enhancement.  Changing them will break code and tests that depend on the exact wording. 2.7 does not get enhancements.

(Specific reason) In 2.x, the input to (literal-)eval is either latin-1 encoded bytes or unicode. 'Latin-1' input could potentially consist of an encoding declaration on one line followed on the next line by a literal string encoded as indicated.

>>> from ast import literal_eval as le
>>> le("# -*- coding: utf-8 -*-\n'string'")
'string'

Unicode input, the subject of this issue, is encoded to latin-1, which means that any literal string in the expression has to be latin-1 encoded. Therefore, a latin-1 encoding declaration is redundant and anything else is either redundant (if the original unicode only contains characters that encode the same in latin-1, as in the example above) or wrong, with hard to predict behavior.  Someone thought it worthwhile to add the special case check.  I think it should be left as is.
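
The bytes versus unicode distinction described above can be sketched in Python 3 terms, where the split is cleaner: a declaration in bytes source selects the decoding, while str source is already decoded and the declaration is treated as an ordinary comment. This is an illustration under that assumption, not a reproduction of the 2.7 behavior; the variable names are made up for the example.

```python
# Bytes source: the declaration on line 1 tells the compiler how to decode
# what follows. b"\xc3\xb3" is the UTF-8 encoding of U+00F3.
from_bytes = eval(b"# -*- coding: utf-8 -*-\n'\xc3\xb3'")

# str source: nothing is left to decode, so the declaration is ignored, even
# though latin-1 could not represent the euro sign (U+20AC).
from_str = eval("# -*- coding: latin-1 -*-\n'\u20ac'")

print(ascii(from_bytes), ascii(from_str))
```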

Jorgen, please either close this or explain why you think not, in light of the above.
msg225704 - (view) Author: Jorgen Schäfer (jorgenschaefer) Date: 2014-08-22 20:27
I do not understand how your comments apply to this bug. There is no
comment anywhere.  There is a single string literal whose contents look
like a comment. The expression parses correctly without syntax error if you
add a few newlines in front. Could you clarify your objection?
msg225716 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-08-22 22:48
[When responding, please do not quote more than a line or two. If responding by email, please delete the rest. Otherwise, the result is extra noise when viewing online.]

You are right, I missed the outer 's, though my examples are not completely irrelevant. Eval looks inside the inner quotes for a coding line in certain circumstances, or maybe it always looks and we do not notice when there is no problem.  Here are some of my results on US Win 7, cp1252, 3.4.1, interactive prompt, idle

pass: eval(u'"""# -*- coding: utf-8 -*-\na"""')
fail: eval(u'"""\n# -*- coding: utf-8 -*-\na"""')
  since coding can be on line 1 or 2, these should be same
pass: eval(u'"""\n\n# -*- coding: utf-8 -*-\na"""')
  coding on 3rd line should be ignored
fail: eval(u'"""\\\n# -*- coding: utf-8 -*-\na"""')
  logically, this matches the first example; physically, the second
pass: eval(u'"""# -*- coding: utf-8 -*-\na€"""')
  but € prints as \xc2\x80', its utf-8 encoding as pasted in

From file, saved from Idle editor as cp1252
pass: print(eval("# -*- coding: utf-8 -*-\n'euro€'"))
  no u prefix, € prints as €
fail: print(eval(u"# -*- coding: utf-8 -*-\n'euro€'"))

Save the following two lines in one file as utf-8
pass: print(eval("# -*- coding: utf-8 -*-\n'euro€'"))
print(eval(u"# -*- coding: utf-8 -*-\n'euro∢'"))
  but € & ∢ print as '€' & '∢'
  adding # -*- coding: utf-8 -*- line makes no difference
  adding u prefix fails either way
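
A rough Python 3 sketch of the first four interactive cases above, for anyone wanting to rerun them on a fixed interpreter. It assumes Python 3, where the u prefix is unnecessary and str source is not scanned for a coding declaration; the labels and the `cases` dict are illustrative, not part of any test suite.

```python
coding = "# -*- coding: utf-8 -*-"

# Each source is a single triple-quoted string literal; only the position of
# the comment-like line inside the literal changes.
cases = {
    "line 1": '"""' + coding + '\na"""',
    "line 2": '"""\n' + coding + '\na"""',
    "line 3": '"""\n\n' + coding + '\na"""',
    "line 1 logical, line 2 physical": '"""\\\n' + coding + '\na"""',
}

# Evaluate every variant; a SyntaxError here would reproduce the report.
results = {label: eval(source) for label, source in cases.items()}

for label, value in results.items():
    print(label, repr(value))
```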
msg225735 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-23 07:44
This is the same issue as issue18960. Here is a backported patch with an additional test.
msg226404 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-09-05 07:26
New changeset dd1e21f17b1c by Serhiy Storchaka in branch '2.7':
Issue #22221: Backported fixes from Python 3 (issue #18960).
http://hg.python.org/cpython/rev/dd1e21f17b1c
msg226407 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2014-09-05 08:11
New changeset 13cd8ea4cafe by Serhiy Storchaka in branch '3.4':
Issue #22221: Add tests for compile() with source encoding cookie.
http://hg.python.org/cpython/rev/13cd8ea4cafe

New changeset 9d335a54d728 by Serhiy Storchaka in branch 'default':
Issue #22221: Add tests for compile() with source encoding cookie.
http://hg.python.org/cpython/rev/9d335a54d728
History
Date User Action Args
2014-09-05 08:28:34 serhiy.storchaka set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2014-09-05 08:11:59 python-dev set messages: + msg226407
2014-09-05 07:26:24 python-dev set nosy: + python-dev; messages: + msg226404
2014-08-23 07:44:39 serhiy.storchaka set files: + source_encoding_second_line-2.7.patch; keywords: + patch; messages: + msg225735; stage: needs patch -> patch review
2014-08-22 22:48:40 terry.reedy set messages: + msg225716
2014-08-22 20:27:04 jorgenschaefer set messages: + msg225704
2014-08-22 19:59:06 terry.reedy set nosy: + terry.reedy; messages: + msg225701
2014-08-17 20:31:57 serhiy.storchaka set assignee: serhiy.storchaka; type: behavior; components: + Interpreter Core, - Library (Lib); nosy: + serhiy.storchaka; messages: + msg225469; stage: needs patch
2014-08-17 20:11:27 davidhalter set nosy: + davidhalter
2014-08-17 19:53:49 jorgenschaefer create