This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Why does Python interpreter care about curvy quotes in comments?
Type: behavior
Stage: resolved
Components:
Versions: Python 2.7

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Phillip.M.Feldman, Phillip.M.Feldman@gmail.com, ezio.melotti, loewis, rhettinger
Priority: normal
Keywords:

Created on 2011-10-15 05:18 by Phillip.M.Feldman, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (7)
msg145583 - Author: Phillip Feldman (Phillip.M.Feldman) Date: 2011-10-15 05:18
When I try to run a Python script that contains curvy quotes inside comments, the interpreter gets upset:

SyntaxError: Non-ASCII character '\x92' in file ... on line 20198, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Given that the quotes are appearing only in comments, why does the interpreter care about them?  Why should it be doing anything at all with comments other than stripping them off?
msg145588 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2011-10-15 11:26
The error message told you exactly what the problem is. Your source file does not conform to PEP 263. The PEP also explains why this applies to comments as well: because the entire file gets decoded according to the source encoding, and parsing (including determining what comments are) only starts afterwards.

Closing the report as invalid.
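A minimal sketch of the order Martin describes (decode the whole file, then find comments), using the standard tokenize module as a stand-in for the interpreter's own tokenizer. The helper name decode_then_find_comments is hypothetical and this is not CPython's actual code path. It assumes Python 3.9 or later (for the list[str] annotation); in Python 3 the default source encoding is UTF-8, whereas Python 2.7's default is ASCII, which is why the report saw an error for byte 0x92.

```python
import io
import tokenize


def decode_then_find_comments(raw: bytes) -> list[str]:
    """Hypothetical helper (not part of tokenize): decode first, then find comments."""
    # Step 1: look for a PEP 263 coding cookie in the first line or two.
    # Without one, detect_encoding() reports the Python 3 default, 'utf-8'.
    # (It raises SyntaxError itself if a line it inspects is not valid UTF-8.)
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    # Step 2: decode every byte of the file, comments included.
    # A byte that is invalid in `encoding` raises UnicodeDecodeError here,
    # before any tokenizing has happened.
    text = raw.decode(encoding)
    # Step 3: only now is it known where the comments are.
    tokens = tokenize.generate_tokens(io.StringIO(text).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.COMMENT]


# Undeclared Windows-1252 byte 0x92 inside a comment: decoding as UTF-8 fails.
try:
    decode_then_find_comments(b"x = 1\ny = 2  # it\x92s a comment\n")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)

# The same comment with a cp1252 coding cookie decodes cleanly.
print(decode_then_find_comments(b"# -*- coding: cp1252 -*-\nx = 1  # it\x92s a comment\n"))
```

The failure happens in the decode step, before the tokenizer ever sees the "#", which is the point of the explanation above.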
msg145657 - (view) Author: Phillip M. Feldman (Phillip.M.Feldman@gmail.com) Date: 2011-10-17 04:22
Hello Martin,

This is a fine example of the so-called "is-ought" controversy.  The error
message is indeed telling me exactly what the problem is, but the underlying
problem is that this scheme was poorly thought out.  Clearly, the stripping
of comments and the source decoding should both be done in a single pass,
and the source decoding should not be applied to the comments.

Phillip

msg145658 - Author: Ezio Melotti (ezio.melotti) (Python committer) Date: 2011-10-17 05:12
In theory, with some encodings you can't even know where the line (and the comment) ends if you don't decode first. Also, changing the way files are parsed just for this use case doesn't seem worthwhile to me.
Assuming you are using UTF-8 (and you should), you shouldn't have any problem with Python 3, since it decodes source files as UTF-8 by default. In any case, it's always better to be explicit about the encoding you are using.
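A sketch of the fix suggested above, applied to the original report. The byte 0x92 in the error message is the right single quotation mark (U+2019) in Windows-1252 (cp1252), which suggests, but does not prove, that the file was saved in that encoding; the helper name to_utf8_source and its cp1252 default are assumptions. Re-encoding the file as UTF-8 makes it valid under Python 3's default source encoding; under Python 2.7 a coding cookie such as "# -*- coding: utf-8 -*-" on the first or second line is still required, because its default is ASCII.

```python
def to_utf8_source(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Hypothetical helper: re-encode source bytes from `source_encoding` to UTF-8."""
    # Strict decoding raises UnicodeDecodeError for bytes that have no mapping
    # in source_encoding; a wrong guess can still decode without error, though.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


original = b"# it\x92s a comment\nx = 1\n"  # as saved by a cp1252 editor
converted = to_utf8_source(original)

print(converted)                  # the quote is now the UTF-8 sequence E2 80 99
print(converted.decode("utf-8"))  # decodes cleanly under Python 3's default
```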
msg145660 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2011-10-17 05:41
On 17.10.2011 06:22, Phillip Feldman wrote:
> This is a fine example of the so-called "is-ought" controversy.

Wrong. This has nothing to do with the desired versus the factual. A bug, by
definition, is a deviation from the specification. This is not a bug,
since it exactly follows the specification.

Now you may want to challenge the specification, which makes it a
feature request. However, given that the PEP was discussed in 2001,
you are about ten years late for that.

> underlying problem is that this scheme was poorly thought out.

I object to this assessment. This very behavior was carefully considered
and deliberately chosen.

> Clearly, the stripping of comments and the source decoding should both be done in
> a single pass, and the source decoding should not be applied to the
> comments.

That's not clear at all. In general (i.e. for arbitrary encodings), it
is not possible to determine where the hash ("#") signs are in the input
without decoding. So you have to decode first.
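A sketch of why, for arbitrary encodings, the hash signs (and, as Ezio noted, the line ends) cannot be located without decoding. It uses UTF-16, which PEP 263 does not allow as a source encoding because it uses two or more bytes for every character; it serves only to illustrate the general point. The function naive_comment_starts is hypothetical and deliberately wrong.

```python
def naive_comment_starts(raw: bytes) -> list[int]:
    """Hypothetical byte-level scan: offsets of every 0x23 ('#') byte."""
    return [i for i, b in enumerate(raw) if b == ord("#")]


# A single Gurmukhi letter (U+0A23): no hash sign and no newline in the text.
text = "\u0a23"
raw = text.encode("utf-16-le")

print(raw)                        # its two bytes are 0x23 0x0A, i.e. b'#\n'
print(naive_comment_starts(raw))  # the byte scan "finds" a comment at offset 0
print(raw.count(b"\n"))           # ...and a line break that is not there

# Decoding first gives the right answer: no '#' and no line break at all.
decoded = raw.decode("utf-16-le")
print("#" in decoded, "\n" in decoded)
```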

In addition, it was a deliberate choice that the source encoding must be
consistent (i.e. all characters in the source must decode correctly),
even if that is not needed for parsing. This is like requiring the colon
at the end of a compound statement's header line (as in "if x:"): it is
not needed for parsing, but requiring it improves the language.

Regards,
Martin
msg145661 - Author: Raymond Hettinger (rhettinger) (Python committer) Date: 2011-10-17 05:53
I concur with Martin and Ezio.
This report was correctly closed as invalid.
msg145774 - Author: Phillip M. Feldman (Phillip.M.Feldman@gmail.com) Date: 2011-10-18 06:23
I'm beginning to understand the reasoning.  This is quite a bit more complex
than I initially thought, and I appreciate the explanations.

Phillip

History
Date                 User                         Action  Args
2022-04-11 14:57:22  admin                        set     github: 57394
2011-10-18 06:23:09  Phillip.M.Feldman@gmail.com  set     messages: + msg145774
2011-10-17 05:53:05  rhettinger                   set     nosy: + rhettinger; messages: + msg145661
2011-10-17 05:41:43  loewis                       set     messages: + msg145660
2011-10-17 05:12:17  ezio.melotti                 set     nosy: + ezio.melotti; messages: + msg145658; type: behavior; stage: resolved
2011-10-17 04:22:14  Phillip.M.Feldman@gmail.com  set     nosy: + Phillip.M.Feldman@gmail.com; messages: + msg145657
2011-10-15 11:26:01  loewis                       set     status: open -> closed; nosy: + loewis; messages: + msg145588; resolution: not a bug
2011-10-15 05:18:45  Phillip.M.Feldman            set     versions: + Python 2.7
2011-10-15 05:18:24  Phillip.M.Feldman            create