classification
Title: Double coding cookie
Type: behavior Stage: resolved
Components: Interpreter Core, Library (Lib) Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: gvanrossum, lemburg, loewis, python-dev, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2016-03-17 12:00 by serhiy.storchaka, last changed 2016-03-20 21:52 by serhiy.storchaka. This issue is now closed.

Files
File name | Uploaded | Description
tokenize_double_coding.patch | serhiy.storchaka, 2016-03-17 12:00 |
Messages (8)
msg261909 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-17 12:00
When a Python source file contains two coding cookies on different lines, the first one wins. When it contains two coding cookies on the same line, the last one wins.

PEP 263 was rather vague about this. It has now been clarified (22490711c870): the first coding cookie should always win.
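A minimal sketch of the clarified rule, assuming the regular expression given in PEP 263 and ignoring details such as BOM handling. `find_coding_cookie` is a hypothetical helper for illustration, not CPython's tokenizer code: only the first two lines are examined, the first matching line wins, and the non-greedy `.*?` makes the first cookie on a line win.

```python
import re

# Modeled on the regular expression in PEP 263. The non-greedy ".*?"
# stops at the first "coding:" / "coding=" on the line.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_coding_cookie(source: bytes):
    """Return the first coding cookie in the first two lines, or None.

    Hypothetical helper: a simplified illustration of the PEP 263 rule,
    not CPython's implementation (which also handles BOMs, codec lookup
    and other details).
    """
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            # The first matching line wins; later lines are never examined.
            return match.group(1).decode("ascii")
    return None


# Cookies on different lines: the first line wins.
print(find_coding_cookie(b"# coding: latin-1\n# coding: utf-8\n"))  # latin-1
# Two cookies on one line: the first cookie on that line wins.
print(find_coding_cookie(b"# coding: latin-1 coding: utf-8\n"))     # latin-1
```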

The proposed patch fixes the Python tokenizer, the tokenize module, and other places. The tests are taken from issue25643.
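For the tokenize module, `tokenize.detect_encoding()` takes a `readline` callable that returns bytes and reports the encoding it found. A quick check, assuming an interpreter that already includes this fix; an unfixed Python 3 may report the last cookie on the line instead.

```python
import io
import tokenize

# Two cookies on the same line: the first one (latin-1) should win.
source = b"# coding: latin-1 coding: utf-8\nx = 1\n"

# detect_encoding() reads at most two lines through the readline callable
# and returns the normalized encoding name plus the raw lines it consumed.
encoding, consumed = tokenize.detect_encoding(io.BytesIO(source).readline)
print(encoding)  # iso-8859-1 (the normalized name for "latin-1")
print(consumed)  # [b'# coding: latin-1 coding: utf-8\n']
```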
msg262051 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-19 15:26
I just tested with Emacs, and it looks like when different codings are specified on two different lines, the first coding wins, but when different codings are specified on the same line, the last coding wins.

Therefore the current CPython behavior may be correct, and the regular expression in PEP 263 should be changed to use greedy repetition.
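The choice comes down to a single character in the PEP 263 regular expression. A minimal sketch, assuming the pattern as written in the PEP, showing how greedy and non-greedy repetition select different cookies from the same line:

```python
import re

# Identical patterns except for ".*?" (non-greedy) versus ".*" (greedy)
# between the "#" and "coding".
NON_GREEDY = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")
GREEDY = re.compile(rb"^[ \t\f]*#.*coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

line = b"# coding: latin-1 coding: utf-8"

# ".*?" consumes as little as possible, so the first cookie is captured.
print(NON_GREEDY.match(line).group(1))  # b'latin-1'
# ".*" consumes as much as possible, then backtracks only to the last cookie.
print(GREEDY.match(line).group(1))      # b'utf-8'
```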
msg262052 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-03-19 16:12
Do you have write permission to the PEP? Just update it.
msg262053 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-19 16:37
Yes, I do. But I was not sure which behavior is correct for Python. On the one hand, always choosing the first declaration (whether on the same line or on different lines) looks more consistent. On the other hand, the current behavior has been in CPython since PEP 263 was first implemented in issue526840, and it matches Emacs behavior (if I understand this correctly).

I can update the regular expression, but perhaps this obscure corner case also needs a verbal explanation.
msg262054 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2016-03-19 16:48
Right. Please go ahead with both. I am fine with defining the current behavior as correct.

--Guido (mobile)
msg262089 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-03-20 21:29
Ah, I made a mistake! In 2.7 the first coding on the same line wins, and that has been the behavior from the start. The regression was unintentionally introduced in issue18470.

Thus *there is* a bug in Python 3. PEP 263 doesn't need more changes, but the Python tokenizer and related tools do.

Sorry for the misleading comments.
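The interpreter's own tokenizer can be checked by passing bytes to `compile()`, which makes it read the coding cookie itself. A sketch, assuming an interpreter that includes this fix; the byte 0xE9 is chosen because it decodes as "é" in Latin-1 but is not valid UTF-8 on its own, so the outcome shows which cookie was used.

```python
# Source given as bytes, so the interpreter's tokenizer reads the coding
# cookie. Byte 0xE9 is "é" in Latin-1 but not valid UTF-8 by itself.
source = b"# coding: latin-1 coding: utf-8\nx = '\xe9'\n"

namespace = {}
exec(compile(source, "<double-cookie>", "exec"), namespace)

# With the first cookie (latin-1) in effect, x is the one-character string "é".
# If the last cookie (utf-8) were used instead, compile() would fail to decode 0xE9.
print(namespace["x"] == "\xe9")  # True
```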
msg262090 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-03-20 21:30
New changeset 23a7481eafd4 by Serhiy Storchaka in branch 'default':
Issues #25643, #26581: Added new tests for detecting Python source code encoding.
https://hg.python.org/cpython/rev/23a7481eafd4
msg262092 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-03-20 21:51
New changeset 1c44cea2ea8f by Serhiy Storchaka in branch '3.5':
Issue #26581: Use the first coding cookie on a line, not the last one.
https://hg.python.org/cpython/rev/1c44cea2ea8f

New changeset 8506d127d482 by Serhiy Storchaka in branch '2.7':
Issue #26581: Use the first coding cookie on a line, not the last one.
https://hg.python.org/cpython/rev/8506d127d482

New changeset e86cd4a872b8 by Serhiy Storchaka in branch 'default':
Issue #26581: Use the first coding cookie on a line, not the last one.
https://hg.python.org/cpython/rev/e86cd4a872b8
History
Date | User | Action | Args
2016-03-20 21:52:08 | serhiy.storchaka | set | status: open -> closed; assignee: serhiy.storchaka; resolution: fixed; stage: patch review -> resolved
2016-03-20 21:51:22 | python-dev | set | messages: + msg262092
2016-03-20 21:30:28 | python-dev | set | nosy: + python-dev; messages: + msg262090
2016-03-20 21:29:54 | serhiy.storchaka | set | messages: + msg262089
2016-03-19 16:48:13 | gvanrossum | set | messages: + msg262054
2016-03-19 16:37:37 | serhiy.storchaka | set | messages: + msg262053
2016-03-19 16:12:07 | gvanrossum | set | messages: + msg262052
2016-03-19 15:26:20 | serhiy.storchaka | set | nosy: + lemburg, loewis; messages: + msg262051
2016-03-17 12:04:22 | serhiy.storchaka | link | issue25643 dependencies
2016-03-17 12:00:36 | serhiy.storchaka | create |