[Py3k] line number is wrong after encoding declaration #46637

ocean-city · 2008-03-18T07:28:38Z

BPO	2384
Nosy	@warsaw, @amauryfa, @pitrou, @vstinner
Dependencies	bpo-3975: PyTraceBack_Print() doesn't respect # coding: xxx header
Files	test_traceback.patch tokenizer-coding-4.patch: Fix this issue and add a testcase (for this issue and issue 3975) (version 4)

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2008-10-09.23:52:49.754>
created_at = <Date 2008-03-18.07:28:38.119>
labels = ['type-bug']
title = '[Py3k] line number is wrong after encoding declaration'
updated_at = <Date 2008-10-09.23:52:49.753>
user = 'https://bugs.python.org/ocean-city'

bugs.python.org fields:

activity = <Date 2008-10-09.23:52:49.753>
actor = 'amaury.forgeotdarc'
assignee = 'none'
closed = True
closed_date = <Date 2008-10-09.23:52:49.754>
closer = 'amaury.forgeotdarc'
components = ['None']
creation = <Date 2008-03-18.07:28:38.119>
creator = 'ocean-city'
dependencies = ['3975']
files = ['9943', '11738']
hgrepos = []
issue_num = 2384
keywords = ['patch']
message_count = 22.0
messages = ['63905', '64157', '64965', '67953', '71308', '71336', '72183', '73512', '73843', '73845', '73846', '73847', '73852', '73857', '73883', '73889', '73929', '74394', '74435', '74497', '74498', '74611']
nosy_count = 7.0
nosy_names = ['barry', 'amaury.forgeotdarc', 'pitrou', 'vstinner', 'ocean-city', 'jmfauth', 'dlitz']
pr_nums = []
priority = 'high'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue2384'
versions = ['Python 3.0']

ocean-city · 2008-03-18T07:28:37Z

# This issue inherits from bpo-2301.

If "# coding: ????" is in the source code and the
coding is neither utf-8 nor iso-8859-1, the line number (tok->lineno)
becomes wrong.

Please look into Parser/tokenizer.c. In this case,
tok->decoding_state becomes STATE_NORMAL, so fp_setreadl
newly opens the file but *doesn't* seek to the current position.
(Or maybe we can reuse the already opened file?)

So

# coding: ascii
# 1
# 2
# 3
raise RuntimeError("a")
# 4
# 5
# 6

outputs

C:\Documents and Settings\WhiteRabbit>py3k ascii.py

Traceback (most recent call last):
  File "ascii.py", line 6, in <module>
    # 4
RuntimeError: a
[22821 refs]

The traceback is shifted by one line because the line number is wrongly incremented by 1.

And

# dummy
# coding: ascii
# 1
# 2
# 3
raise RuntimeError("a")
# 4
# 5
# 6

outputs

C:\Documents and Settings\WhiteRabbit>py3k ascii.py

Traceback (most recent call last):
  File "ascii.py", line 8, in <module>
    # 5
RuntimeError: a
[22821 refs]

The traceback is shifted by two lines because the line number is wrongly incremented by 2.

ocean-city · 2008-03-20T06:15:09Z

The following dirty hack works around this bug. The comment on this
function says that non-ASCII-compatible encodings (e.g. UTF-16) are not
supported yet, so this probably works.

Index: Parser/tokenizer.c
===================================================================

--- Parser/tokenizer.c	(revision 61632)
+++ Parser/tokenizer.c	(working copy)
@@ -464,6 +464,7 @@
 	Py_XDECREF(tok->decoding_readline);
 	readline = PyObject_GetAttrString(stream, "readline");
 	tok->decoding_readline = readline;
+	tok->lineno = -1; /* dirty hack */
 
   cleanup:
 	Py_XDECREF(stream);

But if a line contains a multibyte character, as below, that line will
not be printed.

# coding: cp932
# 1
raise RuntimeError("あいうえお")
# 2

C:\Documents and Settings\WhiteRabbit>py3k cp932.py
Traceback (most recent call last):
  File "cp932.py", line 3, in <module>
    [22819 refs]

This is because tb_displayline() in Python/traceback.c assumes the
input line is encoded in UTF-8. (It simply uses a FILE structure +
Py_UniversalNewlineFgets.)

# http://mail.python.org/pipermail/python-3000/2008-March/012546.html
# sounds nice, if we can replace every FILE structure with Python's own
# fast-enough codec-based reader or something.

ocean-city · 2008-04-05T05:25:27Z

I've written a testcase for the lineno problem.

warsaw · 2008-06-11T12:16:05Z

This is a bug and not a new feature, so it could go in after beta. I'm
knocking it down to a critical.

warsaw · 2008-08-18T01:29:35Z

While this is a bug, it's not serious enough to hold up the release.

jmfauth · 2008-08-18T14:51:07Z

Py3.0b2. This bug seems to be quite annoying. Especially when one works
with a main module importing modules which are importing modules and so
on, all modules having an encoding declaration. The Traceback (and the
user) is (are) a little bit lost.

---------
# -- coding: cp1252 --
# modb.py

def fb():
    i = 1
    j = 0
    r =  i / j

# -- coding: cp1252 --
# moda.py

import modb

def fa():
    modb.fb()

# -- coding: cp1252 --
# main.py

import moda

def main():
    moda.fa()

if __name__ == '__main__':
    main()

Running main.py leads to an

>c:\python30\pythonw -u "main.py"
(Traceback (most recent call last):
  File "main.py", line 11, in <module>
    
  File "main.py", line 8, in main
    
  File "C:\jm\jmpy3\moda.py", line 8, in fa
    
  File "C:\jm\jmpy3\modb.py", line 8, in fb
    
ZeroDivisionError: int division or modulo by zero
>Exit code: 1

DLitz · 2008-08-30T04:47:09Z

Could "-- coding: ascii --" and other equivalent encodings be fixed,
at least, before the release?

jmfauth · 2008-09-21T16:14:48Z

Python 3.0rc1

If the lines are now displayed correctly, I think there is still a
numbering issue, a +1 offset.

Python 2.5.2

# -- coding: cp1252 -- <<<< line 1, first line

s = 'abc'
import dummy
s = 'def'

>pythonw -u "testpy2.py"
Traceback (most recent call last):
  File "testpy2.py", line 4, in <module>
    import dummy
ImportError: No module named dummy
>Exit code: 1

Python 3.0rc1

# -- coding: cp1252 --

s = 'abc'
import dummy
s = 'def'

>c:\python30\pythonw -u "testpy3.py"
Traceback (most recent call last):
  File "testpy3.py", line 5, in <module>
    s = 'def'
ImportError: No module named dummy
>Exit code: 1

pitrou · 2008-09-26T11:53:02Z

bpo-3973 is a duplicate.

vstinner · 2008-09-26T12:21:38Z

By setting lineno to -1 (as proposed by ocean-city), the ASCII tests
(test1 and test2, see below) work correctly. This change doesn't affect
the utf-8/iso-8859-1 charsets (they are special cases).

--- test1 ---
# coding: ASCII
raise Exception("here")
-------------

--- test2 ---
# useless at line 1
# coding: ASCII
raise Exception("here")
-------------

I don't know how to test a UTF-16 file. Can someone write a testcase?

vstinner · 2008-09-26T12:51:41Z

ocean-city's testcase is invalid: it uses subprocess.call(), which
returns the exit code, not the Python error line number! Here is a
better testcase using subprocess.Popen() that checks the line number
and also the displayed line. It tests the ASCII, UTF-8 and GBK charsets.
With the GBK charset, you get the bug described by ocean-city (problem
with multibyte charsets). My testcase also handles scripts with
# coding on the second line.
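For illustration only, a minimal sketch of the approach described above, assuming a Python 3 interpreter is available as sys.executable. This is not the attached patch; the helper names run_script and last_traceback_lineno are made up for this example.

```python
import os
import re
import subprocess
import sys
import tempfile

# Matches the 'File "...", line N' entries printed in a traceback.
LINE_RE = re.compile(r'File "[^"]*", line (\d+)')


def last_traceback_lineno(stderr_text):
    """Return the line number of the innermost traceback frame, or None."""
    matches = LINE_RE.findall(stderr_text)
    return int(matches[-1]) if matches else None


def run_script(source_bytes):
    """Write source_bytes to a temporary .py file and run it.

    Returns (exit code, captured stderr). Unlike subprocess.call(),
    Popen() with a PIPE gives access to the traceback text itself.
    """
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(source_bytes)
        proc = subprocess.Popen(
            [sys.executable, path],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        _, err = proc.communicate()
        # The child's stderr encoding is not known here, so decode leniently.
        return proc.returncode, err.decode("ascii", "replace")
    finally:
        os.unlink(path)
```

A test could then call run_script(b'# coding: ascii\nraise Exception("here")\n') and compare last_traceback_lineno() of the captured stderr with the line that actually contains the raise.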

vstinner · 2008-09-26T13:06:59Z

Hum, about the empty line error using a multibyte charset, the issue
is different. PyTraceBack_Print() calls _Py_DisplaySourceLine() which
doesn't take care of the charset.

vstinner · 2008-09-26T15:13:51Z

Here is a patch fixing this issue: it's nearly the same as ocean-city's
patch, but I prefer to patch lineno only if set_readline() succeeds.

About the truncated traceback for multibyte charset: see the new
bpo-3975.

vstinner · 2008-09-26T16:11:50Z

Oh! My patch breaks "python -m". Maybe the problem is not in the token
parser but... somewhere else?
--- test.py ---
# coding: ASCII
raise Exception("line 2")
# try again!
---------------

Python 3.0 trunk unpatched:
---

$ ./python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>

$ ./python -m test
Traceback (most recent call last):
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 121, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 34, in _run_code
    exec(code, run_globals)
  File "/home/haypo/prog/py3k/test.py", line 2, in <module>
    raise Exception("line 2")
Exception: line 2

Python 3.0 trunk + tokenizer-coding.patch:
---

marge$ ./python test.py
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    raise Exception("line 2")
Exception: line 2

Traceback (most recent call last):
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 121, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 34, in _run_code
    exec(code, run_globals)
  File "/home/haypo/prog/py3k/test.py", line 1, in <module>
    # coding: ASCII
Exception: line 2

ocean-city · 2008-09-26T20:24:45Z

Victor, this is fp_setreadl's problem, so if we put "tok->lineno = -1"
anywhere, it should be in fp_setreadl(), I think.

	r = set_readline(tok, cs);
	if (r) {
		/* 1 */
		tok->encoding = cs;
		tok->decoding_state = STATE_NORMAL;

At /* 1 */, set_readline could be buf_setreadl(), and fp_setreadl is
called elsewhere.

vstinner · 2008-09-26T21:20:13Z

@Ocean-City: Oops, sorry. Using your patch (set lineno in
fp_setreadl()), it works in both cases ("python test.py" and "python -m
test").

The new patch includes your fix for tokenizer.c and a new version of the
testcase.

vstinner · 2008-09-27T15:31:13Z

bpo-2832 is a duplicate.

vstinner · 2008-10-06T21:53:32Z

benjamin was worried by the comment /* dirty hack */ in my previous
patch. After reading tokenizer.c again and again, I can say that the
fix is correct: the file is closed and then re-opened by fp_setreadl()
(using io.open()), and so the file cursor goes back to the file start.

amauryfa · 2008-10-07T10:49:47Z

Your patch does the correct thing, however an explanation of the -1
value would be welcome. Something like:
/* The file has been reopened; parsing will restart from
the beginning of the file, we have to reset the line number.
But this function has been called from inside tok_nextc() which
will increment lineno before it returns. So we set it -1 so that
the next call to tok_nextc() will start with tok->lineno == 0.
*/

Or we could change the place of the tok->lineno++ in tok_nextc() so that
it is called before the call to decoding_fgets(); other changes will be
needed.
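To illustrate the ordering explained above, here is a toy Python model. It is not the C code from tokenizer.c: number_lines and its reset_value parameter are hypothetical names, and the "reopen" is simulated by rewinding a list index. The only point it shows is that the counter is incremented after the read that triggered the reopen, so resetting it to 0 still leaves an offset of one.

```python
def number_lines(lines, reset_value):
    """Toy read loop: return (lineno, text) pairs in the order lines are read.

    reset_value is what the counter is set to when the file is "reopened":
    None models the unpatched behaviour (no reset at all).
    """
    lineno = 0
    pos = 0            # read position in the simulated file
    reopened = False
    numbering = []
    while pos < len(lines):
        text = lines[pos]
        pos += 1
        if not reopened and text.startswith("# coding:"):
            # Stand-in for fp_setreadl(): reading restarts from the top.
            reopened = True
            pos = 0
            if reset_value is not None:
                lineno = reset_value
        # The caller increments the counter after the read returns.
        lineno += 1
        numbering.append((lineno, text))
    return numbering


script = ["# dummy", "# coding: ascii", "raise Exception('here')"]
for reset in (None, 0, -1):
    print(reset, number_lines(script, reset)[-1][0])
```

In this model the raise statement sits on line 3 of the script; only the reset to -1 makes the loop number it 3.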

Then, I think that your test is not correct: What is the meaning of the
following line?
sys.exit(traceback.tb_lineno(sys.exc_info()[2]))
(the module "traceback" has no attribute "tb_lineno")
I presume that you intended something like:
traceback.print_exc()
sys.exit(sys.exc_info()[2].tb_lineno)
and test at some point that "process.returncode == lineno"

vstinner · 2008-10-07T23:22:06Z

@Amaury: Ok, I added your long comment in tokenizer.c. You're also
right about the strange code in the test. I reused ocean-city's
test. "sys.exc_info()[2].tb_lineno" raises an additional (useless)
error. So I simplified the code to use only "raise RuntimeError(...)"
with the try/except/else.

Since tokenizer.c is hard to understand, I don't want to change the
code of tok_nextc().

amauryfa · 2008-10-07T23:44:00Z

This issue depends on bpo-3975 to properly display tracebacks from python
files with encoding.

amauryfa · 2008-10-09T23:45:24Z

Committed r66867.

I had to considerably change the unit tests, because the subprocess
output is not utf-8 encoded; it's not even the same as sys.stdout,
because the spawned process uses a PIPE, not a terminal: on my winXP,
the main interpreter uses cp437, but the subprocess says cp1252. So I
first run a 'python -c "print(sys.stdout.encoding)"' in the same
conditions just to retrieve the encoding. fun fun.
I hope this still works on Unixes, will watch the buildbots.
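A rough sketch of that encoding probe, for readers who want to try it. It assumes a Python 3 interpreter at sys.executable; the helper names child_stdout_encoding and decode_child_output are made up here and are not taken from r66867. The -c command includes "import sys", which the one-liner quoted above leaves out.

```python
import subprocess
import sys


def child_stdout_encoding():
    """Ask a child interpreter which encoding it uses when stdout is a pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate()
    # Encoding names are plain ASCII, so decoding the probe output is safe.
    return out.decode("ascii").strip()


def decode_child_output(raw, encoding=None):
    """Decode bytes captured from a child, probing the encoding if not given."""
    return raw.decode(encoding or child_stdout_encoding(), "replace")
```

The probe has to run under the same conditions as the real test (a PIPE, not a terminal), since the encoding reported by the parent's own sys.stdout may differ.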

ocean-city mannequin added the type-bug An unexpected behavior, bug, or error label Mar 20, 2008

birkenfeld added the release-blocker label May 18, 2008

warsaw removed the release-blocker label Jun 11, 2008

benjaminp added the release-blocker label Jul 31, 2008

warsaw removed the release-blocker label Aug 18, 2008

amauryfa closed this as completed Oct 9, 2008

ezio-melotti transferred this issue from another repository Apr 10, 2022
