classification
Title: [Py3k] line number is wrong after encoding declaration
Type: behavior
Stage:
Components: None
Versions: Python 3.0
process
Status: closed
Resolution: fixed
Dependencies: 3975
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, barry, dlitz, haypo, jmfauth, ocean-city, pitrou
Priority: high
Keywords: patch

Created on 2008-03-18 07:28 by ocean-city, last changed 2008-10-09 23:52 by amaury.forgeotdarc. This issue is now closed.

Files
File name Uploaded Description
test_traceback.patch ocean-city, 2008-04-05 05:25
tokenizer-coding-4.patch haypo, 2008-10-07 23:23 Fix this issue and add a testcase (for this issue and issue 3975) (version 4)
Messages (22)
msg63905 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-03-18 07:28
# This issue inherits from issue2301.

If "# coding: ????" is in the source code and the
coding is neither utf-8 nor iso-8859-1, the line number (tok->lineno)
becomes wrong.

Please look at Parser/tokenizer.c. In this case,
tok->decoding_state becomes STATE_NORMAL, so fp_setreadl
opens the file again but *doesn't* seek to the current position.
(Or maybe we can reuse the already opened file?)

So

# coding: ascii
# 1
# 2
# 3
raise RuntimeError("a")
# 4
# 5
# 6

outputs 

C:\Documents and Settings\WhiteRabbit>py3k ascii.py

Traceback (most recent call last):
  File "ascii.py", line 6, in <module>
    # 4
RuntimeError: a
[22821 refs]

The displayed line is shifted by one because the line number is wrongly incremented by 1.

And

# dummy
# coding: ascii
# 1
# 2
# 3
raise RuntimeError("a")
# 4
# 5
# 6

outputs

C:\Documents and Settings\WhiteRabbit>py3k ascii.py

Traceback (most recent call last):
  File "ascii.py", line 8, in <module>
    # 5
RuntimeError: a
[22821 refs]

The displayed line is shifted by two because the line number is wrongly incremented by 2.
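
For anyone who wants to check an interpreter against the two scripts above, here is a minimal sketch. It assumes the script is run from a file in a child process and that the last "File ..., line N" entry in the traceback is the frame of interest. The helper name reported_line and the constant names are hypothetical, not part of any patch on this issue.

```python
import os
import re
import subprocess
import sys
import tempfile

# The two scripts from the report: coding declaration on line 1 and on line 2.
CODING_FIRST = b"# coding: ascii\n# 1\n# 2\n# 3\nraise RuntimeError('a')\n# 4\n"
CODING_SECOND = b"# dummy\n" + CODING_FIRST


def reported_line(source):
    """Run source as a script file and return the line number that the
    child interpreter prints for the last traceback frame."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "wb") as f:
            f.write(source)
        proc = subprocess.run([sys.executable, path],
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.PIPE)
    stderr = proc.stderr.decode("utf-8", "replace")
    matches = re.findall(r'File ".*", line (\d+)', stderr)
    return int(matches[-1])


if __name__ == "__main__":
    print(reported_line(CODING_FIRST), reported_line(CODING_SECOND))
```

The raise statement sits on line 5 of the first script and line 6 of the second, so an interpreter without this bug should report those numbers rather than 6 and 8.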
msg64157 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-03-20 06:15
The following dirty hack works around this bug. The comment on this
function says that encodings that are not ASCII-compatible (e.g. UTF-16)
are not supported yet, so this probably works.

Index: Parser/tokenizer.c
===================================================================
--- Parser/tokenizer.c	(revision 61632)
+++ Parser/tokenizer.c	(working copy)
@@ -464,6 +464,7 @@
 	Py_XDECREF(tok->decoding_readline);
 	readline = PyObject_GetAttrString(stream, "readline");
 	tok->decoding_readline = readline;
+	tok->lineno = -1; /* dirty hack */
 
   cleanup:
 	Py_XDECREF(stream);

But if a line contains a multibyte character, as below, that line will
not be printed.

# coding: cp932
# 1
raise RuntimeError("あいうえお")
# 2

C:\Documents and Settings\WhiteRabbit>py3k cp932.py
Traceback (most recent call last):
  File "cp932.py", line 3, in <module>
    [22819 refs]

This is because tb_displayline() in Python/traceback.c assumes the
input line is encoded in UTF-8. (It simply uses a FILE structure +
Py_UniversalNewlineFgets.)

# http://mail.python.org/pipermail/python-3000/2008-March/012546.html
# sounds nice, if we can replace all FILE structures with Python's own
# fast enough codec-aware reader or something.
msg64965 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-04-05 05:25
I've written a testcase for the lineno problem.
msg67953 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-06-11 12:16
This is a bug and not a new feature, so it could go in after beta.  I'm
knocking it down to critical.
msg71308 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2008-08-18 01:29
While this is a bug, it's not serious enough to hold up the release.
msg71336 - (view) Author: jmf (jmfauth) Date: 2008-08-18 14:51
Py3.0b2. This bug is quite annoying, especially when a main module
imports modules that import other modules and so on, all of them
having an encoding declaration. The traceback (and the user) gets a
little bit lost.

---------
# -*- coding: cp1252 -*-
# modb.py

def fb():
    i = 1
    j = 0
    r =  i / j    
-----------
# -*- coding: cp1252 -*-
# moda.py

import modb

def fa():
    modb.fb()
-----------
# -*- coding: cp1252 -*-
# main.py

import moda

def main():
    moda.fa()

if __name__ == '__main__':
    main()
-----------

Running main.py leads to:

>c:\python30\pythonw -u "main.py"
Traceback (most recent call last):
  File "main.py", line 11, in <module>
    
  File "main.py", line 8, in main
    
  File "C:\jm\jmpy3\moda.py", line 8, in fa
    
  File "C:\jm\jmpy3\modb.py", line 8, in fb
    
ZeroDivisionError: int division or modulo by zero
>Exit code: 1
msg72183 - (view) Author: Dwayne Litzenberger (dlitz) Date: 2008-08-30 04:47
Could "-*- coding: ascii -*-" and other equivalent encodings be fixed,
at least, before the release?
msg73512 - (view) Author: jmf (jmfauth) Date: 2008-09-21 16:14
Python 3.0rc1

Although the lines are now displayed correctly, I think there is still
a numbering issue: a +1 offset.

Python 2.5.2

# -*- coding: cp1252 -*-      <<<< line 1, first line

s = 'abc'
import dummy
s = 'def'

---

>pythonw -u "testpy2.py"
Traceback (most recent call last):
  File "testpy2.py", line 4, in <module>
    import dummy
ImportError: No module named dummy
>Exit code: 1


Python 3.0rc1

# -*- coding: cp1252 -*-

s = 'abc'
import dummy
s = 'def'

---

>c:\python30\pythonw -u "testpy3.py"
Traceback (most recent call last):
  File "testpy3.py", line 5, in <module>
    s = 'def'
ImportError: No module named dummy
>Exit code: 1
msg73843 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-09-26 11:53
#3973 is a duplicate.
msg73845 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-26 12:21
By setting lineno to -1 (as proposed by ocean-city), the ASCII tests
(test1 and test2, see below) work correctly. This change doesn't affect
the utf-8/iso-8859-1 charsets (they are a special case).

--- test1 ---
# coding: ASCII
raise Exception("here")
-------------

--- test2 ---
# useless at line 1
# coding: ASCII
raise Exception("here")
-------------

I don't know how to test a UTF-16 file. Can someone write a testcase?
msg73846 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-26 12:51
ocean-city's testcase is invalid: it uses subprocess.call(), which
returns the exit code, not the line number of the Python error! Here is
a better testcase using subprocess.Popen() that checks the line number
and also the displayed line. It tests the ASCII, UTF-8 and GBK charsets.
With the GBK charset, you get the bug described by ocean-city (the
problem with multibyte charsets). My testcase also takes care of
scripts with # coding on the second line.
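
The attached patch is not reproduced in this thread, so the following is only a rough sketch of the kind of check described, not the actual testcase. It assumes that PYTHONIOENCODING=utf-8 is set for the child so that its stderr can be decoded reliably; the helper names run_and_inspect and make_script are hypothetical.

```python
import os
import re
import subprocess
import sys
import tempfile


def run_and_inspect(source, encoding):
    """Write source in the given encoding, run it, and return
    (line number, displayed source line) for the last traceback frame."""
    # Assumption: forcing UTF-8 output in the child keeps stderr decodable.
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w", encoding=encoding, newline="\n") as f:
            f.write(source)
        process = subprocess.Popen([sys.executable, path],
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE, env=env)
        _, err = process.communicate()
    lines = err.decode("utf-8", "replace").splitlines()
    frames = [i for i, text in enumerate(lines)
              if re.match(r'\s*File ".*", line \d+', text)]
    last = frames[-1]
    lineno = int(re.search(r", line (\d+)", lines[last]).group(1))
    # The source line is printed directly below the "File ..." line.
    return lineno, lines[last + 1].strip()


def make_script(encoding, message, coding_on_second_line=False):
    """Build a script and return (source, expected lineno, raise line)."""
    head = "# coding: %s\n" % encoding
    if coding_on_second_line:
        head = "# first line\n" + head
    raise_line = 'raise RuntimeError("%s")' % message
    source = head + "# filler\n" + raise_line + "\n# trailer\n"
    return source, source.split("\n").index(raise_line) + 1, raise_line


if __name__ == "__main__":
    for enc, msg in [("ascii", "error"), ("utf-8", "\u00e9t\u00e9"),
                     ("gbk", "\u4e2d\u6587")]:
        src, expected, line = make_script(enc, msg, coding_on_second_line=True)
        print(enc, run_and_inspect(src, enc) == (expected, line))
```

Checking the displayed line as well as the number is what separates the two bugs discussed here: a wrong number points at the tokenizer, while a correct number with an empty or garbled line points at the traceback printer.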
msg73847 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-26 13:06
Hum, about the empty line error using a multibyte charset, the issue 
is different. PyTraceBack_Print() calls _Py_DisplaySourceLine() which 
doesn't take care of the charset.
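
To illustrate why the source line comes out empty, here is a small Python-level sketch (the real code is C, so this is only an analogy). It assumes the declared encoding can be found with tokenize.detect_encoding(); the function names source_line and source_line_assuming_utf8 are hypothetical.

```python
import io
import tokenize

# The cp932 script from msg64157, as bytes on disk.
SOURCE = ('# coding: cp932\n# 1\n'
          'raise RuntimeError("あいうえお")\n# 2\n').encode("cp932")


def source_line(raw, lineno):
    """Return line lineno (1-based), decoded with the declared encoding."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.splitlines()[lineno - 1].decode(encoding)


def source_line_assuming_utf8(raw, lineno):
    """Mimic a reader that ignores the coding declaration."""
    return raw.splitlines()[lineno - 1].decode("utf-8")


if __name__ == "__main__":
    print(source_line(SOURCE, 3))
    try:
        source_line_assuming_utf8(SOURCE, 3)
    except UnicodeDecodeError as exc:
        print("utf-8 assumption fails:", exc.reason)
```

The cp932 bytes for the string literal are not valid UTF-8, so a reader that assumes UTF-8 cannot produce the line at all, which is consistent with the blank line in the traceback above.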
msg73852 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-26 15:13
Here is a patch fixing this issue: it's almost the same as ocean-city's
patch, but I prefer to patch lineno only if set_readline() succeeds.

About the truncated traceback for multibyte charset: see the new 
issue3975.
msg73857 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-26 16:11
Oh! My patch breaks "python -m". Maybe the problem is not in the
tokenizer but... somewhere else?
--- test.py ---
# coding: ASCII
raise Exception("line 2")
# try again!
---------------

Python 3.0 trunk unpatched:
---
$ ./python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>

$ ./python -m test
Traceback (most recent call last):
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 121, in 
_run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 34, in _run_code
    exec(code, run_globals)
  File "/home/haypo/prog/py3k/test.py", line 2, in <module>
    raise Exception("line 2")
Exception: line 2
---

Python 3.0 trunk + tokenizer-coding.patch:
---
marge$ ./python test.py
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    raise Exception("line 2")
Exception: line 2

Traceback (most recent call last):
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 121, in 
_run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 34, in _run_code
    exec(code, run_globals)
  File "/home/haypo/prog/py3k/test.py", line 1, in <module>
    # coding: ASCII
Exception: line 2
---
msg73883 - (view) Author: Hirokazu Yamamoto (ocean-city) * (Python committer) Date: 2008-09-26 20:24
Victor, this is fp_setreadl's problem, so if "tok->lineno = -1" is put
anywhere, I think it should be in fp_setreadl().

	r = set_readline(tok, cs);
	if (r) {
		/* 1 */
		tok->encoding = cs;
		tok->decoding_state = STATE_NORMAL;

At /* 1 */, set_readline could be buf_setreadl(), and fp_setreadl is
called elsewhere.
msg73889 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-26 21:20
@ocean-city: Oops, sorry. Using your patch (set lineno in
fp_setreadl()), it works in both cases ("python test.py" or "python -m
test").

The new patch includes your fix for tokenizer.c and a new version of the
testcase.
msg73929 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-09-27 15:31
Issue 2832 is a duplicate.
msg74394 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-10-06 21:53
Benjamin was worried by the /* dirty hack */ comment in my previous
patch. After reading tokenizer.c again and again, I can say that the
fix is correct: the file is closed and then re-opened by fp_setreadl()
(using io.open()), and so the file cursor goes back to the file start.
msg74435 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-10-07 10:49
Your patch does the correct thing, however an explanation of the -1
value would be welcome. Something like:
/* The file has been reopened; parsing will restart from 
 * the beginning of the file, we have to reset the line number.
 * But this function has been called from inside tok_nextc() which 
 * will increment lineno before it returns. So we set it to -1 so that
 * the next call to tok_nextc() will start with tok->lineno == 0.
 */

Or we could change the place of the tok->lineno++ in tok_nextc() so that
it is called before the call to decoding_fgets(); other changes will be
needed.
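
The reasoning in the proposed comment can be checked with a toy model of the line counter. This is a sketch in Python, not the real C code: it assumes every "# coding:" line triggers the reopen (in the real tokenizer utf-8 and iso-8859-1 do not), and the function name assigned_line_numbers is hypothetical.

```python
def assigned_line_numbers(lines, reset_on_reopen):
    """Toy model of tok->lineno. Returns the number given to each source
    line on the pass that is actually tokenized."""
    lineno = 0
    pos = 0            # index of the next line to read
    reopened = False
    assigned = []
    while pos < len(lines):
        line = lines[pos]
        pos += 1
        restarted = False
        if not reopened and line.startswith("# coding:"):
            # Models fp_setreadl(): the file is reopened, so reading
            # starts again from the first line.
            reopened = True
            restarted = True
            pos = 0
            if reset_on_reopen:
                lineno = -1
        # Models tok_nextc(): the counter is bumped after the read returns.
        lineno += 1
        if restarted:
            assigned = []   # lines read before the reopen are read again
        else:
            assigned.append(lineno)
    return assigned


if __name__ == "__main__":
    script = ["# coding: ascii", "# 1", "# 2", "# 3", 'raise RuntimeError("a")']
    print(assigned_line_numbers(script, reset_on_reopen=False)[-1])
    print(assigned_line_numbers(script, reset_on_reopen=True)[-1])
```

Without the reset, the model gives the raise statement line 6 when the declaration is on line 1 and line 8 when it is on line 2, matching the tracebacks in the first message. With the reset to -1, the increment that follows brings the counter to 0 and numbering restarts correctly at 1.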

Then, I think that your test is not correct: What is the meaning of the
following line?
    sys.exit(traceback.tb_lineno(sys.exc_info()[2]))
(the module "traceback" has no attribute "tb_lineno")
I presume that you intended something like:
    traceback.print_exc()
    sys.exit(sys.exc_info()[2].tb_lineno)
and test at some point that "process.returncode == lineno"
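
A minimal sketch of that idea, assuming the child script is written to a temporary file and that the line number fits in an exit status (on POSIX only values below 256 survive). The names CHILD_SOURCE, RAISE_LINENO and exit_code_of are hypothetical and do not come from the patch.

```python
import os
import subprocess
import sys
import tempfile

# Child script: reports the line of the raise statement as its exit code.
CHILD_SOURCE = (
    "# coding: ascii\n"
    "import sys\n"
    "import traceback\n"
    "try:\n"
    "    raise RuntimeError('a')\n"
    "except RuntimeError:\n"
    "    traceback.print_exc()\n"
    "    sys.exit(sys.exc_info()[2].tb_lineno)\n"
)
RAISE_LINENO = 5  # position of the raise statement in CHILD_SOURCE


def exit_code_of(source):
    """Run source from a file in a child interpreter; return its exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "child.py")
        with open(path, "w", encoding="ascii", newline="\n") as f:
            f.write(source)
        process = subprocess.run([sys.executable, path],
                                 stderr=subprocess.PIPE)
    return process.returncode


if __name__ == "__main__":
    print(exit_code_of(CHILD_SOURCE) == RAISE_LINENO)
```

This variant only checks the number stored in the traceback object, not the line that gets displayed, so it would not catch the display problem tracked in issue 3975.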
msg74497 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2008-10-07 23:22
@amaury: Ok, I added your long comment in tokenizer.c. You're also 
right about the strange code in the test. I reused ocean-city's 
test. "sys.exc_info()[2].tb_lineno" raises an additional (useless) 
error. So I simplified the code to use only "raise RuntimeError(...)" 
with the try/except/else.

Since tokenizer.c is hard to understand, I don't want to change the 
code of tok_nextc().
msg74498 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-10-07 23:43
This issue depends on #3975 to properly display tracebacks from python 
files with encoding.
msg74611 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-10-09 23:45
Committed r66867. 

I had to considerably change the unit tests, because the subprocess 
output is not utf-8 encoded; it's not even the same as sys.stdout, 
because the spawned process uses a PIPE, not a terminal: on my winXP, 
the main interpreter uses cp437, but the subprocess says cp1252. So I 
first run a 'python -c "import sys; print(sys.stdout.encoding)"' in the
same conditions just to retrieve the encoding. fun fun.
I hope this still works on Unixes, will watch the buildbots.
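
For reference, the encoding probe described above could look roughly like this. It is a sketch under the assumption that the child is started with the same pipe setup as the real test; the helper name child_stdout_encoding is hypothetical and this is not the committed test code.

```python
import codecs
import subprocess
import sys


def child_stdout_encoding():
    """Ask a child interpreter, whose stdout is a pipe, which encoding
    it uses for sys.stdout."""
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE, check=True).stdout
    return out.decode("ascii").strip()


if __name__ == "__main__":
    encoding = child_stdout_encoding()
    # The parent's own sys.stdout.encoding may differ when it is a terminal.
    print(encoding, sys.stdout.encoding, codecs.lookup(encoding).name)
```

The value returned here is what the parent should use to decode the output of later child runs made under the same conditions.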
History
Date User Action Args
2008-10-09 23:52:49 amaury.forgeotdarc set status: pending -> closed
2008-10-09 23:45:24 amaury.forgeotdarc set status: open -> pending
resolution: fixed
messages: + msg74611
2008-10-07 23:44:00 amaury.forgeotdarc set dependencies: + PyTraceBack_Print() doesn't respect # coding: xxx header
messages: + msg74498
2008-10-07 23:23:48 haypo set files: + tokenizer-coding-4.patch
2008-10-07 23:23:18 haypo set files: - traceback_unicode-4.patch
2008-10-07 23:22:27 haypo set files: - test_traceback-gbk.patch
2008-10-07 23:22:16 haypo set files: - tokenizer-coding-3.patch
2008-10-07 23:22:12 haypo set files: - tokenizer-coding-2.patch
2008-10-07 23:22:06 haypo set files: + traceback_unicode-4.patch
messages: + msg74497
2008-10-07 10:49:47 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
messages: + msg74435
2008-10-07 08:21:16 haypo link issue3574 dependencies
2008-10-06 21:53:32 haypo set files: + tokenizer-coding-3.patch
messages: + msg74394
2008-09-27 15:32:44 benjamin.peterson link issue2832 superseder
2008-09-27 15:31:12 haypo set messages: + msg73929
2008-09-27 14:52:59 haypo set files: - tokenizer-coding.patch
2008-09-26 21:20:14 haypo set files: + tokenizer-coding-2.patch
messages: + msg73889
2008-09-26 20:24:44 ocean-city set messages: + msg73883
2008-09-26 16:11:50 haypo set messages: + msg73857
2008-09-26 15:13:51 haypo set files: + tokenizer-coding.patch
messages: + msg73852
2008-09-26 13:06:59 haypo set messages: + msg73847
2008-09-26 12:51:41 haypo set files: + test_traceback-gbk.patch
messages: + msg73846
2008-09-26 12:21:38 haypo set messages: + msg73845
2008-09-26 11:53:01 pitrou set nosy: + pitrou
messages: + msg73843
2008-09-26 11:52:39 pitrou set nosy: + haypo
2008-09-26 11:52:29 pitrou link issue3973 superseder
2008-09-21 16:14:49 jmfauth set messages: + msg73512
2008-08-30 04:47:10 dlitz set messages: + msg72183
2008-08-26 16:45:38 dlitz set nosy: + dlitz
2008-08-18 14:51:07 jmfauth set nosy: + jmfauth
messages: + msg71336
2008-08-18 01:29:35 barry set priority: release blocker -> high
messages: + msg71308
2008-07-31 02:15:11 benjamin.peterson set priority: critical -> release blocker
2008-06-11 12:16:06 barry set priority: release blocker -> critical
nosy: + barry
messages: + msg67953
2008-05-18 20:00:32 georg.brandl set priority: release blocker
2008-04-05 05:25:28 ocean-city set files: + test_traceback.patch
keywords: + patch
messages: + msg64965
2008-03-20 06:15:10 ocean-city set type: behavior
messages: + msg64157
2008-03-18 07:28:38 ocean-city create