Issue 2384: [Py3k] line number is wrong after encoding declaration

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/46637

classification

Title:	[Py3k] line number is wrong after encoding declaration
Type:	behavior	Stage:
Components:	None	Versions:	Python 3.0

process

Status:	closed	Resolution:	fixed
Dependencies:	3975	Superseder:
Assigned To:		Nosy List:	amaury.forgeotdarc, barry, dlitz, jmfauth, ocean-city, pitrou, vstinner
Priority:	high	Keywords:	patch

Created on 2008-03-18 07:28 by ocean-city, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
test_traceback.patch	ocean-city, 2008-04-05 05:25
tokenizer-coding-4.patch	vstinner, 2008-10-07 23:23	Fix this issue and add a testcase (for this issue and issue 3975) (version 4)

Messages (22)
msg63905 - (view)	Author: Hirokazu Yamamoto (ocean-city) *	Date: 2008-03-18 07:28
# This issue inherits from issue2301.

If "# coding: ????" is in the source code and the coding is neither utf-8 nor iso-8859-1, the line number (tok->lineno) becomes wrong.

Please look into Parser/tokenizer.c. In this case, tok->decoding_state becomes STATE_NORMAL, so fp_setreadl newly opens the file but doesn't seek to the current position. (Or maybe we can reuse the already opened file?)

So

    # coding: ascii
    # 1
    # 2
    # 3
    raise RuntimeError("a")
    # 4
    # 5
    # 6

outputs

    C:\Documents and Settings\WhiteRabbit>py3k ascii.py
    Traceback (most recent call last):
      File "ascii.py", line 6, in <module>
        # 4
    RuntimeError: a
    [22821 refs]

One line shifted because the line number is wrongly +1.

And

    # dummy
    # coding: ascii
    # 1
    # 2
    # 3
    raise RuntimeError("a")
    # 4
    # 5
    # 6

outputs

    C:\Documents and Settings\WhiteRabbit>py3k ascii.py
    Traceback (most recent call last):
      File "ascii.py", line 8, in <module>
        # 5
    RuntimeError: a
    [22821 refs]

Two lines shifted because the line number is wrongly +2.
msg64157 - (view)	Author: Hirokazu Yamamoto (ocean-city) *	Date: 2008-03-20 06:15
The following dirty hack works around this bug. The comment on this function says that encodings that are not ASCII-compatible (i.e. UTF-16) are not supported yet, so this probably works.

    Index: Parser/tokenizer.c
    ===================================================================
    --- Parser/tokenizer.c (revision 61632)
    +++ Parser/tokenizer.c (working copy)
    @@ -464,6 +464,7 @@
         Py_XDECREF(tok->decoding_readline);
         readline = PyObject_GetAttrString(stream, "readline");
         tok->decoding_readline = readline;
    +    tok->lineno = -1; /* dirty hack */

       cleanup:
         Py_XDECREF(stream);

But if a multibyte character is in the line, like this, the line will not be printed.

    # coding: cp932
    # 1
    raise RuntimeError("あいうえお")
    # 2

    C:\Documents and Settings\WhiteRabbit>py3k cp932.py
    Traceback (most recent call last):
      File "cp932.py", line 3, in <module>
    [22819 refs]

This is because tb_displayline() in Python/traceback.c assumes the input line is encoded in UTF-8 (it simply uses a FILE structure + Py_UniversalNewlineFgets).

# http://mail.python.org/pipermail/python-3000/2008-March/012546.html
# sounds nice, if we can replace all FILE structures with Python's own
# fast enough codec-aware Reader or something.
msg64965 - (view)	Author: Hirokazu Yamamoto (ocean-city) *	Date: 2008-04-05 05:25
I've written a testcase for the lineno problem.
msg67953 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2008-06-11 12:16
This is a bug and not a new feature, so it could go in after beta. I'm knocking it down to a critical.
msg71308 - (view)	Author: Barry A. Warsaw (barry) *	Date: 2008-08-18 01:29
While this is a bug, it's not serious enough to hold up the release.
msg71336 - (view)	Author: jmf (jmfauth)	Date: 2008-08-18 14:51
Py3.0b2.

This bug seems to be quite annoying, especially when one works with a main module importing modules which are importing modules and so on, all modules having an encoding declaration. The traceback (and the user) is (are) a little bit lost.

---------
# -*- coding: cp1252 -*-
# modb.py

def fb():
    i = 1
    j = 0
    r = i / j
-----------
# -*- coding: cp1252 -*-
# moda.py

import modb

def fa():
    modb.fb()
-----------
# -*- coding: cp1252 -*-
# main.py

import moda

def main():
    moda.fa()

if __name__ == '__main__':
    main()
-----------

Running main.py leads to:

>c:\python30\pythonw -u "main.py"
Traceback (most recent call last):
  File "main.py", line 11, in <module>
  File "main.py", line 8, in main
  File "C:\jm\jmpy3\moda.py", line 8, in fa
  File "C:\jm\jmpy3\modb.py", line 8, in fb
ZeroDivisionError: int division or modulo by zero
>Exit code: 1
msg72183 - (view)	Author: Dwayne Litzenberger (dlitz)	Date: 2008-08-30 04:47
Could "-*- coding: ascii -*-" and other equivalent encodings be fixed, at least, before the release?
msg73512 - (view)	Author: jmf (jmfauth)	Date: 2008-09-21 16:14
Python 3.0rc1

If the lines are now displayed correctly, I think there is still a numbering issue, a +1 offset.

Python 2.5.2

# -*- coding: cp1252 -*-  <<<< line 1, first line

s = 'abc'
import dummy
s = 'def'
---
>pythonw -u "testpy2.py"
Traceback (most recent call last):
  File "testpy2.py", line 4, in <module>
    import dummy
ImportError: No module named dummy
>Exit code: 1

Python 3.0rc1

# -*- coding: cp1252 -*-

s = 'abc'
import dummy
s = 'def'
---
>c:\python30\pythonw -u "testpy3.py"
Traceback (most recent call last):
  File "testpy3.py", line 5, in <module>
    s = 'def'
ImportError: No module named dummy
>Exit code: 1
msg73843 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2008-09-26 11:53
#3973 is a duplicate.
msg73845 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-09-26 12:21
By setting lineno to -1 (as proposed by ocean-city), the ASCII tests (test1 and test2, see below) work correctly. This change doesn't impact the utf-8/iso-8859-1 charsets (they are a special case).

--- test1 ---
# coding: ASCII
raise Exception("here")
-------------

--- test2 ---
# useless at line 1
# coding: ASCII
raise Exception("here")
-------------

I don't know how to test a UTF-16 file. Can someone write a testcase?
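A minimal sketch for readers following along, not part of any attached patch. It uses the standard library's tokenize.detect_encoding(), which is a pure-Python counterpart of the declaration lookup and not the C tokenizer discussed in this issue. The helper name declared_encoding is made up for illustration, and the two scripts are held in memory as bytes. It shows that the declaration is found after reading one line in test1 and two lines in test2, which lines up with the +1 and +2 offsets reported earlier in this thread.

```python
import codecs
import io
import tokenize


def declared_encoding(source: bytes):
    """Return (normalized codec name, number of lines read to find it)."""
    # detect_encoding() reads at most two lines looking for "# coding: ...".
    encoding, lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    # codecs.lookup() normalizes spellings such as "ASCII" to "ascii".
    return codecs.lookup(encoding).name, len(lines_read)


test1 = b'# coding: ASCII\nraise Exception("here")\n'
test2 = b'# useless at line 1\n# coding: ASCII\nraise Exception("here")\n'

print(declared_encoding(test1))
print(declared_encoding(test2))
```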
msg73846 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-09-26 12:51
ocean-city's testcase is invalid: it uses subprocess.call(), which returns the exit code, not the Python error line number! Here is a better testcase using subprocess.Popen() that checks the line number and also the displayed line. It tests the ASCII, UTF-8 and GBK charsets. Using the GBK charset, you get the bug described by ocean-city (problem with multibyte charsets). My testcase also takes care of scripts with # coding on the second line.
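The idea of checking the reported line number through subprocess.Popen() can be sketched roughly as follows. This is not the attached testcase, only an illustration under stated assumptions: the helper name reported_lineno and the file name script.py are hypothetical, and only the ASCII case is covered.

```python
import os
import re
import subprocess
import sys
import tempfile


def reported_lineno(source: str) -> int:
    """Run `source` as a script and return the line number of the
    innermost traceback frame that points into the script."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")  # hypothetical file name
        with open(path, "w", encoding="ascii", newline="\n") as f:
            f.write(source)
        proc = subprocess.Popen([sys.executable, path],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        _, err = proc.communicate()
    text = err.decode("ascii", "replace")
    # Traceback frames look like:  File ".../script.py", line N, in <module>
    numbers = re.findall(r'script\.py", line (\d+)', text)
    return int(numbers[-1])


decl_on_line_1 = '# coding: ascii\n# 1\nraise RuntimeError("a")\n# 2\n'
decl_on_line_2 = '# dummy\n' + decl_on_line_1

print(reported_lineno(decl_on_line_1))
print(reported_lineno(decl_on_line_2))
```

On an interpreter that contains the fix, the two values should be 3 and 4, the real positions of the raise statement. An interpreter with the bug would be expected to report 4 and 6 instead.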
msg73847 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-09-26 13:06
Hum, about the empty line error using a multibyte charset, the issue is different. PyTraceBack_Print() calls _Py_DisplaySourceLine() which doesn't take care of the charset.
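To illustrate why a source line in a multibyte charset cannot simply be read as UTF-8, here is a small sketch. It does not reproduce the C code of _Py_DisplaySourceLine(); it only checks whether the bytes of the line from the cp932 example decode as UTF-8. The helper name decodes_as_utf8 is made up for illustration.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if `data` is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The escapes spell the five hiragana characters used in the cp932 example.
line = 'raise RuntimeError("\u3042\u3044\u3046\u3048\u304a")'

print(decodes_as_utf8(line.encode("utf-8")))   # the same text saved as UTF-8
print(decodes_as_utf8(line.encode("cp932")))   # the text as stored in a cp932 file
```

The cp932 bytes are rejected by the UTF-8 decoder, so any code that assumes UTF-8 needs to know the declared charset before it can display the line.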
msg73852 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-09-26 15:13
Here is a patch fixing this issue: it's nearly the same as ocean-city's patch, but I prefer to patch lineno only if set_readline() succeeds. About the truncated traceback for multibyte charsets: see the new issue3975.
msg73857 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-09-26 16:11
Oh! My patch breaks "python -m". The problem is maybe not in the token parser but... somewhere else?

--- test.py ---
# coding: ASCII
raise Exception("line 2")
# try again!
---------------

Python 3.0 trunk unpatched:
---
$ ./python test.py
Traceback (most recent call last):
  File "test.py", line 3, in <module>
$ ./python -m test
Traceback (most recent call last):
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 121, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 34, in _run_code
    exec(code, run_globals)
  File "/home/haypo/prog/py3k/test.py", line 2, in <module>
    raise Exception("line 2")
Exception: line 2
---

Python 3.0 trunk + tokenizer-coding.patch:
---
marge$ ./python test.py
Traceback (most recent call last):
  File "test.py", line 2, in <module>
    raise Exception("line 2")
Exception: line 2
Traceback (most recent call last):
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 121, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/haypo/prog/py3k/Lib/runpy.py", line 34, in _run_code
    exec(code, run_globals)
  File "/home/haypo/prog/py3k/test.py", line 1, in <module>
    # coding: ASCII
Exception: line 2
---
msg73883 - (view)	Author: Hirokazu Yamamoto (ocean-city) *	Date: 2008-09-26 20:24
Victor, this is fp_setreadl's problem, so if we put "tok->lineno = -1" anywhere, it should be in fp_setreadl(), I think.

    r = set_readline(tok, cs);
    if (r) {
        /* 1 */
        tok->encoding = cs;
        tok->decoding_state = STATE_NORMAL;

At /* 1 */, set_readline could be buf_setreadl(), and fp_setreadl is called elsewhere.
msg73889 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-09-26 21:20
@ocean-city: Oops, sorry. Using your patch (set lineno in fp_setreadl()), it works in both cases ("python test.py" or "python -m test"). The new patch includes your fix for tokenizer.c and a new version of the testcase.
msg73929 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-09-27 15:31
Issue 2832 is a duplicate.
msg74394 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-10-06 21:53
benjamin was worried by the comment /* dirty hack */ in my previous comment. After reading tokenizer.c again and again, I can say that the fix is correct: the file is closed and then re-opened by fp_setreadl() (using io.open()), and so the file cursor goes back to the start of the file.
msg74435 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2008-10-07 10:49
Your patch does the correct thing, however an explanation of the -1 value would be welcome. Something like:

    /* The file has been reopened; parsing will restart from
     * the beginning of the file, we have to reset the line number.
     * But this function has been called from inside tok_nextc() which
     * will increment lineno before it returns. So we set it -1 so that
     * the next call to tok_nextc() will start with tok->lineno == 0.
     */

Or we could change the place of the tok->lineno++ in tok_nextc() so that it is called before the call to decoding_fgets(); other changes will be needed.

Then, I think that your test is not correct: what is the meaning of the following line?

    sys.exit(traceback.tb_lineno(sys.exc_info()[2]))

(the module "traceback" has no attribute "tb_lineno")
I presume that you intended something like:

    traceback.print_exc()
    sys.exit(sys.exc_info()[2].tb_lineno)

and test at some point that "process.returncode == lineno"
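The reasoning behind the -1 value can be followed with a toy model. This is only a sketch under simplifying assumptions, not the logic of Parser/tokenizer.c: the class ToyTokenizer, its attributes and the helper lineno_of_raise are all invented for illustration. The "file" is a list of lines, reopening it is modelled by resetting the read position to 0, and the counter is incremented after each read, as described in the proposed comment.

```python
class ToyTokenizer:
    """Toy model of the line counter; NOT the real tokenizer."""

    def __init__(self, lines, reset_value=None):
        self.lines = lines              # the "file" as a list of lines
        self.pos = 0                    # index of the next line to read
        self.lineno = 0
        self.state_normal = False       # True once the declaration is handled
        self.reset_value = reset_value  # None models the unpatched code

    def _read_line(self):
        line = self.lines[self.pos]
        self.pos += 1
        if not self.state_normal and line.startswith("# coding:"):
            # Declaration found: the file is reopened, so reading
            # restarts from the first line.
            self.state_normal = True
            self.pos = 0
            if self.reset_value is not None:
                self.lineno = self.reset_value
        return line

    def next_line(self):
        line = self._read_line()
        self.lineno += 1                # incremented after the read
        return self.lineno, line


def lineno_of_raise(lines, reset_value=None):
    """Return the line number the toy model assigns to the raise line."""
    tok = ToyTokenizer(lines, reset_value)
    while True:
        lineno, line = tok.next_line()
        if line.startswith("raise"):
            return lineno


ascii_py = ["# coding: ascii", "# 1", "# 2", "# 3",
            'raise RuntimeError("a")', "# 4", "# 5", "# 6"]
dummy_py = ["# dummy"] + ascii_py

print(lineno_of_raise(ascii_py), lineno_of_raise(dummy_py))          # no reset
print(lineno_of_raise(ascii_py, -1), lineno_of_raise(dummy_py, -1))  # reset to -1
```

In this model the unpatched counter gives 6 and 8, the same numbers as in the first message of this issue, while resetting to -1 gives 5 and 6, the real positions of the raise line. Resetting to 0 instead would still be off by one, because the increment happens after the reset.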
msg74497 - (view)	Author: STINNER Victor (vstinner) *	Date: 2008-10-07 23:22
@amaury: Ok, I added your long comment in tokenizer.c. You're also right about the strange code in the test. I reused ocean-city's test. "sys.exc_info()[2].tb_lineno" raises an additional (useless) error. So I simplified the code to use only "raise RuntimeError(...)" with the try/except/else. Since tokenizer.c is hard to understand, I don't want to change the code of tok_nextc().
msg74498 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2008-10-07 23:43
This issue depends on #3975 to properly display tracebacks from python files with encoding.
msg74611 - (view)	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2008-10-09 23:45
Committed r66867. I had to considerably change the unit tests, because the subprocess output is not utf-8 encoded; it's not even the same as sys.stdout, because the spawned process uses a PIPE, not a terminal: on my winXP, the main interpreter uses cp437, but the subprocess says cp1252. So I first run a 'python -c "print(sys.stdout.encoding)"' in the same conditions just to retrieve the encoding. fun fun. I hope this still works on Unixes, will watch the buildbots.
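The trick of asking a child process for its own stdout encoding can be sketched as below. This is not the committed test code, only an illustration; the helper name child_stdout_encoding is hypothetical. The child is started with its stdout connected to a pipe, the same condition the tests run under, and the name it prints is read back.

```python
import subprocess
import sys


def child_stdout_encoding() -> str:
    """Return the stdout encoding used by a child interpreter whose
    stdout is a pipe rather than a terminal."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE,
        check=True,
    )
    # Encoding names are plain ASCII, so decoding the reply is safe.
    return result.stdout.decode("ascii").strip()


print(child_stdout_encoding())
```

The returned name can then be used to decode the output of the real test subprocess, instead of assuming utf-8 or the parent's sys.stdout.encoding.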

History
Date	User	Action	Args
2022-04-11 14:56:32	admin	set	github: 46637
2008-10-09 23:52:49	amaury.forgeotdarc	set	status: pending -> closed
2008-10-09 23:45:24	amaury.forgeotdarc	set	status: open -> pending resolution: fixed messages: + msg74611
2008-10-07 23:44:00	amaury.forgeotdarc	set	dependencies: + PyTraceBack_Print() doesn't respect # coding: xxx header messages: + msg74498
2008-10-07 23:23:48	vstinner	set	files: + tokenizer-coding-4.patch
2008-10-07 23:23:18	vstinner	set	files: - traceback_unicode-4.patch
2008-10-07 23:22:27	vstinner	set	files: - test_traceback-gbk.patch
2008-10-07 23:22:16	vstinner	set	files: - tokenizer-coding-3.patch
2008-10-07 23:22:12	vstinner	set	files: - tokenizer-coding-2.patch
2008-10-07 23:22:06	vstinner	set	files: + traceback_unicode-4.patch messages: + msg74497
2008-10-07 10:49:47	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg74435
2008-10-07 08:21:16	vstinner	link	issue3574 dependencies
2008-10-06 21:53:32	vstinner	set	files: + tokenizer-coding-3.patch messages: + msg74394
2008-09-27 15:32:44	benjamin.peterson	link	issue2832 superseder
2008-09-27 15:31:12	vstinner	set	messages: + msg73929
2008-09-27 14:52:59	vstinner	set	files: - tokenizer-coding.patch
2008-09-26 21:20:14	vstinner	set	files: + tokenizer-coding-2.patch messages: + msg73889
2008-09-26 20:24:44	ocean-city	set	messages: + msg73883
2008-09-26 16:11:50	vstinner	set	messages: + msg73857
2008-09-26 15:13:51	vstinner	set	files: + tokenizer-coding.patch messages: + msg73852
2008-09-26 13:06:59	vstinner	set	messages: + msg73847
2008-09-26 12:51:41	vstinner	set	files: + test_traceback-gbk.patch messages: + msg73846
2008-09-26 12:21:38	vstinner	set	messages: + msg73845
2008-09-26 11:53:01	pitrou	set	nosy: + pitrou messages: + msg73843
2008-09-26 11:52:39	pitrou	set	nosy: + vstinner
2008-09-26 11:52:29	pitrou	link	issue3973 superseder
2008-09-21 16:14:49	jmfauth	set	messages: + msg73512
2008-08-30 04:47:10	dlitz	set	messages: + msg72183
2008-08-26 16:45:38	dlitz	set	nosy: + dlitz
2008-08-18 14:51:07	jmfauth	set	nosy: + jmfauth messages: + msg71336
2008-08-18 01:29:35	barry	set	priority: release blocker -> high messages: + msg71308
2008-07-31 02:15:11	benjamin.peterson	set	priority: critical -> release blocker
2008-06-11 12:16:06	barry	set	priority: release blocker -> critical nosy: + barry messages: + msg67953
2008-05-18 20:00:32	georg.brandl	set	priority: release blocker
2008-04-05 05:25:28	ocean-city	set	files: + test_traceback.patch keywords: + patch messages: + msg64965
2008-03-20 06:15:10	ocean-city	set	type: behavior messages: + msg64157
2008-03-18 07:28:38	ocean-city	create