msg63905 |
Author: Hirokazu Yamamoto (ocean-city) * |
Date: 2008-03-18 07:28 |
# This issue inherits from issue2301.
If there is a "# coding: ????" line in the source code and the
coding is neither utf-8 nor iso-8859-1, the line number (tok->lineno)
becomes wrong.
Please look into Parser/tokenizer.c. In this case,
tok->decoding_state becomes STATE_NORMAL, so fp_setreadl
reopens the file but *doesn't* seek to the current position.
(Or could we reuse the already opened file?)
So
# coding: ascii
# 1
# 2
# 3
raise RuntimeError("a")
# 4
# 5
# 6
outputs
C:\Documents and Settings\WhiteRabbit>py3k ascii.py
Traceback (most recent call last):
File "ascii.py", line 6, in <module>
# 4
RuntimeError: a
[22821 refs]
The reported line is shifted by one: the line number is wrongly off by +1.
And
# dummy
# coding: ascii
# 1
# 2
# 3
raise RuntimeError("a")
# 4
# 5
# 6
outputs
C:\Documents and Settings\WhiteRabbit>py3k ascii.py
Traceback (most recent call last):
File "ascii.py", line 8, in <module>
# 5
RuntimeError: a
[22821 refs]
The reported line is shifted by two: the line number is wrongly off by +2.
|
msg64157 |
Author: Hirokazu Yamamoto (ocean-city) * |
Date: 2008-03-20 06:15 |
The following dirty hack works around this bug. The comment on this
function says that non-ASCII-compatible encodings (i.e. UTF-16) are not
supported yet, so this probably works.
Index: Parser/tokenizer.c
===================================================================
--- Parser/tokenizer.c (revision 61632)
+++ Parser/tokenizer.c (working copy)
@@ -464,6 +464,7 @@
Py_XDECREF(tok->decoding_readline);
readline = PyObject_GetAttrString(stream, "readline");
tok->decoding_readline = readline;
+ tok->lineno = -1; /* dirty hack */
cleanup:
Py_XDECREF(stream);
But if a line contains multibyte characters, like the one below, that
line is not printed.
# coding: cp932
# 1
raise RuntimeError("あいうえお")
# 2
C:\Documents and Settings\WhiteRabbit>py3k cp932.py
Traceback (most recent call last):
File "cp932.py", line 3, in <module>
[22819 refs]
This is because tb_displayline() in Python/traceback.c assumes the
input line is UTF-8 encoded (it simply uses a FILE structure plus
Py_UniversalNewlineFgets).
# http://mail.python.org/pipermail/python-3000/2008-March/012546.html
# sounds nice, if we could replace every FILE structure with Python's own
# sufficiently fast codec-based Reader or something similar.
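To illustrate why the cp932 line disappears, here is a small Python sketch (not the C code in traceback.c) contrasting a display routine that assumes UTF-8 with one that honours the coding cookie. The helper names line_assuming_utf8 and line_honoring_cookie are hypothetical; the only library call relied on is the standard tokenize.detect_encoding().

```python
import io
import tokenize

# The cp932 script from the message above, as the bytes stored on disk.
RAW = '# coding: cp932\n# 1\nraise RuntimeError("あいうえお")\n# 2\n'.encode("cp932")


def line_assuming_utf8(data: bytes, lineno: int):
    """Naive display (hypothetical): decode one source line as UTF-8,
    ignoring the cookie. Returns None when the bytes are not valid UTF-8,
    i.e. when the line would be dropped from the traceback."""
    line = data.splitlines()[lineno - 1]
    try:
        return line.decode("utf-8")
    except UnicodeDecodeError:
        return None


def line_honoring_cookie(data: bytes, lineno: int) -> str:
    """Decode one source line with the encoding declared by the coding cookie."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return data.splitlines()[lineno - 1].decode(encoding)


# cp932 lead bytes such as 0x82 are not valid UTF-8, so the naive path fails.
assert line_assuming_utf8(RAW, 3) is None
assert line_honoring_cookie(RAW, 3) == 'raise RuntimeError("あいうえお")'
```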
|
msg64965 |
Author: Hirokazu Yamamoto (ocean-city) * |
Date: 2008-04-05 05:25 |
I've written a test case for the lineno problem.
|
msg67953 |
Author: Barry A. Warsaw (barry) * |
Date: 2008-06-11 12:16 |
This is a bug and not a new feature, so it could go in after the beta. I'm
knocking it down to critical.
|
msg71308 |
Author: Barry A. Warsaw (barry) * |
Date: 2008-08-18 01:29 |
While this is a bug, it's not serious enough to hold up the release.
|
msg71336 |
Author: jmf (jmfauth) |
Date: 2008-08-18 14:51 |
Py3.0b2. This bug is quite annoying, especially when one works with a
main module importing modules which import other modules, and so on,
all of them having an encoding declaration. The traceback (and the
user) gets a little bit lost.
---------
# -*- coding: cp1252 -*-
# modb.py
def fb():
    i = 1
    j = 0
    r = i / j
-----------
# -*- coding: cp1252 -*-
# moda.py
import modb
def fa():
    modb.fb()
-----------
# -*- coding: cp1252 -*-
# main.py
import moda
def main():
    moda.fa()
if __name__ == '__main__':
    main()
-----------
Running main.py gives:
>c:\python30\pythonw -u "main.py"
(Traceback (most recent call last):
File "main.py", line 11, in <module>
File "main.py", line 8, in main
File "C:\jm\jmpy3\moda.py", line 8, in fa
File "C:\jm\jmpy3\modb.py", line 8, in fb
ZeroDivisionError: int division or modulo by zero
>Exit code: 1
|
msg72183 |
Author: Dwayne Litzenberger (dlitz) |
Date: 2008-08-30 04:47 |
Could at least "-*- coding: ascii -*-" and other equivalent encodings be
fixed before the release?
|
msg73512 |
Author: jmf (jmfauth) |
Date: 2008-09-21 16:14 |
Python 3.0rc1
The lines are now displayed correctly, but I think there is still a
numbering issue: a +1 offset.
Python 2.5.2
# -*- coding: cp1252 -*- <<<< line 1, first line
s = 'abc'
import dummy
s = 'def'
---
>pythonw -u "testpy2.py"
Traceback (most recent call last):
File "testpy2.py", line 4, in <module>
import dummy
ImportError: No module named dummy
>Exit code: 1
Python 3.0rc1
# -*- coding: cp1252 -*-
s = 'abc'
import dummy
s = 'def'
---
>c:\python30\pythonw -u "testpy3.py"
Traceback (most recent call last):
File "testpy3.py", line 5, in <module>
s = 'def'
ImportError: No module named dummy
>Exit code: 1
|
msg73843 |
Author: Antoine Pitrou (pitrou) * |
Date: 2008-09-26 11:53 |
#3973 is a duplicate.
|
msg73845 |
Author: STINNER Victor (vstinner) * |
Date: 2008-09-26 12:21 |
By setting lineno to -1 (as proposed by ocean-city), the ASCII tests
(test1 and test2, see below) work correctly. This change doesn't affect
the utf-8/iso-8859-1 charsets (they are a special case).
--- test1 ---
# coding: ASCII
raise Exception("here")
-------------
--- test2 ---
# useless at line 1
# coding: ASCII
raise Exception("here")
-------------
I don't know how to test a UTF-16 file. Can someone write a testcase?
|
msg73846 |
Author: STINNER Victor (vstinner) * |
Date: 2008-09-26 12:51 |
ocean-city's testcase is invalid: it uses subprocess.call(), which
returns the exit code, not the line number of the Python error! Here is a
better testcase using subprocess.Popen() that checks both the line number
and the displayed line. It tests the ASCII, UTF-8 and GBK charsets. With
the GBK charset, you get the bug described by ocean-city (the problem with
multibyte charsets). My testcase also covers a script with the # coding
line on the second line.
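A minimal sketch of such a subprocess.Popen()-based check, assuming a modern Python 3 interpreter that already has the fix: each case writes a script in the given charset, runs it, and checks both the reported line number and the displayed source line. The helper name run_and_locate, the file name lineno_check.py and the exact cases are illustrative assumptions, not the attached testcase. Non-ASCII text is kept in comments so that the displayed raise line stays ASCII whatever the child's stderr encoding is.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_locate(source: str, encoding: str):
    """Save `source` encoded with `encoding`, run it, and return
    (line number, displayed source line) of the last <module> frame
    printed in the traceback."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "lineno_check.py"  # hypothetical file name
        script.write_bytes(source.encode(encoding))
        proc = subprocess.Popen([sys.executable, str(script)],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        _, err = proc.communicate()
    # Decode leniently: the child's stderr encoding may differ from ours.
    lines = err.decode("utf-8", "replace").splitlines()
    for i in range(len(lines) - 1, -1, -1):
        match = re.search(r"line (\d+), in <module>", lines[i])
        if match:
            # The source line is printed right after the 'File ...' line.
            shown = lines[i + 1].strip() if i + 1 < len(lines) else ""
            return int(match.group(1)), shown
    raise AssertionError("no traceback found in stderr:\n" + "\n".join(lines))


RAISE = 'raise RuntimeError("here")'
CASES = [
    # (encoding, source, expected line number of the raise statement)
    ("ascii", "# coding: ASCII\n" + RAISE + "\n", 2),
    ("ascii", "# useless at line 1\n# coding: ASCII\n" + RAISE + "\n", 3),
    ("utf-8", "# coding: utf-8\n# caf\u00e9\n" + RAISE + "\n", 3),
    ("gbk", "# coding: gbk\n# \u4e2d\u6587\n" + RAISE + "\n", 3),
]

for encoding, source, expected in CASES:
    lineno, shown = run_and_locate(source, encoding)
    assert lineno == expected, (encoding, lineno, expected)
    assert shown == RAISE, (encoding, shown)
```

On the affected 3.0 builds described in this issue, the two ASCII cases would be expected to report lines shifted by one and two, as in the first message.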
|
msg73847 |
Author: STINNER Victor (vstinner) * |
Date: 2008-09-26 13:06 |
Hmm, the empty-line problem with a multibyte charset is a different
issue: PyTraceBack_Print() calls _Py_DisplaySourceLine(), which doesn't
take the charset into account.
|
msg73852 |
Author: STINNER Victor (vstinner) * |
Date: 2008-09-26 15:13 |
Here is a patch fixing this issue: it's almost the same as ocean-city's
patch, but I prefer to patch lineno only if set_readline() succeeds.
For the truncated traceback with multibyte charsets, see the new
issue3975.
|
msg73857 |
Author: STINNER Victor (vstinner) * |
Date: 2008-09-26 16:11 |
Oh! My patch breaks "python -m". Maybe the problem is not in the
tokenizer but... somewhere else?
--- test.py ---
# coding: ASCII
raise Exception("line 2")
# try again!
---------------
Python 3.0 trunk unpatched:
---
$ ./python test.py
Traceback (most recent call last):
File "test.py", line 3, in <module>
$ ./python -m test
Traceback (most recent call last):
File "/home/haypo/prog/py3k/Lib/runpy.py", line 121, in
_run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/haypo/prog/py3k/Lib/runpy.py", line 34, in _run_code
exec(code, run_globals)
File "/home/haypo/prog/py3k/test.py", line 2, in <module>
raise Exception("line 2")
Exception: line 2
---
Python 3.0 trunk + tokenizer-coding.patch:
---
marge$ ./python test.py
Traceback (most recent call last):
File "test.py", line 2, in <module>
raise Exception("line 2")
Exception: line 2
Traceback (most recent call last):
File "/home/haypo/prog/py3k/Lib/runpy.py", line 121, in
_run_module_as_main
"__main__", fname, loader, pkg_name)
File "/home/haypo/prog/py3k/Lib/runpy.py", line 34, in _run_code
exec(code, run_globals)
File "/home/haypo/prog/py3k/test.py", line 1, in <module>
# coding: ASCII
Exception: line 2
---
|
msg73883 |
Author: Hirokazu Yamamoto (ocean-city) * |
Date: 2008-09-26 20:24 |
Victor, this is fp_setreadl's problem, so if we put "tok->lineno = -1"
anywhere, it should be in fp_setreadl(), I think.
r = set_readline(tok, cs);
if (r) {
/* 1 */
tok->encoding = cs;
tok->decoding_state = STATE_NORMAL;
At /* 1 */, set_readline could be buf_setreadl(), and fp_setreadl is
called elsewhere.
|
msg73889 |
Author: STINNER Victor (vstinner) * |
Date: 2008-09-26 21:20 |
@ocean-city: Oops, sorry. With your patch (setting lineno in
fp_setreadl()), it works in both cases ("python test.py" and "python -m
test").
The new patch includes your fix for tokenizer.c and a new version of the
testcase.
|
msg73929 |
Author: STINNER Victor (vstinner) * |
Date: 2008-09-27 15:31 |
Issue 2832 is a duplicate.
|
msg74394 |
Author: STINNER Victor (vstinner) * |
Date: 2008-10-06 21:53 |
benjamin was worried by the /* dirty hack */ comment in my previous
patch. After reading tokenizer.c again and again, I can say that the
fix is correct: the file is closed and then reopened by fp_setreadl()
(using io.open()), so the file cursor goes back to the start of the file.
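The effect of reopening can be seen with plain Python file objects. In this sketch (the script content and variable names are illustrative), the cookie line is read first; a freshly opened io.open() stream then starts again at the first line, whereas seeking to the saved byte offset, the alternative suggested when the issue was opened, resumes at line 2.

```python
import io
import os
import tempfile

# A throwaway script whose first line is the coding cookie.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as f:
    f.write(b"# coding: ascii\n# 1\nraise RuntimeError('a')\n")
    path = f.name

try:
    with open(path, "rb") as raw:
        cookie = raw.readline()   # the cookie line has already been consumed...
        offset = raw.tell()       # ...and this is where reading should resume

    # Reopening the file (as described above) starts over at byte 0,
    # so lines already counted are read, and counted, a second time.
    with io.open(path, "r", encoding="ascii") as reopened:
        first_after_reopen = reopened.readline()

    # Alternative: position the new stream at the saved offset first.
    with open(path, "rb") as raw:
        raw.seek(offset)
        resumed = io.TextIOWrapper(raw, encoding="ascii").readline()
finally:
    os.unlink(path)

assert first_after_reopen == cookie.decode("ascii")
assert resumed == "# 1\n"
```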
|
msg74435 |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2008-10-07 10:49 |
Your patch does the correct thing; however, an explanation of the -1
value would be welcome. Something like:
/* The file has been reopened; parsing will restart from
* the beginning of the file, we have to reset the line number.
* But this function has been called from inside tok_nextc() which
* will increment lineno before it returns. So we set it -1 so that
* the next call to tok_nextc() will start with tok->lineno == 0.
*/
Or we could move the tok->lineno++ in tok_nextc() so that it runs
before the call to decoding_fgets(); other changes would then be
needed.
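To make the arithmetic concrete, here is a toy Python model of the counting described in the comment above. It is a hypothetical simplification, not tokenizer.c: the function name, the "coding:" substring test and the "cookie on line 1 or 2" rule are assumptions. The counter is bumped after every line read, and on the cookie line the file is "reopened" so reading restarts at line 1.

```python
def reported_lineno(lines, target, apply_fix):
    """Toy model (hypothetical, not CPython code) of the line counting.

    lines:     physical source lines of the script
    target:    1-based physical line whose reported number we want
    apply_fix: set the counter to -1 when the file is reopened
    """
    lineno = 0
    pos = 0              # index of the next physical line to read
    reopened = False
    while pos < len(lines):
        physical = pos + 1
        text = lines[pos]
        pos += 1
        if not reopened and physical <= 2 and "coding:" in text:
            # Model of fp_setreadl(): the file is reopened and reading
            # restarts at the first line, but the counter is not reset.
            reopened = True
            pos = 0
            if apply_fix:
                lineno = -1
        lineno += 1      # tok_nextc() increments before it returns
        if physical == target:
            return lineno
    raise ValueError("target line not reached")


ONE = ["# coding: ascii", "# 1", "# 2", "# 3", 'raise RuntimeError("a")', "# 4", "# 5", "# 6"]
TWO = ["# dummy"] + ONE
print(reported_lineno(ONE, 5, False), reported_lineno(TWO, 6, False))  # 6 8
print(reported_lineno(ONE, 5, True), reported_lineno(TWO, 6, True))    # 5 6
```

Without the reset, the model reproduces the +1 and +2 shifts from the first message; with the counter set to -1 before the final increment, the next line read after the reopen is numbered 1, so the raise lines come out as 5 and 6.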
Also, I think your test is not correct. What is the meaning of the
following line?
sys.exit(traceback.tb_lineno(sys.exc_info()[2]))
(the module "traceback" has no attribute "tb_lineno")
I presume that you intended something like:
traceback.print_exc()
sys.exit(sys.exc_info()[2].tb_lineno)
and test at some point that "process.returncode == lineno"
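A sketch of that exit-status approach, assuming the raise sits at module level inside a try block: the child prints the traceback for humans and exits with the line number, and the parent compares the return code with the expected line. The file name child.py and the helper exit_code_of are illustrative.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Child script: the raise statement is on physical line 4.
CHILD = '''# coding: ascii
import sys, traceback
try:
    raise RuntimeError("line 4")
except RuntimeError:
    traceback.print_exc()
    sys.exit(sys.exc_info()[2].tb_lineno)
'''


def exit_code_of(source: str) -> int:
    """Run `source` as a script and return the child's exit status."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "child.py"
        script.write_text(source, encoding="ascii")
        proc = subprocess.run([sys.executable, str(script)], capture_output=True)
    return proc.returncode


assert exit_code_of(CHILD) == 4
```

One caveat of this design: on POSIX systems only the low 8 bits of the exit status survive, so it only works for line numbers below 256.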
|
msg74497 |
Author: STINNER Victor (vstinner) * |
Date: 2008-10-07 23:22 |
@amaury: OK, I added your long comment to tokenizer.c. You're also
right about the strange code in the test; I had reused ocean-city's
test. "sys.exc_info()[2].tb_lineno" raises an additional (useless)
error, so I simplified the code to use only "raise RuntimeError(...)"
with the try/except/else.
Since tokenizer.c is hard to understand, I don't want to change the
code of tok_nextc().
|
msg74498 |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2008-10-07 23:43 |
This issue depends on #3975 to properly display tracebacks from python
files with encoding.
|
msg74611 |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * |
Date: 2008-10-09 23:45 |
Committed r66867.
I had to change the unit tests considerably, because the subprocess
output is not UTF-8 encoded. It's not even the same encoding as sys.stdout,
because the spawned process writes to a PIPE, not a terminal: on my WinXP,
the main interpreter uses cp437, but the subprocess says cp1252. So I
first run 'python -c "print(sys.stdout.encoding)"' under the same
conditions just to retrieve the encoding. Fun fun.
I hope this still works on Unixes; I will watch the buildbots.
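The encoding-probing step can be sketched as follows, assuming a current Python 3 where subprocess.run() is available. The helper name child_stdout_encoding is illustrative; the idea is simply to ask a child started with the same stdout redirection which encoding it uses, and then decode later children's output with it.

```python
import codecs
import subprocess
import sys


def child_stdout_encoding() -> str:
    """Ask a child interpreter whose stdout is a pipe (not a terminal)
    which encoding it uses for sys.stdout."""
    out = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE, check=True,
    ).stdout
    # Encoding names are ASCII; assumes the child's encoding is ASCII-compatible.
    return out.decode("ascii").strip()


encoding = child_stdout_encoding()
codecs.lookup(encoding)  # raises LookupError if the name were unknown

# Later children started the same way can be decoded with that encoding.
out = subprocess.run([sys.executable, "-c", "print('abc')"],
                     stdout=subprocess.PIPE, check=True).stdout
assert out.decode(encoding).strip() == "abc"
```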
|
|
Date | User | Action | Args |
2022-04-11 14:56:32 | admin | set | github: 46637 |
2008-10-09 23:52:49 | amaury.forgeotdarc | set | status: pending -> closed |
2008-10-09 23:45:24 | amaury.forgeotdarc | set | status: open -> pending resolution: fixed messages:
+ msg74611 |
2008-10-07 23:44:00 | amaury.forgeotdarc | set | dependencies:
+ PyTraceBack_Print() doesn't respect # coding: xxx header messages:
+ msg74498 |
2008-10-07 23:23:48 | vstinner | set | files:
+ tokenizer-coding-4.patch |
2008-10-07 23:23:18 | vstinner | set | files:
- traceback_unicode-4.patch |
2008-10-07 23:22:27 | vstinner | set | files:
- test_traceback-gbk.patch |
2008-10-07 23:22:16 | vstinner | set | files:
- tokenizer-coding-3.patch |
2008-10-07 23:22:12 | vstinner | set | files:
- tokenizer-coding-2.patch |
2008-10-07 23:22:06 | vstinner | set | files:
+ traceback_unicode-4.patch messages:
+ msg74497 |
2008-10-07 10:49:47 | amaury.forgeotdarc | set | nosy:
+ amaury.forgeotdarc messages:
+ msg74435 |
2008-10-07 08:21:16 | vstinner | link | issue3574 dependencies |
2008-10-06 21:53:32 | vstinner | set | files:
+ tokenizer-coding-3.patch messages:
+ msg74394 |
2008-09-27 15:32:44 | benjamin.peterson | link | issue2832 superseder |
2008-09-27 15:31:12 | vstinner | set | messages:
+ msg73929 |
2008-09-27 14:52:59 | vstinner | set | files:
- tokenizer-coding.patch |
2008-09-26 21:20:14 | vstinner | set | files:
+ tokenizer-coding-2.patch messages:
+ msg73889 |
2008-09-26 20:24:44 | ocean-city | set | messages:
+ msg73883 |
2008-09-26 16:11:50 | vstinner | set | messages:
+ msg73857 |
2008-09-26 15:13:51 | vstinner | set | files:
+ tokenizer-coding.patch messages:
+ msg73852 |
2008-09-26 13:06:59 | vstinner | set | messages:
+ msg73847 |
2008-09-26 12:51:41 | vstinner | set | files:
+ test_traceback-gbk.patch messages:
+ msg73846 |
2008-09-26 12:21:38 | vstinner | set | messages:
+ msg73845 |
2008-09-26 11:53:01 | pitrou | set | nosy:
+ pitrou messages:
+ msg73843 |
2008-09-26 11:52:39 | pitrou | set | nosy:
+ vstinner |
2008-09-26 11:52:29 | pitrou | link | issue3973 superseder |
2008-09-21 16:14:49 | jmfauth | set | messages:
+ msg73512 |
2008-08-30 04:47:10 | dlitz | set | messages:
+ msg72183 |
2008-08-26 16:45:38 | dlitz | set | nosy:
+ dlitz |
2008-08-18 14:51:07 | jmfauth | set | nosy:
+ jmfauth messages:
+ msg71336 |
2008-08-18 01:29:35 | barry | set | priority: release blocker -> high messages:
+ msg71308 |
2008-07-31 02:15:11 | benjamin.peterson | set | priority: critical -> release blocker |
2008-06-11 12:16:06 | barry | set | priority: release blocker -> critical nosy:
+ barry messages:
+ msg67953 |
2008-05-18 20:00:32 | georg.brandl | set | priority: release blocker |
2008-04-05 05:25:28 | ocean-city | set | files:
+ test_traceback.patch keywords:
+ patch messages:
+ msg64965 |
2008-03-20 06:15:10 | ocean-city | set | type: behavior messages:
+ msg64157 |
2008-03-18 07:28:38 | ocean-city | create | |