classification
Title: traceback presented in wrong encoding
Type: behavior
Stage: patch review
Components: Interpreter Core
Versions: Python 3.1
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, flox, jafo, r_mosaic, vstinner
Priority: normal
Keywords: needs review, patch

Created on 2009-07-22 07:31 by r_mosaic, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
traceback-encoding.patch amaury.forgeotdarc, 2009-07-22 15:04
traceback-encoding-2.patch vstinner, 2010-06-17 21:11
Messages (10)
msg90803 - Author: Fan Decheng (r_mosaic) Date: 2009-07-22 07:31
Traceback information is displayed in the wrong encoding.
Steps to reproduce:
1. Use a version of Windows that supports CP936 (Simplified Chinese) as 
the default encoding.
2. Create a directory whose name contains Chinese characters, such as C:\测试
3. In that directory, create a Python file such as C:\测试\test.py
4. In the Python file, enter the following lines:
import traceback
try:
    aaa # create a non-existent name
except Exception as ex:
    traceback.print_exc()
5. Run the program with this command line (remember to use the full path
to the test.py file):
C:\Python31\python.exe C:\测试\test.py
6. See the output.

Expected result:
The output is correct, without encoding problems, such as:

Traceback (most recent call last):
  File "C:\测试\test.py", line 3, in <module>
NameError: name 'aaa' is not defined

Actual result:
The UTF-8 encoded file name is decoded using CP936, so the output is incorrect:

Traceback (most recent call last):
  File "C:\娴嬭瘯\test.py", line 3, in <module>
NameError: name 'aaa' is not defined

Additional information:
In Python 3.0, such a test would generate:
File "<decoding error>", line 221, in main
In Python 3.1, the test generates the output shown under "Actual result"
above. When I tried traceback.format_exc(), the two original characters
测试 had already become three Unicode characters in the string returned by
format_exc().
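
The garbling described above can be reproduced without a Chinese Windows installation. This is only a minimal sketch, assuming that Python's `cp936` codec matches the Windows code page for these characters; the variable names are illustrative and do not come from the interpreter source.

```python
# Sketch: reproduce the mojibake from the report in pure Python.
name = "测试"

utf8_bytes = name.encode("utf-8")      # 6 bytes: 3 per character
garbled = utf8_bytes.decode("cp936")   # CP936 reads the same bytes as 2-byte characters

print(len(name), len(utf8_bytes), len(garbled))
print(garbled)

# The damage is reversible as long as no byte was lost along the way.
restored = garbled.encode("cp936").decode("utf-8")
print(restored == name)
```

Two characters become six UTF-8 bytes, which CP936 then pairs up into three characters, matching the observation about format_exc().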
msg90815 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2009-07-22 15:04
This also happens on a Western Windows (cp437, mbcs==cp1252) with a
filename like "café.py".

The attached patch corrects three problems:

- in compile.c, the c_filename member has utf8 encoding, and must not be
decoded with PyUnicode_DecodeFSDefault. This is the reported issue.

- Same thing in pythonrun.c, if you want "print(__file__)" to work.

- in traceback.c, the content of the file is not shown.
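
The first problem in the list can be imitated at the Python level. The sketch below is an illustration only: `decode_wrongly` is a hypothetical helper, not a function from compile.c, and cp1252 stands in for the file system encoding of a Western Windows. It also shows why ASCII-only paths never trigger the bug.

```python
def decode_wrongly(filename: str, fs_encoding: str = "cp1252") -> str:
    """Mimic the bug: UTF-8 bytes decoded with the file system encoding."""
    return filename.encode("utf-8").decode(fs_encoding)


for name in ("test.py", "café.py"):
    shown = decode_wrongly(name)
    status = "ok" if shown == name else "garbled"
    print(f"{name!r} -> {shown!r} ({status})")
```

ASCII bytes mean the same thing in both encodings, so only names with non-ASCII characters come out wrong.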


Tested with this script:
=====================================================
print("file name:", __file__)
import traceback
try:
    aaa
except:
    traceback.print_exc()
    raise
=====================================================

The output should be:
=====================================================
file name: c:\temp\café.py
Traceback (most recent call last):
  File "c:\temp\café.py", line 4, in <module>
    aaa
NameError: name 'aaa' is not defined
Traceback (most recent call last):
  File "c:\temp\café.py", line 4, in <module>
    aaa
NameError: name 'aaa' is not defined
=====================================================
msg101481 - Author: Sean Reifschneider (jafo) (Python committer) Date: 2010-03-22 05:44
From a cursory glance, I don't see any problems with this patch, though I admit that I don't know the traceback code nearly as well as you, Amaury.  The tests pass on py3k trunk on my Linux box.

If you want another review, perhaps ask on python-dev?
msg101670 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-03-25 02:11
> in compile.c, the c_filename member has utf8 encoding

Maybe the problem is that c_filename should be a Unicode object created using the file system default encoding and the surrogateescape error handler, so that it can store undecodable filenames (useful on POSIX systems that use a byte string API, e.g. Linux).
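
For readers unfamiliar with the surrogateescape error handler, here is a small sketch of what it does with an undecodable filename. The byte string is a made-up example, and UTF-8 is assumed as the file system encoding.

```python
raw = b"caf\xe9.py"  # Latin-1 bytes, not valid UTF-8

# Strict decoding fails ...
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)

# ... while surrogateescape maps the bad byte 0xE9 to the lone surrogate U+DCE9.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))

# Encoding with the same handler restores the original bytes exactly.
roundtrip = name.encode("utf-8", "surrogateescape")
print(roundtrip == raw)
```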
msg101676 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2010-03-25 08:17
Storing unicode in c_filename would not solve the problem: "surrogateescape" characters are not printable.
There is no need to support non-decodable filenames in the import mechanism.
msg101755 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-03-26 16:01
> "surrogateescape" characters are not printable

stderr uses the backslashreplace error handler, and so non-decodable characters will be displayed as \xHH.
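
The effect of the `backslashreplace` error handler on such a name can be sketched with an in-memory stream. This is an assumption-laden illustration: an `io.BytesIO` buffer stands in for stderr, and UTF-8 is assumed as the stream encoding. In current CPython the escape for a lone surrogate appears in the \udcHH form.

```python
import io

name = "caf\udce9.py"  # lone surrogate, as produced by surrogateescape

# A stand-in for stderr: a text stream that escapes what it cannot encode.
buffer = io.BytesIO()
stream = io.TextIOWrapper(
    buffer, encoding="utf-8", errors="backslashreplace", newline="\n"
)
stream.write(f'  File "{name}", line 1\n')
stream.flush()
shown = buffer.getvalue()
print(shown)

# For comparison, strict encoding rejects the same text.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print("strict encoding works:", strict_ok)
```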

... see also #8092 :-)
msg107839 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-06-14 23:11
The patch changes the prototype of the _Py_DisplaySourceLine() function. Is it possible that a third-party module uses this function? Should we keep backward compatibility with third-party modules using the private C API?
msg107848 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2010-06-15 07:17
In issue3343, we chose to mark this function as private.
msg108063 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-06-17 21:11
Update and improve the patch:
 - Update the patch to py3k (replace tabs by spaces)
 - check if _PyUnicode_AsString() result is NULL
 - _Py_FindSourceFile() returns the file instead of NULL on success!
 - use "utf-8" directly instead of calling PyUnicode_GetDefaultEncoding() for the default source code encoding (which is constant)
 - use PyUnicode_FromFormat() instead of PyOS_snprintf() in tb_displayline() to avoid converting from Unicode to UTF-8 and then back to Unicode (in PyFile_WriteString). The type of name is now PyObject*
 - reindent also PyTracebackObject structure in traceback.h, just because I hate tabs :-)
msg108070 - Author: STINNER Victor (vstinner) (Python committer) Date: 2010-06-17 23:24
I tested the last patch on Windows: it does fix the bug, and the traceback is displayed correctly in my terminal charset (cp850).

I committed the fix to Python 3.1 (r82063) and 3.2 (r82059+r82061).
History
Date User Action Args
2022-04-11 14:56:51 admin set github: 50792
2010-06-17 23:24:01 vstinner set status: open -> closed; resolution: fixed; messages: + msg108070
2010-06-17 21:11:43 vstinner set files: + traceback-encoding-2.patch; messages: + msg108063
2010-06-15 07:17:32 amaury.forgeotdarc set messages: + msg107848
2010-06-14 23:11:51 vstinner set messages: + msg107839
2010-03-26 16:01:10 vstinner set messages: + msg101755
2010-03-25 08:17:02 amaury.forgeotdarc set messages: + msg101676
2010-03-25 02:11:26 vstinner set nosy: + vstinner; messages: + msg101670
2010-03-22 10:34:47 flox set nosy: + flox; stage: patch review
2010-03-22 05:45:02 jafo set priority: normal
2010-03-22 05:44:45 jafo set nosy: + jafo; messages: + msg101481
2009-07-22 15:04:29 amaury.forgeotdarc set keywords: + needs review
2009-07-22 15:04:19 amaury.forgeotdarc set files: + traceback-encoding.patch; nosy: + amaury.forgeotdarc; messages: + msg90815; keywords: + patch
2009-07-22 07:31:53 r_mosaic create