Title: PyTokenizer_FindEncoding can lead to a segfault if bad characters are found
Type: crash Stage: patch review
Components: Interpreter Core Versions: Python 3.1, Python 3.2
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: Nosy List: Trundle, ezio.melotti, ron_adam
Priority: normal Keywords: patch

Created on 2010-11-22 18:37 by Trundle, last changed 2010-12-15 16:38 by Trundle. This issue is now closed.

File name Uploaded Description
PyTokenizer_FindEncoding_fix.patch Trundle, 2010-11-22 18:37
Messages (3)
msg122153 - Author: Andreas Stührk (Trundle) * Date: 2010-11-22 18:37
If a non-ASCII character is found and there is no encoding cookie, a SyntaxError is raised (in `decoding_fgets`) that includes the path of the file (via ``tok->filename``), but that field is never set, so formatting the error message dereferences a null pointer. The crash is easy to reproduce by calling `imp.find_module("badsyntax")`, where "badsyntax" is a Python file containing a non-ASCII character (see e.g. the attached unit test), because `find_module` uses `PyTokenizer_FindEncoding`. Note that Python 3.1 uses `snprintf()` to format the error message, and some implementations of `snprintf()` explicitly check for null pointers, so it might not crash there.
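For illustration (this is not part of the patch): the pure-Python tokenizer performs the same decoding check, and `tokenize.detect_encoding` raises the SyntaxError that the C tokenizer is trying to format when the source contains bytes that are not valid UTF-8 and no encoding cookie:

```python
import io
import tokenize

# A source line with a byte (\xe9, Latin-1 "é") that is invalid UTF-8
# and no encoding cookie -- the situation that triggers the error path.
source = b'print("caf\xe9")\n'

try:
    tokenize.detect_encoding(io.BytesIO(source).readline)
except SyntaxError as exc:
    # The pure-Python tokenizer reports the bad encoding instead of crashing.
    print("SyntaxError:", exc)
```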

One possible fix is to set ``tok->filename`` to something like "<unknown>". The attached patch does that and adds a unit test for imp.
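The idea behind the fix can be sketched in Python (hypothetical names; the actual patch is in C against the tokenizer): substitute a placeholder whenever the filename was never recorded, so the error formatter never touches a missing value:

```python
# Minimal sketch of the fix idea: a tokenizer-like state whose filename
# may never have been set, and an error formatter that guards against that.
class TokState:
    def __init__(self, filename=None):
        # In the buggy code path, filename stays unset (None / NULL in C).
        self.filename = filename

def format_decode_error(tok):
    # The fix: fall back to a placeholder instead of using the unset name.
    name = tok.filename if tok.filename is not None else "<unknown>"
    return "Non-UTF-8 code in file %s" % name

print(format_decode_error(TokState()))           # filename never set
print(format_decode_error(TokState("spam.py")))  # normal case
```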
msg122184 - Author: Ron Adam (ron_adam) * Date: 2010-11-23 01:26
Is this a duplicate of issue 9319?
msg124028 - Author: Andreas Stührk (Trundle) * Date: 2010-12-15 16:38
Yes, it is (at the latest since msg124018).
Date User Action Args
2010-12-15 16:38:37 Trundle set status: open -> closed
messages: + msg124028
resolution: duplicate
nosy: ron_adam, ezio.melotti, Trundle
2010-11-23 01:26:05 ron_adam set nosy: + ron_adam
messages: + msg122184
2010-11-22 18:39:27 ezio.melotti set nosy: + ezio.melotti
stage: patch review
2010-11-22 18:37:38 Trundle set files: + PyTokenizer_FindEncoding_fix.patch
keywords: + patch
2010-11-22 18:37:11 Trundle create