Title: PyTokenizer_FindEncoding can lead to a segfault if bad characters are found
Type: crash Stage: patch review
Components: Interpreter Core Versions: Python 3.1, Python 3.2
Status: closed Resolution: duplicate
Dependencies: Superseder:
Assigned To: Nosy List: Trundle, ezio.melotti, ron_adam
Priority: normal Keywords: patch

Created on 2010-11-22 18:37 by Trundle, last changed 2010-12-15 16:38 by Trundle. This issue is now closed.

File name Uploaded Description
PyTokenizer_FindEncoding_fix.patch Trundle, 2010-11-22 18:37
Messages (3)
msg122153 - Author: Andreas Stührk (Trundle) * Date: 2010-11-22 18:37
If a non-ASCII character is found and there is no encoding cookie, a SyntaxError is raised (in `decoding_fgets`) that includes the path of the file (via ``tok->filename``), but that field is never set, so formatting the error message dereferences a null pointer. The crash is easy to reproduce by calling `imp.find_module("badsyntax")`, where "badsyntax" is a Python file containing a non-ASCII character (see e.g. the attached unit test), because `find_module` uses `PyTokenizer_FindEncoding`. Note that Python 3.1 uses `snprintf()` to format the error message, and some implementations of `snprintf()` explicitly check for null pointers, so it might not crash there.
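For illustration (this is not part of the patch): the pure-Python tokenizer performs the same decoding check, and `tokenize.detect_encoding` raises the SyntaxError that the C tokenizer is trying to format when the source contains bytes that are not valid UTF-8 and no encoding cookie:

```python
import io
import tokenize

# A source line with a byte (\xe9, Latin-1 "é") that is invalid UTF-8
# and no encoding cookie -- the situation that triggers the error path.
source = b'print("caf\xe9")\n'

try:
    tokenize.detect_encoding(io.BytesIO(source).readline)
except SyntaxError as exc:
    # The pure-Python tokenizer reports the bad encoding instead of crashing.
    print("SyntaxError:", exc)
```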

One possible fix is to set ``tok->filename`` to something like "<unknown>". The attached patch does that and adds a unit test for imp.
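The idea behind the fix can be sketched in Python (hypothetical names; the actual patch is in C against the tokenizer): substitute a placeholder whenever the filename was never recorded, so the error formatter never touches a missing value:

```python
# Minimal sketch of the fix idea: a tokenizer-like state whose filename
# may never have been set, and an error formatter that guards against that.
class TokState:
    def __init__(self, filename=None):
        # In the buggy code path, filename stays unset (None / NULL in C).
        self.filename = filename

def format_decode_error(tok):
    # The fix: fall back to a placeholder instead of using the unset name.
    name = tok.filename if tok.filename is not None else "<unknown>"
    return "Non-UTF-8 code in file %s" % name

print(format_decode_error(TokState()))           # filename never set
print(format_decode_error(TokState("spam.py")))  # normal case
```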
msg122184 - Author: Ron Adam (ron_adam) * Date: 2010-11-23 01:26
Is this a duplicate of issue 9319?
msg124028 - Author: Andreas Stührk (Trundle) * Date: 2010-12-15 16:38
Yes, it is (at the latest since msg124018).
Date User Action Args
2010-12-15 16:38:37 Trundle set status: open -> closed
messages: + msg124028
resolution: duplicate
nosy: ron_adam, ezio.melotti, Trundle
2010-11-23 01:26:05 ron_adam set nosy: + ron_adam
messages: + msg122184
2010-11-22 18:39:27 ezio.melotti set nosy: + ezio.melotti
stage: patch review
2010-11-22 18:37:38 Trundle set files: + PyTokenizer_FindEncoding_fix.patch
keywords: + patch
2010-11-22 18:37:11 Trundle create