
Author Trundle
Recipients Trundle
Date 2010-11-22.18:37:11
Message-id <1290451032.87.0.502665724256.issue10509@psf.upfronthosting.co.za>
Content
If a non-ASCII character is found (more precisely, a byte sequence that is not valid UTF-8) and there is no encoding cookie, `decoding_fgets` raises a SyntaxError whose message includes the path of the file (taken from ``tok->filename``). In this code path that path is never set, so ``tok->filename`` is NULL. You can easily reproduce the crash by calling `imp.find_module("badsyntax")`, where "badsyntax" is a Python file containing such a character (see e.g. the attached unit test), because `find_module` uses `PyTokenizer_FindEncoding`. Note that Python 3.1 uses `snprintf()` to format the error message. Some implementations of `snprintf()` explicitly check for null pointers, so it might not crash there.
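
The following Python sketch sets up the triggering condition. It writes a hypothetical module `badsyntax.py` containing the Latin-1 byte `\xe9` and no coding cookie, then asks for that module's encoding. When `imp` is still importable (it was removed in Python 3.12), it goes through `imp.find_module`. Otherwise it falls back to `tokenize.detect_encoding`, which the pure-Python `imp.find_module` used from Python 3.3 on. The helper name `reproduce` and the file contents are illustrative assumptions and are not taken from the attached test. On an interpreter without this bug, a SyntaxError is expected instead of a crash.

```python
import os
import tempfile
import tokenize
import warnings


def reproduce(name="badsyntax"):
    """Write a module with a non-UTF-8 byte and no coding cookie, then ask
    for its encoding. Returns the SyntaxError message, or None if none was raised.
    (Hypothetical helper for illustration.)"""
    # b"\xe9" is 'e acute' in Latin-1 but is not valid UTF-8 on its own.
    source = b"s = '\xe9'\n"
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, name + ".py")
        with open(path, "wb") as f:
            f.write(source)
        try:
            # imp emits deprecation warnings on newer 3.x releases.
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                import imp  # removed in Python 3.12
        except ImportError:
            imp = None
        try:
            if imp is not None:
                # Searches only tmpdir; the encoding check happens here.
                fobj, _, _ = imp.find_module(name, [tmpdir])
                fobj.close()
            else:
                # Fallback: the pure-Python encoding detection used by
                # Lib/imp.py from Python 3.3 onward.
                with open(path, "rb") as f:
                    tokenize.detect_encoding(f.readline)
        except SyntaxError as exc:
            return str(exc)
    return None
```

Calling `reproduce()` on a fixed interpreter should return a SyntaxError message about the missing or invalid encoding declaration. The exact wording differs between versions.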

One possible fix is to set ``tok->filename`` to a placeholder such as "<unknown>". The attached patch does that and adds a unit test for imp.
History
Date                 User     Action  Args
2010-11-22 18:37:12  Trundle  set     recipients: + Trundle
2010-11-22 18:37:12  Trundle  set     messageid: <1290451032.87.0.502665724256.issue10509@psf.upfronthosting.co.za>
2010-11-22 18:37:11  Trundle  link    issue10509 messages
2010-11-22 18:37:11  Trundle  create