Issue 10778: decoding_fgets() (tokenizer.c) decodes the filename from the wrong encoding


This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/54987

classification

Title:       decoding_fgets() (tokenizer.c) decodes the filename from the wrong encoding
Type:        (none)
Stage:       (none)
Components:  Interpreter Core, Unicode
Versions:    Python 3.2

process

Created on 2010-12-27 01:57 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (4)
msg124693 - Author: STINNER Victor (vstinner) *	Date: 2010-12-27 01:57
decoding_fgets() decodes the input filename from UTF-8, whereas the filename is encoded with the filesystem encoding. PyUnicode_DecodeFSDefault() should be used instead. The filename appears in the SyntaxError raised by decoding_fgets() ("Non-UTF-8 code starting with '\xHH' in file xxx on line xxx, but no encoding declared; ..."). indenterror() (inconsistent use of tabs and spaces in indentation) and
msg124695 - Author: STINNER Victor (vstinner) *	Date: 2010-12-27 02:06
See also issue #10779 (Change filename encoding to FS encoding in PyErr_WarnExplicit()).
msg124703 - Author: STINNER Victor (vstinner) *	Date: 2010-12-27 03:02
Oh, ignore "indenterror() (inconsistent use of tabs and spaces in indentation) and", I forgot to remove it. indenterror() is correct.
msg124731 - Author: STINNER Victor (vstinner) *	Date: 2010-12-27 20:12
Fixed by r87518.
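The problem described in msg124693 can be illustrated at the Python level. The sketch below is not the tokenizer's C code. It assumes a hypothetical script named "café.py" on a system whose filesystem encoding is ISO-8859-1 (Latin-1), and it uses bytes.decode() with an explicit encoding so the demo does not depend on the host. The helper names (decode_as_utf8, decode_as_fs_encoding, non_utf8_error_message) are hypothetical. Decoding with the filesystem encoding plus the "surrogateescape" handler mirrors what os.fsdecode() does on POSIX, which is the Python-level counterpart of PyUnicode_DecodeFSDefault().

```python
# Hypothetical setup: the filename bytes come from a Latin-1 filesystem.
ASSUMED_FS_ENCODING = "latin-1"
RAW_FILENAME = "café.py".encode(ASSUMED_FS_ENCODING)  # b'caf\xe9.py'


def decode_as_utf8(raw: bytes) -> str:
    """Mirrors the reported bug: the filename bytes are treated as UTF-8.

    Strict decoding raises UnicodeDecodeError for bytes that are not UTF-8.
    """
    return raw.decode("utf-8")


def decode_as_fs_encoding(raw: bytes, fs_encoding: str) -> str:
    """Mirrors the fix: decode with the filesystem encoding.

    "surrogateescape" matches os.fsdecode() on POSIX; the encoding is a
    parameter here (an assumption of this sketch) instead of being read
    from the running interpreter.
    """
    return raw.decode(fs_encoding, "surrogateescape")


def non_utf8_error_message(raw_filename: bytes, bad_byte: int, lineno: int,
                           fs_encoding: str) -> str:
    """Builds a message shaped like the tokenizer's SyntaxError text."""
    name = decode_as_fs_encoding(raw_filename, fs_encoding)
    return (f"Non-UTF-8 code starting with '\\x{bad_byte:02x}' in file {name} "
            f"on line {lineno}, but no encoding declared")


if __name__ == "__main__":
    try:
        decode_as_utf8(RAW_FILENAME)
    except UnicodeDecodeError as exc:
        print("UTF-8 decoding of the filename failed:", exc.reason)
    print(non_utf8_error_message(RAW_FILENAME, 0xE9, 1, ASSUMED_FS_ENCODING))
```

In this sketch the UTF-8 path fails on the byte 0xE9 of the filename, while the filesystem-encoding path recovers "café.py" and produces a readable error message.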

History
Date	User	Action	Args
2022-04-11 14:57:10	admin	set	github: 54987
2010-12-27 20:12:31	vstinner	set	status: open -> closed; messages: + msg124731; resolution: fixed
2010-12-27 03:02:40	vstinner	set	messages: + msg124703
2010-12-27 02:06:53	vstinner	set	messages: + msg124695
2010-12-27 01:57:01	vstinner	create