Issue 14990: detect_encoding should fail with SyntaxError on invalid encoding


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/59195

classification

Title:	detect_encoding should fail with SyntaxError on invalid encoding
Type:	behavior	Stage:	resolved
Components:	Library (Lib), Unicode	Versions:	Python 3.2, Python 3.3

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	ezio.melotti, flox, python-dev, vstinner
Priority:	normal	Keywords:	patch

Created on 2012-06-03 10:29 by flox, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
issue14990_detect_encoding.diff	flox, 2012-06-03 10:31

Pull Requests
URL	Status	Linked
PR 6572	closed	lukasz.langa, 2018-04-23 01:09

Messages (7)
msg162205	Author: Florent Xicluna (flox) *	Date: 2012-06-03 10:29
I've hit this issue while playing with tokenize for the pep8.py module. tokenize.detect_encoding() should report a SyntaxError when the encoding is improperly declared. However, it raises a LookupError in some cases:

    $ ./python -m tokenize Lib/test/bad_coding2.py
    unexpected error: unknown encoding: utf8-sig
    Traceback (most recent call last):
      File "./Lib/runpy.py", line 162, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "./Lib/runpy.py", line 75, in _run_code
        exec(code, run_globals)
      File "./Lib/tokenize.py", line 686, in <module>
        main()
      File "./Lib/tokenize.py", line 656, in main
        tokens = list(tokenize(f.readline))
      File "./Lib/tokenize.py", line 489, in _tokenize
        line = line.decode(encoding)
    LookupError: unknown encoding: utf8-sig
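A minimal reproduction, offered as a sketch: the bytes below imitate Lib/test/bad_coding2.py (a UTF-8 BOM followed by a cookie spelled 'utf8'; the exact file contents are an assumption here), and detect() is a hypothetical helper that feeds them to tokenize.detect_encoding(). On a Python 3 release that includes this fix, the call raises SyntaxError instead of returning the invalid name 'utf8-sig' that later fails with LookupError during decoding.

```python
import io
import tokenize

# UTF-8 BOM + a cookie spelled "utf8" (not "utf-8"); assumed to imitate bad_coding2.py.
SOURCE = b"\xef\xbb\xbf# coding: utf8\nprint('hi')\n"


def detect(source: bytes):
    """Hypothetical helper: return detect_encoding()'s result, or the SyntaxError it raises."""
    try:
        return tokenize.detect_encoding(io.BytesIO(source).readline)
    except SyntaxError as exc:
        return exc


result = detect(SOURCE)
# With the fix applied this is a SyntaxError (the message wording varies by version).
print(type(result).__name__, result)
```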
msg162206	Author: Florent Xicluna (flox) *	Date: 2012-06-03 10:31
This patch seems to fix the issue.
msg162303	Author: STINNER Victor (vstinner) *	Date: 2012-06-04 23:06
The patch is correct according to PEP 263:

    If a source file uses both the UTF-8 BOM mark signature and a magic encoding comment, the only allowed encoding for the comment is 'utf-8'. Any other encoding will cause an error.

The fix should also be applied to 3.2. (Note: Python 3.1 no longer accepts bug fixes.)
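The PEP 263 rule quoted above can be observed with the tokenize module. This sketch assumes a Python 3 release that includes this fix; detect() is a hypothetical helper that returns either the detected encoding name or the SyntaxError. With a BOM, an explicit 'utf-8' cookie (or no cookie at all) is reported as 'utf-8-sig', while any other declared encoding is rejected.

```python
import io
import tokenize

BOM = b"\xef\xbb\xbf"


def detect(source: bytes):
    """Hypothetical helper: the encoding name detect_encoding() reports, or its SyntaxError."""
    try:
        encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
        return encoding
    except SyntaxError as exc:
        return exc


# Allowed: BOM + an explicit 'utf-8' cookie.
ok = detect(BOM + b"# -*- coding: utf-8 -*-\nx = 1\n")
# Allowed: BOM with no cookie at all.
no_cookie = detect(BOM + b"x = 1\n")
# Rejected: BOM + any other encoding in the cookie.
bad = detect(BOM + b"# -*- coding: latin-1 -*-\nx = 1\n")

print(ok, no_cookie, type(bad).__name__)  # utf-8-sig utf-8-sig SyntaxError
```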
msg162428	Author: Florent Xicluna (flox) *	Date: 2012-06-06 23:11
It should raise a SyntaxError if the coding is 'utf8'. I don't agree with the last proposed patch. If import reports a SyntaxError, 'tokenize' should do the same:

    $ ./python Lib/test/bad_coding2.py
      File "Lib/test/bad_coding2.py", line 1
    SyntaxError: encoding problem: utf-8

And this behavior complies strictly with PEP 263.
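The argument here is that tokenize should agree with the interpreter. A small parity check, as a sketch: it compiles each BOM-prefixed snippet with the built-in compile() (which goes through the interpreter's own tokenizer) and runs tokenize.detect_encoding() on the same bytes. The cookie spellings and the helpers compile_ok() and tokenize_ok() are illustrative choices, not part of the original report; spellings that normalize to 'utf-8' (such as 'UTF_8') should be accepted by both, while 'utf8' should be rejected by both.

```python
import io
import tokenize

BOM = b"\xef\xbb\xbf"


def compile_ok(source: bytes) -> bool:
    """Hypothetical helper: True if the interpreter's tokenizer accepts the source."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False


def tokenize_ok(source: bytes) -> bool:
    """Hypothetical helper: True if tokenize.detect_encoding() accepts the source."""
    try:
        tokenize.detect_encoding(io.BytesIO(source).readline)
        return True
    except SyntaxError:
        return False


results = {}
for cookie in ("utf-8", "UTF_8", "utf8"):
    src = BOM + b"# coding: " + cookie.encode("ascii") + b"\nx = 1\n"
    results[cookie] = (compile_ok(src), tokenize_ok(src))
    print(f"{cookie:8} compile={results[cookie][0]} tokenize={results[cookie][1]}")
```

The exact SyntaxError message may differ between the two code paths and across Python versions, so the sketch compares only whether each path accepts or rejects the source.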
msg162429	Author: STINNER Victor (vstinner) *	Date: 2012-06-06 23:13
Oops, I didn't mean to attach my patch to the issue. Mine is wrong, whereas yours is the right fix :-)
msg164811	Author: Roundup Robot (python-dev)	Date: 2012-07-07 10:27
New changeset 5020afc0b7c9 by Florent Xicluna in branch '3.2': Issue #14990: tokenize: correctly fail with SyntaxError on invalid encoding declaration. http://hg.python.org/cpython/rev/5020afc0b7c9
msg164812	Author: Florent Xicluna (flox) *	Date: 2012-07-07 10:29
Thanks. Fixed in trunk too, changeset b4322ad1fec4

History
Date	User	Action	Args
2022-04-11 14:57:31	admin	set	github: 59195
2018-04-23 01:09:05	lukasz.langa	set	pull_requests: + pull_request6272
2012-07-07 10:29:50	flox	set	status: open -> closed resolution: fixed messages: + msg164812 stage: patch review -> resolved
2012-07-07 10:27:14	python-dev	set	nosy: + python-dev messages: + msg164811
2012-06-06 23:13:55	vstinner	set	messages: + msg162429
2012-06-06 23:13:32	vstinner	set	files: - detect_encoding.patch
2012-06-06 23:11:30	flox	set	messages: + msg162428
2012-06-04 23:06:01	vstinner	set	files: + detect_encoding.patch versions: - Python 3.1 nosy: + ezio.melotti, vstinner messages: + msg162303 components: + Unicode
2012-06-03 10:31:05	flox	set	files: + issue14990_detect_encoding.diff keywords: + patch messages: + msg162206 stage: needs patch -> patch review
2012-06-03 10:29:02	flox	create