classification
Title: tokenize.detect_encoding(): raise SyntaxError on codecs.lookup() error
Type:
Stage:
Components:
Versions: Python 3.0
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, benjamin.peterson, vstinner
Priority: normal
Keywords: needs review, patch

Created on 2008-10-02 23:00 by vstinner, last changed 2008-12-12 01:25 by benjamin.peterson. This issue is now closed.

Files
File name Uploaded Description
tokenize_detect_encoding.patch vstinner, 2008-10-02 23:00
tokenize_detect_encoding-2.patch vstinner, 2008-12-11 23:20
Messages (4)
msg74205 - Author: STINNER Victor (vstinner) (Python committer) Date: 2008-10-02 23:00
tokenize.detect_encoding() raises a LookupError if the charset is
unknown, whereas Python itself raises a SyntaxError. This patch makes
the tokenize module mimic Python's behaviour.

Extra: reuse BOM_UTF8 from the codecs module.
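
A minimal sketch of the change described above, not the patch itself. The helper names `lookup_source_encoding` and `strip_utf8_bom` are hypothetical, and the error message text is an assumption; only `codecs.lookup()` and `codecs.BOM_UTF8` come from the standard library.

```python
import codecs


def lookup_source_encoding(name):
    """Return the codec info for a declared source encoding.

    Hypothetical helper: converts the LookupError raised by
    codecs.lookup() into a SyntaxError, as the issue proposes.
    """
    try:
        return codecs.lookup(name)
    except LookupError:
        # Assumed message text; the real patch may word it differently.
        raise SyntaxError("unknown encoding: " + name)


def strip_utf8_bom(first_line):
    """Return (line_without_bom, bom_found) for a bytes line.

    Reuses codecs.BOM_UTF8 instead of a locally defined constant.
    """
    if first_line.startswith(codecs.BOM_UTF8):
        return first_line[len(codecs.BOM_UTF8):], True
    return first_line, False
```

With these helpers, `lookup_source_encoding("bad")` raises SyntaxError rather than LookupError, while a known name such as `"utf-8"` still returns its codec info.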
msg74217 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) (Python committer) Date: 2008-10-02 23:49
This patch looks good to me; it fixes the following test:

Index: Lib/test/test_tokenize.py
===================================================================
--- Lib/test/test_tokenize.py	(revision 66701)
+++ Lib/test/test_tokenize.py	(working copy)
@@ -795,6 +795,8 @@
         self.assertEquals(encoding, 'utf-8')
         self.assertEquals(consumed_lines, [])
 
+        readline = self.get_readline((b'# coding: bad\n',))
+        self.assertRaises(SyntaxError, detect_encoding, readline)
 
 class TestTokenize(TestCase):
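
The same check can be reproduced outside the test suite. This is a sketch that assumes a Python version containing the fix; `readline_for` is a hypothetical stand-in for the test suite's `get_readline` helper.

```python
import io
from tokenize import detect_encoding


def readline_for(source_bytes):
    """Return a readline callable over the given bytes (hypothetical helper)."""
    return io.BytesIO(source_bytes).readline


def raises_syntax_error(source_bytes):
    """Return True if detect_encoding() rejects the source with SyntaxError."""
    try:
        detect_encoding(readline_for(source_bytes))
    except SyntaxError:
        return True
    return False


# An unknown coding cookie should be reported as a SyntaxError.
unknown_rejected = raises_syntax_error(b"# coding: bad\n")

# A source without a cookie or BOM falls back to the default encoding.
default_encoding, consumed_lines = detect_encoding(readline_for(b"x = 1\n"))
```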
msg77642 - Author: STINNER Victor (vstinner) (Python committer) Date: 2008-12-11 23:20
New version of the patch:
 - remove utf8_bom (was already replaced by codecs.BOM_UTF8)
 - include the regression test from amaury.forgeotdarc

Can anyone review the new patch (which is very similar to the first 
one) and commit it?
msg77652 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2008-12-12 01:25
Fixed in r67711.
History
Date User Action Args
2008-12-12 01:25:48 benjamin.peterson set status: open -> closed
keywords: patch, patch, needs review
resolution: fixed
messages: + msg77652
nosy: + benjamin.peterson
2008-12-11 23:21:00 vstinner set keywords: patch, patch, needs review
files: + tokenize_detect_encoding-2.patch
messages: + msg77642
2008-10-02 23:49:03 amaury.forgeotdarc set keywords: + needs review
nosy: + amaury.forgeotdarc
messages: + msg74217
2008-10-02 23:00:32 vstinner create