
classification
Title: tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt has a UTF8 BOM signature
Type: behavior
Stage: resolved
Components:
Versions: Python 3.2, Python 3.3
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: ned.deily
Nosy List: ned.deily, nneonneo, python-dev, trent
Priority: normal
Keywords: patch

Created on 2011-07-19 21:18 by nneonneo, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
issue12587.patch nneonneo, 2011-07-19 22:41 Patch to fix the issue
Messages (6)
msg140694 - Author: Robert Xiao (nneonneo) * Date: 2011-07-19 21:18
From a fresh Python3.2.1 tarball:

nneonneo@nneonneo-mbp:~/devel/Python-3.Lib/test$ for i in tokenize_tests-*; do echo $i; xxd $i | head -n 1; done
tokenize_tests-latin1-coding-cookie-and-utf8-bom-sig.txt
0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:
tokenize_tests-no-coding-cookie-and-utf8-bom-sig-only.txt
0000000: efbb bf23 2049 4d50 4f52 5441 4e54 3a20  ...# IMPORTANT: 
tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt
0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:
tokenize_tests-utf8-coding-cookie-and-utf8-bom-sig.txt
0000000: efbb bf23 202d 2a2d 2063 6f64 696e 673a  ...# -*- coding:

From this, it appears that the file called "tokenize_tests-utf8-coding-cookie-and-no-utf8-bom-sig.txt" actually has a UTF-8 BOM signature, which means either the comment is lying or the BOM was accidentally added to the test file at some point.
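
The same check can be done without xxd. The following is a minimal sketch, not part of the original report; the helper names has_utf8_bom and report_boms are hypothetical, and it assumes it is pointed at the directory that holds the tokenize_tests-* files.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as fh:
        # Read exactly as many bytes as the BOM is long and compare.
        return fh.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def report_boms(directory, pattern="tokenize_tests-*"):
    """Map each matching file name to whether it starts with a UTF-8 BOM."""
    return {p.name: has_utf8_bom(p) for p in sorted(Path(directory).glob(pattern))}
```

Calling report_boms on the test directory returns a dict keyed by file name, so a file whose name says "no-utf8-bom" but maps to True stands out immediately.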
msg140699 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-07-19 22:04
It looks like a BOM has been present in that file for a *long* time: it is there in the Python 3.0 source tarball, and, according to the converted svn-to-hg history, it was there in its original check-in and is still there in the current development tip.
msg140702 - Author: Robert Xiao (nneonneo) * Date: 2011-07-19 22:34
Yes, it seems that way. Then the question is: why does the comment claim that it doesn't have a BOM?

Also, test_tokenize.py is wrong around line 651:

    def test_utf8_coding_cookie_and_no_utf8_bom(self):
        f = 'tokenize_tests-utf8-coding-cookie-and-utf8-bom-sig.txt'
        self.assertTrue(self._testFile(f))

Judging by the test case name, it reads the wrong file. (This makes it a duplicate of the test_utf8_coding_cookie_and_utf8_bom case.)
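
To illustrate why the BOM and no-BOM cases are worth testing separately, here is a small sketch, not taken from the patch. It assumes a current Python 3 where tokenize.detect_encoding reports "utf-8-sig" when a BOM is present; the helper name sniff and the sample bytes are hypothetical.

```python
import codecs
import io
import tokenize


def sniff(data):
    """Return the encoding name tokenize.detect_encoding reports for `data`."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding


# The same coding cookie, with and without a leading UTF-8 BOM.
COOKIE = b"# -*- coding: utf-8 -*-\n"
with_bom = codecs.BOM_UTF8 + COOKIE
without_bom = COOKIE

print(sniff(without_bom))  # cookie only
print(sniff(with_bom))     # BOM plus matching cookie
```

The two inputs take different paths through encoding detection, so a "no BOM" test that actually reads a file with a BOM leaves the cookie-only path unexercised.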
msg140704 - Author: Robert Xiao (nneonneo) * Date: 2011-07-19 22:41
Attached is a patch which fixes this. Python 3.2.1 still passes the test after applying the patch, as expected.
msg140707 - Author: Roundup Robot (python-dev) (Python triager) Date: 2011-07-19 23:19
New changeset 0c254698e0ed by Ned Deily in branch '3.2':
Issue #12587: Correct faulty test file and reference in test_tokenize.
http://hg.python.org/cpython/rev/0c254698e0ed

New changeset c1d2b6b337c5 by Ned Deily in branch 'default':
Issue #12587: Correct faulty test file and reference in test_tokenize.
http://hg.python.org/cpython/rev/c1d2b6b337c5
msg140709 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-07-19 23:21
Thanks for the report and the patch!  Applied to 3.2 (for 3.2.2) and default (for 3.3).
History
Date User Action Args
2022-04-11 14:57:19 admin set github: 56796
2011-07-19 23:21:48 ned.deily set status: open -> closed
messages: + msg140709
assignee: ned.deily
resolution: fixed
stage: needs patch -> resolved
2011-07-19 23:19:07 python-dev set nosy: + python-dev
messages: + msg140707
2011-07-19 22:41:18 nneonneo set files: + issue12587.patch
keywords: + patch
messages: + msg140704
2011-07-19 22:34:01 nneonneo set messages: + msg140702
2011-07-19 22:04:35 ned.deily set versions: + Python 3.3
nosy: + trent, ned.deily
messages: + msg140699
stage: needs patch
2011-07-19 21:18:59 nneonneo create