Message 115567 - Python tracker



Author	flox
Recipients	flox
Date	2010-09-04.11:32:06
SpamBayes Score	4.280744e-07
Marked as misclassified	No
Message-id	<1283599930.65.0.476914851505.issue9771@psf.upfronthosting.co.za>
In-reply-to

Content

The function tokenize.detect_encoding() detects the encoding from either the coding cookie or the BOM.  If no encoding is found, it returns 'utf-8'.

When the result is 'utf-8', there is no (easy) way to know whether the encoding was really detected in the file or whether the function fell back to the default value.

Cases (with utf-8):

 - UTF-8 BOM found, returns ('utf-8-sig', [])
 - cookie on 1st line, returns ('utf-8', [line1])
 - cookie on 2nd line, returns ('utf-8', [line1, line2])
 - no cookie found, returns ('utf-8', [line1, line2])
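
The ambiguity can be reproduced with a short script. This is only a sketch: the helper name detect() and the sample sources are made up for illustration, and the list of consumed lines may differ between Python versions, so only the returned encoding is compared.

```python
import io
import tokenize


def detect(source: bytes):
    """Illustrative helper: run detect_encoding() on an in-memory source."""
    return tokenize.detect_encoding(io.BytesIO(source).readline)


# Explicit cookie on the first line.
with_cookie, _ = detect(b"# -*- coding: utf-8 -*-\nprint('hi')\n")

# No cookie and no BOM: the function falls back to its default.
without_cookie, _ = detect(b"print('hi')\n")

# UTF-8 BOM: the only utf-8 case that is reported distinctly.
with_bom, _ = detect(b"\xef\xbb\xbfprint('hi')\n")

# The first two results are the same string, so the caller cannot
# tell a detected 'utf-8' from the fallback.
print(with_cookie, without_cookie, with_bom)
```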


The proposal is to allow calling the function with a different default value (None or ''), so that the caller can tell whether the encoding was really detected.
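
Until such a parameter exists, the behaviour can be approximated from the outside. The wrapper below is a hypothetical sketch, not the attached patch: the name detect_encoding_or(), its default argument, and the COOKIE_RE pattern (modelled on the PEP 263 cookie syntax) are all assumptions made for illustration.

```python
import io
import re
import tokenize

# Assumed cookie pattern, modelled on PEP 263, applied to raw byte lines.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def detect_encoding_or(readline, default=None):
    """Hypothetical wrapper: return `default` when nothing was detected."""
    encoding, lines = tokenize.detect_encoding(readline)
    # Any result other than plain 'utf-8' can only come from a BOM or a cookie.
    if encoding != "utf-8":
        return encoding, lines
    # Plain 'utf-8' is ambiguous: look for a cookie in the consumed lines.
    if any(COOKIE_RE.match(line) for line in lines):
        return encoding, lines
    return default, lines


def check(source: bytes):
    """Illustrative helper: return only the encoding for an in-memory source."""
    return detect_encoding_or(io.BytesIO(source).readline)[0]
```

With this sketch, check(b"x = 1\n") gives None, while a source starting with a utf-8 cookie still gives 'utf-8'. A script such as findnocoding.py could then test for None instead of guessing.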

For example, this function could be used by the Tools/scripts/findnocoding.py script.

Patch attached.

History
Date	User	Action	Args
2010-09-04 11:32:10	flox	set	recipients: + flox
2010-09-04 11:32:10	flox	set	messageid: <1283599930.65.0.476914851505.issue9771@psf.upfronthosting.co.za>
2010-09-04 11:32:08	flox	link	issue9771 messages
2010-09-04 11:32:08	flox	create