
classification
Title: add an optional "default" argument to tokenize.detect_encoding
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, berker.peksag, eric.araujo, flox, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2010-09-04 11:32 by flox, last changed 2022-04-11 14:57 by admin.

Files
File name | Uploaded | Description
detect_encoding_default.diff | flox, 2010-09-04 11:32 | Patch, apply to 3.x
Messages (3)
msg115567 - (view) Author: Florent Xicluna (flox) * (Python committer) Date: 2010-09-04 11:32
The function tokenize.detect_encoding() detects the encoding from either the coding cookie or the BOM.  If no encoding is found, it returns 'utf-8'.

When the result is 'utf-8', there's no (easy) way to know whether the encoding was really detected in the file or whether the function fell back to the default value.

Cases (with utf-8):

 - UTF-8 BOM found, returns ('utf-8-sig', [])
 - cookie on 1st line, returns ('utf-8', [line1])
 - cookie on 2nd line, returns ('utf-8', [line1, line2])
 - no cookie found, returns ('utf-8', [line1, line2])
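
The ambiguity in the cases above can be reproduced with a minimal sketch like the following. The helper name `detect` and the sample sources are made up for illustration; only the returned encoding is compared, since which lines are returned alongside it may depend on the Python version.

```python
import codecs
import io
import tokenize


def detect(source: bytes):
    """Run tokenize.detect_encoding() on an in-memory byte string."""
    return tokenize.detect_encoding(io.BytesIO(source).readline)


# A file that declares utf-8 explicitly with a coding cookie.
with_cookie = detect(b"# -*- coding: utf-8 -*-\nx = 1\n")
# A file with no cookie and no BOM: the function falls back to its default.
without_cookie = detect(b"x = 1\ny = 2\n")
# A file starting with a UTF-8 BOM is reported under a distinct name.
with_bom = detect(codecs.BOM_UTF8 + b"x = 1\n")

# The first two encodings are identical, so the caller cannot tell
# "declared utf-8" apart from "nothing declared".
print(with_cookie[0], without_cookie[0], with_bom[0])
```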


The proposal is to allow calling the function with a different default value (None or ''), so that the caller can tell whether the encoding was really detected.
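
As a rough illustration of the intended behaviour (not the attached patch), the idea can be approximated today with a wrapper around the existing function. The name `detect_encoding_with_default` and the simplified cookie regex are assumptions made for this sketch; the regex used internally by tokenize is private and may differ.

```python
import io
import re
import tokenize

# Simplified approximation of the PEP 263 coding cookie (an assumption for
# this sketch; it is not the private pattern used inside tokenize).
_COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def detect_encoding_with_default(readline, default=None):
    """Hypothetical wrapper: return `default` when nothing was detected."""
    encoding, lines = tokenize.detect_encoding(readline)
    if encoding != "utf-8":
        # 'utf-8-sig' (BOM) or a non-UTF-8 cookie: the value came from the file.
        return encoding, lines
    if any(_COOKIE_RE.match(line) for line in lines):
        # An explicit utf-8 cookie is present in the consumed lines.
        return encoding, lines
    # No BOM and no cookie: report the caller's default instead of 'utf-8'.
    return default, lines


declared, _ = detect_encoding_with_default(
    io.BytesIO(b"#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n").readline)
undeclared, _ = detect_encoding_with_default(io.BytesIO(b"x = 1\n").readline)
print(declared, undeclared)
```

With default=None the caller can test `encoding is None` to find files that declare no encoding, which is the kind of check a script such as findnocoding.py needs.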

For example, this function could be used by the Tools/scripts/findnocoding.py script.

Patch attached.
msg122106 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-11-22 10:52
> no cookie found, returns ('utf-8', [line1, line2])

I never understood the usage of the second item. IMO it should be None if no cookie found.
msg173002 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-15 20:55
> I never understood the usage of the second item. IMO it should be None if no cookie found.

UTF-8 is the default source encoding for Python 3.
History
Date | User | Action | Args
2022-04-11 14:57:06 | admin | set | github: 53980
2014-11-02 12:11:24 | berker.peksag | set | nosy: + berker.peksag; versions: + Python 3.5, - Python 3.4
2012-10-15 20:55:02 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg173002
2012-07-21 13:19:57 | flox | set | versions: + Python 3.4, - Python 3.3
2010-12-31 01:43:04 | eric.araujo | set | nosy: + eric.araujo; versions: + Python 3.3, - Python 3.2
2010-12-30 22:14:16 | georg.brandl | unlink | issue7962 dependencies
2010-11-22 10:52:50 | vstinner | set | messages: + msg122106
2010-11-22 05:14:16 | eric.araujo | set | nosy: + vstinner
2010-09-04 18:54:27 | pitrou | set | nosy: + benjamin.peterson
2010-09-04 13:23:06 | flox | link | issue7962 dependencies
2010-09-04 11:32:08 | flox | create |