Title: tokenize.detect_encoding() and Mac newline
Type: Stage:
Components: Library (Lib) Versions: Python 3.0
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, brett.cannon, vstinner
Priority: normal Keywords: patch

Created on 2008-11-21 12:32 by vstinner, last changed 2022-04-11 14:56 by admin. This issue is now closed.

File name Uploaded Description
detect_encoding_mac_newlines.patch vstinner, 2010-03-04 01:07
Messages (4)
msg76176 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-11-21 12:32
I'm trying to fix IDLE to support Unicode (#4008 and #4323). Instead 
of IDLE's built-in charset detection, I tried to use 
tokenize.detect_encoding(), but this function doesn't work with scripts 
that use Mac newlines (b"\r").

Code to detect the encoding of a Python script:
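from tokenize import detect_encoding

def pythonEncoding(filename):
   # Binary mode: detect_encoding() expects a readline that returns bytes.
   with open(filename, 'rb') as fp:
      encoding, lines = detect_encoding(fp.readline)
   return encoding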

Example to reproduce the problem with Mac script:
from io import BytesIO
from tokenize import detect_encoding
fp = BytesIO(b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re 
encoding, lines = detect_encoding(fp.readline)
print(encoding, lines)

=> Result: utf-8 [b'# coding: ISO-8859-1\rprint("Bonjour ma ch\xe8re 
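
The result above is a single "line" because a binary readline only stops at b"\n". A minimal sketch of that behaviour, using made-up ASCII sample data rather than the script from this report:

```python
from io import BytesIO

# Made-up sample using Mac newlines (b"\r"); not the script from this report.
data = b'# coding: ISO-8859-1\rprint("hello")\r'

fp = BytesIO(data)
first = fp.readline()

# readline() finds no b"\n", so it returns the whole buffer as one line.
print(first == data)

# bytes.splitlines(True) does split at b"\r" and keeps the line endings.
print(data.splitlines(True))
```

So the coding cookie and the non-ASCII code after it reach detect_encoding() in the same chunk.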

The problem occurs at "line_string = line.decode('ascii')". 
Since "line" contains a non-ASCII character (b"\xe8"), the conversion 
msg84126 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-03-24 23:44
See also related issue: #4628 (No universal newline support for 
compile() when using bytes).
msg100364 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-03-04 01:07
I finally wrote a patch using a small generator based on the .splitlines(1) method (that is, with line endings kept). The patch includes a test.
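For readers without the attachment, here is a rough sketch of what a generator-based readline wrapper along those lines might look like. It is not the code from detect_encoding_mac_newlines.patch; the helper name universal_readline and the sample script are assumptions made for illustration.

```python
from io import BytesIO
from tokenize import detect_encoding

def universal_readline(fp):
    """Return a readline-style callable that also ends lines at a bare CR.

    Hypothetical helper, not the code from the attached patch.
    """
    def split_lines():
        # fp.readline() only stops at b"\n"; splitlines(True) re-splits each
        # chunk at b"\r", b"\n" and b"\r\n" while keeping the line endings.
        for chunk in iter(fp.readline, b""):
            yield from chunk.splitlines(True)

    lines = split_lines()
    # detect_encoding() treats an empty bytes object as end of input.
    return lambda: next(lines, b"")

# Illustrative script using Mac newlines (made-up content).
source = b'# coding: ISO-8859-1\rprint("ch\xe8re")\r'
encoding, consumed = detect_encoding(universal_readline(BytesIO(source)))
print(encoding, consumed)
```

With the wrapper, the first line handed to detect_encoding() is only the coding cookie, so the non-ASCII byte on the second line should no longer get in the way of the lookup.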
msg112017 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-07-29 22:18
Well, it looks like nobody cares (including me), so I'm closing this issue.
Date User Action Args
2022-04-11 14:56:41 admin set github: 48627
2010-07-29 22:18:49 vstinner set status: open -> closed; resolution: wont fix; messages: + msg112017
2010-03-04 01:08:07 vstinner set nosy: + brett.cannon, benjamin.peterson
2010-03-04 01:07:29 vstinner set files: + detect_encoding_mac_newlines.patch; keywords: + patch; messages: + msg100364
2009-03-24 23:44:06 vstinner set messages: + msg84126
2008-11-21 12:32:17 vstinner create