Title: API to get source encoding as defined by PEP 263
Type: enhancement Stage:
Components: Interpreter Core, Library (Lib) Versions: Python 3.1, Python 3.2, Python 2.7
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: benjamin.peterson, lemburg, loewis, srid
Priority: normal Keywords:

Created on 2009-06-08 19:01 by srid, last changed 2009-06-08 19:05 by benjamin.peterson. This issue is now closed.

Messages (2)
msg89097 - Author: Sridhar Ratnakumar (srid) Date: 2009-06-08 19:01
It would be nice to have an API that returns the source encoding of a
given Python file. Considering that 'print' uses sys.stdout.encoding,
which is always set to None when the Python process is run by subprocess,
knowing the source encoding is absolutely necessary for decoding the
output generated by that script.

e.g.: Run 'python --author' in the python-wifi-0.3.1 source
package as a subprocess.Popen(...) call and print the resulting
string; you'll get an encoding error unless you decode it as 'latin1',
where latin1 is the encoding given on the script's coding: line.
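A minimal sketch of the decoding step described above, assuming Python 3. The helper name run_and_decode and the one-line child program are hypothetical stand-ins, not the python-wifi script itself. The child writes Latin-1 bytes to its stdout and the parent decodes them with an explicitly supplied encoding:

```python
import subprocess
import sys


def run_and_decode(args, encoding):
    """Run a child process and decode its raw stdout with an explicit encoding."""
    # check_output returns bytes; the parent decides how to decode them.
    raw = subprocess.check_output(args)
    return raw.decode(encoding)


# Hypothetical child: writes the Latin-1 bytes for "cafe" with an accented e.
child_code = "import sys; sys.stdout.buffer.write(b'caf\\xe9')"
text = run_and_decode([sys.executable, "-c", child_code], "latin-1")
```

Decoding the same bytes as UTF-8 would raise UnicodeDecodeError, which is why the caller needs to know the encoding the script actually used.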

The following function tries to detect the coding, but this guesswork
would not be necessary if it were integrated into the standard library
with an implementation that maps directly to that of PEP 263.

+import re
+
+def get_python_source_encoding(python_file):
+    """Detect the encoding used in the file ``python_file``
+    Detection is done as per PEP 263.
+    """
+    # PEP 263 only allows the declaration on the first or second line
+    with open(python_file) as f:
+        first_two_lines = f.readlines()[:2]
+    coding_line_regexp = r".*coding[:=]\s*([-\w.]+).*"
+    for line in first_two_lines:
+        m = re.match(coding_line_regexp, line)
+        if m:
+            # the captured group is the declared encoding name
+            return m.group(1)
+    # if no encoding is defined, use the default encoding
+    return 'ascii'

subprocess encoding mess:
msg89099 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2009-06-08 19:05
Already done. tokenize.detect_encoding()
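
For illustration, a short sketch of how tokenize.detect_encoding() can be called, assuming Python 3. The wrapper name source_encoding and the sample bytes are hypothetical. The function takes a readline callable that yields bytes and returns the encoding name together with the lines it consumed:

```python
import io
import tokenize


def source_encoding(path):
    """Return the PEP 263 source encoding of the file at ``path``."""
    # detect_encoding needs a readline that returns bytes, so open in binary mode.
    with open(path, "rb") as f:
        encoding, _consumed_lines = tokenize.detect_encoding(f.readline)
    return encoding


# Hypothetical in-memory source with a coding declaration on the first line.
sample = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"
encoding, _ = tokenize.detect_encoding(io.BytesIO(sample).readline)
# The result is a normalized name, so "latin-1" is reported as "iso-8859-1".
```

When neither a coding declaration nor a BOM is present, Python 3 reports "utf-8" as the default rather than the 'ascii' fallback used in the function above.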

History
Date                 User               Action  Args
2009-06-08 19:05:02  benjamin.peterson  set     status: open -> closed; nosy: + benjamin.peterson; messages: + msg89099; resolution: not a bug
2009-06-08 19:01:43  srid               create