
Author srid
Recipients lemburg, loewis, srid
Date 2009-06-08.19:01:42
Message-id <1244487704.55.0.771864398626.issue6240@psf.upfronthosting.co.za>
In-reply-to
Content
It would be nice to be able to get the encoding used by a specific Python
file. 'print' uses sys.stdout.encoding, which is always set to None when
the Python process is run by subprocess, so knowing the source encoding
is necessary in order to decode the output generated by that script.

E.g.: run 'python setup.py --author' in the python-wifi-0.3.1 source
package via a subprocess.Popen(...) call and print the stdout.read()
string; you'll get an encoding error unless you use
stdout.read().decode('latin1'), where latin1 is the encoding declared on
the coding: line in setup.py.
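
The workaround described above can be sketched roughly as follows. This is only an illustration written for Python 3: the helper name run_and_decode is hypothetical, and a small inline child process that writes a latin1 byte stands in for 'python setup.py --author', since the real package may not be at hand.

```python
import subprocess
import sys


def run_and_decode(args, encoding):
    """Run ``args`` as a child process and decode its stdout with ``encoding``.

    ``encoding`` has to be known in advance by the caller; the pipe itself
    carries no encoding information.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    raw, _ = proc.communicate()
    return raw.decode(encoding)


# Stand-in child (an assumption for this sketch): it writes the latin1
# byte 0xE9 straight to stdout, as a latin1-encoded setup.py might.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(b'Ren\\xe9\\n')"]
text = run_and_decode(child, "latin1")
print(repr(text))
```

Decoding the same bytes as UTF-8 or ASCII would raise UnicodeDecodeError, which is the failure the report describes.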

The following function tries to detect the coding, but this guesswork
would not be necessary if the functionality were integrated into the
standard library, with an implementation that maps directly to PEP 263.

+import re
+
+def get_python_source_encoding(python_file):
+    """Detect the encoding used in the file ``python_file``
+
+    Detection is done as per http://www.python.org/dev/peps/pep-0263/
+    """
+    # PEP 263: the declaration must be in a comment on line 1 or 2
+    coding_line_regexp = re.compile(br"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")
+
+    # read raw bytes, since the encoding is not known yet
+    with open(python_file, 'rb') as f:
+        first_two_lines = [f.readline(), f.readline()]
+
+    for line in first_two_lines:
+        m = coding_line_regexp.match(line)
+        if m:
+            return m.group(1).decode('ascii')
+
+    # if no encoding is defined, use the default encoding
+    return 'ascii'
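
For comparison, Python 3 ships tokenize.detect_encoding, which reads the PEP 263 declaration (or a UTF-8 BOM) from a file's first two lines. The sketch below shows how it might be used for this purpose. The wrapper name detect_source_encoding and the temporary demo file are assumptions made for illustration, and note that without a declaration it reports 'utf-8' rather than 'ascii'.

```python
import os
import tempfile
import tokenize


def detect_source_encoding(path):
    """Return the encoding name declared for the Python file at ``path``."""
    # detect_encoding expects a readline callable that yields bytes
    with open(path, "rb") as f:
        encoding, _lines_read = tokenize.detect_encoding(f.readline)
    return encoding


# Demo on a throwaway file carrying a latin1 coding line (hypothetical content)
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(b"# -*- coding: latin1 -*-\nauthor = 'Ren\xe9'\n")
try:
    detected = detect_source_encoding(tmp.name)
finally:
    os.remove(tmp.name)
print(detected)
```

The returned name can be passed directly to bytes.decode() when reading the output of the script as a subprocess.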

ref:
subprocess encoding mess: http://bugs.python.org/issue6135
History
Date                 User  Action  Args
2009-06-08 19:01:45  srid  set     recipients: + srid, lemburg, loewis
2009-06-08 19:01:44  srid  set     messageid: <1244487704.55.0.771864398626.issue6240@psf.upfronthosting.co.za>
2009-06-08 19:01:43  srid  link    issue6240 messages
2009-06-08 19:01:42  srid  create