
Title: import and execfile don't handle utf-16
Components: Unicode
Status: closed
Resolution: not a bug
Nosy List: loewis, terrelshumway
Priority: normal

Created on 2002-03-17 05:46 by terrelshumway, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Messages (2)
msg9743 - Author: Terrel Shumway (terrelshumway) Date: 2002-03-17 05:46
import and execfile don't handle utf-16 encoded files,
but if I read the file with an appropriate codec,
exec works fine on the loaded Unicode string.

Also, changing site.encoding to utf-16 has a 
detrimental effect. (I need to understand this better.)

I understand that the general problem is difficult to
solve, but it seems it would be fairly easy to handle
the specific case of a utf-16 file with a byte order
mark at the beginning: if import/execfile fail
and the file starts with a BOM, re-read the file
with an appropriate codec.
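
The BOM check proposed above could look roughly like the sketch below. It is written for Python 3 so that it runs on a current interpreter, and it checks for the BOM up front rather than after a failed import. The helper names sniff_utf16_bom and load_source are hypothetical and are not part of any Python release.

```python
import codecs


def sniff_utf16_bom(data):
    """Return 'utf-16' if data starts with a UTF-16 BOM, else None.

    Hypothetical helper for illustration only. Note that a UTF-32-LE
    BOM begins with the same two bytes; this sketch ignores that case.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic 'utf-16' codec reads the BOM, picks the byte
        # order, and strips the mark from the decoded text.
        return "utf-16"
    return None


def load_source(data, default="ascii"):
    """Decode raw source bytes, honouring a UTF-16 BOM if present."""
    return data.decode(sniff_utf16_bom(data) or default)


# Encoding with 'utf-16' prepends a BOM in the platform's byte order.
raw = 'result = "this is a test: OK"'.encode("utf-16")

namespace = {}
exec(load_source(raw), namespace)
print(namespace["result"])
```

Source without a BOM falls through to the default codec, so plain ASCII files are decoded as before.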


Use this code to reproduce the problem
--------------
import sys
print sys.getdefaultencoding()

code = u'print "this is a test: OK"'

import traceback
import codecs

codecs.open("foo.py","w+","utf-16").write(code)

try:
    execfile("foo.py")
except:
    traceback.print_exc()

try:
    import foo
except:
    traceback.print_exc()


uu = codecs.open("foo.py","r","utf-16").read()

exec(uu)
--------------
produces this output
--------------
ascii
Traceback (most recent call last):
  File "C:\opt\unicode-exec.py", line 12, in ?
    execfile("foo.py")
  File "<string>", line 1
       p
     ^
 SyntaxError: invalid syntax
Traceback (most recent call last):
  File "C:\opt\unicode-exec.py", line 17, in ?
    import foo
  File "<string>", line 1
       p
     ^
 SyntaxError: invalid syntax
this is a test: OK

--------------
If I edit site.py to change encoding to "utf-16", I get
--------------
utf-16
Traceback (most recent call last):
  File "C:\opt\unicode-exec.py", line 15, in ?
    execfile("foo.py")
  File "<string>", line 1
       p
     ^
 SyntaxError: invalid syntax
Traceback (most recent call last):
  File "C:\opt\unicode-exec.py", line 20, in ?
    import foo
  File "<string>", line 1
       p
     ^
 SyntaxError: invalid syntax
Traceback (most recent call last):
  File "C:\opt\unicode-exec.py", line 27, in ?
    exec(uu)
TypeError: expected string without null bytes
----
msg9744 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2002-03-17 17:33

This is not a bug. The language reference clearly says, in

http://www.python.org/doc/current/ref/lexical.html

"Python uses the 7-bit ASCII character set for program text
and string literals."

PEP 263 (if accepted) will extend this to other encodings.
However, UTF-16 is not in the list of encodings supported
under this PEP, as it is not an ASCII superset.
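
The "ASCII superset" property can be checked empirically with a small sketch like the one below (Python 3). It tests a single sample string only, so it illustrates the property and does not prove it. The function name is_ascii_compatible is an assumption made for this example.

```python
SAMPLE = 'print "this is a test: OK"'


def is_ascii_compatible(codec, text=SAMPLE):
    """True if codec encodes ASCII-only text to the same bytes as ASCII.

    Hypothetical helper: it checks one sample, which is enough to show
    a codec is NOT an ASCII superset, but not to prove that it is one.
    """
    return text.encode(codec) == text.encode("ascii")


for codec in ("utf-8", "latin-1", "utf-16"):
    print(codec, is_ascii_compatible(codec))

# UTF-16 stores each ASCII character as two bytes, one of them zero,
# so a byte-oriented ASCII parser sees NUL bytes in the source.
print(b"\x00" in SAMPLE.encode("utf-16"))
```

The zero bytes are consistent with the "expected string without null bytes" error reported in the first message.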

History
Date                 User           Action  Args
2022-04-10 16:05:06  admin          set     github: 36269
2002-03-17 05:46:56  terrelshumway  create