Classification
Title: SyntaxError executing a script containing non-ASCII characters in its name or path
Type: compile error
Stage: patch review
Components: Interpreter Core, Windows
Versions: Python 3.0
Process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, benjamin.peterson, ggenellina, vstinner
Priority: normal
Keywords: patch

Created on 2008-12-26 00:41 by ggenellina, last changed 2009-01-01 23:07 by amaury.forgeotdarc. This issue is now closed.

Files
File name: unicode_scriptname.patch
Uploaded: amaury.forgeotdarc, 2008-12-31 15:32
Messages (6)
msg78286 - Author: Gabriel Genellina (ggenellina) Date: 2008-12-26 00:41
Attempting to directly execute a script containing non-ASCII 
characters in its name or path raises SyntaxError.

The script contents are mostly irrelevant, except that it must contain an 
encoding declaration (with *any* encoding, real or nonexistent).

Running "python foo.py" works, but invoking the script directly as "foo.py" 
raises `SyntaxError: None`, or sometimes `SyntaxError: encoding 
problem: with BOM` (no BOM is present in the source file, which is a 
plain ASCII text file).

C:\TEMP>cd áéíóú

C:\TEMP\áéíóú>type test.py
# -*- coding: ascii -*-

C:\TEMP\áéíóú>C:\Apps\Python30\python.exe test.py

C:\TEMP\áéíóú>test.py
SyntaxError: None

To avoid any doubt, the file has no strange characters:

C:\TEMP\áéíóú>python -c "print(repr(open('test.py','rb').read()))"
'# -*- coding: ascii -*-\r\n'

and .py files are associated with the same interpreter:

C:\TEMP\áéíóú>assoc .py
.py=Python.File

C:\TEMP\áéíóú>ftype Python.File
Python.File="C:\Apps\Python30\python.exe" "%1" %*

The same thing happens if the file name contains any non-ASCII 
character (the path may be pure ASCII).
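
For reference, the scenario above can be approximated with a short script. This is a minimal sketch, not part of the original report: the helper name `run_script_in_non_ascii_dir` is hypothetical, and it assumes that launching through the file association amounts to passing the script's full path (the `"%1"` in the `ftype` output) to the interpreter.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script_in_non_ascii_dir(dirname: str = "áéíóú") -> subprocess.CompletedProcess:
    """Create <tmp>/<dirname>/test.py and run it by its full path.

    Hypothetical helper: passing the full path is an assumption about
    what the Windows file association ("%1") hands to python.exe.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script_dir = Path(tmp) / dirname
        script_dir.mkdir()
        script = script_dir / "test.py"
        # Plain ASCII content with an encoding declaration, as in the report.
        script.write_bytes(b"# -*- coding: ascii -*-\nprint('ok')\n")
        # The non-ASCII characters are in the path, not in the file contents.
        return subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
        )


result = run_script_in_non_ascii_dir()
print(result.returncode, result.stdout.strip(), result.stderr.strip())
```

On an interpreter that includes the fix, the script should run normally and exit with status 0. On the affected Windows build described above, the outcome would presumably be a nonzero exit status with `SyntaxError: None` on stderr.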
msg78614 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-12-31 15:32
This also happens if there is any kind of syntax error in the file: 
"SyntaxError: None" is printed without any other hint.

The (char*) filename passed to PyRun_AnyFile should be UTF-8 encoded;
otherwise the file cannot be re-opened.

The attached patch fixes both issues; please review.
It removes one occurrence of wcstombs in favor of the PyUnicode machinery.
msg78617 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2008-12-31 16:03
I'm unable to reproduce the problem on Linux. I wrote a 
script /home/haypo/ééé/ééé.py:
---------------
#!/home/haypo/prog/SVN/py3k/python
# -*- coding: ascii -*-
print("a")
---------------

The script runs fine:
$ ./ééé.py
a
$ /home/haypo/prog/SVN/py3k/python ééé.py
a

Is the problem specific to Windows?
msg78621 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-12-31 16:21
Yes. As usual, the problem occurs when the platform encoding (used by
wcstombs) is not UTF-8.
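
The mismatch can be illustrated in a few lines of Python. This is only a sketch of the idea, not the patch itself: `cp1252` is assumed here as a stand-in for a Windows ANSI code page (roughly what a locale-based conversion such as wcstombs would produce), and `decodes_as_utf8` is a hypothetical helper.

```python
# Illustration only: cp1252 is an assumed stand-in for the platform encoding.
name = "áéíóú"

ansi_bytes = name.encode("cp1252")  # locale-style bytes, one byte per character
utf8_bytes = name.encode("utf-8")   # what a consumer expecting UTF-8 needs


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(decodes_as_utf8(utf8_bytes))
print(decodes_as_utf8(ansi_bytes))
```

The locale-encoded bytes are not valid UTF-8, so code that later treats the `char*` filename as UTF-8 cannot recover the original name. On a platform whose encoding is already UTF-8, the two byte strings are identical, which is consistent with the Linux result above.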
msg78650 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2008-12-31 20:11
Looks good.
msg78737 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-01-01 23:07
Fixed in r68143 (py3k) and r68144 (3.0).
Thanks for the report!
History
Date User Action Args
2009-01-01 23:07:48 amaury.forgeotdarc set status: open -> closed; resolution: fixed; messages: + msg78737
2008-12-31 20:11:49 benjamin.peterson set keywords: - needs review; nosy: + benjamin.peterson; messages: + msg78650
2008-12-31 16:21:45 amaury.forgeotdarc set messages: + msg78621
2008-12-31 16:03:39 vstinner set nosy: + vstinner; messages: + msg78617
2008-12-31 15:32:21 amaury.forgeotdarc set files: + unicode_scriptname.patch; keywords: + needs review, patch; messages: + msg78614; nosy: + amaury.forgeotdarc; stage: patch review
2008-12-26 00:41:46 ggenellina create