Message124646
On Friday 24 December 2010 at 14:46 +0000, Baptiste Carvello wrote:
> the patch solves the bug for me as well (using locale "C", the
> filesystem encoding is utf-8). However, I do not understand why the
> patch checks that the shebang line decodes with both utf-8 and the
> file's encoding. The shebang line is only used by the kernel to locate
> the interpreter, so none of these should matter. Or have I misunderstood
> the patch?
The shebang is read by 3 different functions:
a) the shell reads the first line: if it starts with "#!", it's a
shebang: the shell reads the command and its options and executes it
b) Python searches for a "#coding:xxx" cookie in the first or the second
line using a binary parser
c) Python reads the file using the file encoding: the encoding written
in the #coding:xxx header, or UTF-8 by default
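Step b) can be sketched with the standard library's
tokenize.detect_encoding(), used here only as a stand-in for the
interpreter's own binary parser. The script content is made up for the
illustration:

```python
import io
import tokenize

# Hypothetical script: shebang on line 1, encoding cookie on line 2.
script = b"#!/usr/bin/python3\n# coding: latin-1\nprint('ok')\n"

# detect_encoding() reads at most two lines, as bytes, looking for a cookie.
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(script).readline)

print(encoding)     # normalized name of the encoding found in the cookie
print(first_lines)  # the raw byte lines consumed while searching
```

The shebang line is consumed as bytes before the cookie on the second
line is found, so it is examined before the file encoding is known.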
(a) The shell reads the file as a binary file; it doesn't care about
the encoding. It reads byte strings and passes them to the kernel.
(b) The parser starts with the default encoding, UTF-8. Even if the file
encoding is not UTF-8, all lines before the #coding:xxx cookie are read
as UTF-8 (Python only checks for the cookie in the first or the second
line). The shebang has to be written on the first line, so the cookie
cannot be written before the shebang => the shebang has to be decodable
from UTF-8
(c) If the file encoding is not UTF-8, a #coding:xxx cookie is used and
the whole file (including the shebang) has to be decodable from this
encoding => the shebang has to be decodable from the file encoding
So the shebang has to be decodable from both UTF-8 and the file
encoding.
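A minimal sketch of this double check, assuming a hypothetical helper
named shebang_is_decodable() (this is not the code of the patch, and
the paths are invented for the example):

```python
def shebang_is_decodable(shebang: bytes, file_encoding: str = "utf-8") -> bool:
    """Return True if the shebang decodes from UTF-8 and from file_encoding."""
    for encoding in ("utf-8", file_encoding):
        try:
            shebang.decode(encoding)
        except UnicodeDecodeError:
            return False
    return True


# 0xff is never valid in UTF-8, although Latin-1 accepts any byte.
bad = b"#!/opt/py3k\xff/bin/python3\n"
print(shebang_is_decodable(bad, "latin-1"))   # False: the UTF-8 check fails

# U+00E9 encoded as UTF-8: decodable from UTF-8 and from Latin-1.
good = "#!/opt/py3k-\u00e9/bin/python3\n".encode("utf-8")
print(shebang_is_decodable(good, "latin-1"))  # True
print(shebang_is_decodable(good, "ascii"))    # False: the file encoding check fails
```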
I should maybe add a comment about that in the patch.
Example of issue (b):
---
$ ./build/scripts-3.2/2to3
File "./build/scripts-3.2/2to3", line 1
SyntaxError: Non-UTF-8 code starting with '\xff' in file ./build/scripts-3.2/2to3 on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
---
The shebang is b'#!/home/haypo/tmp/py3k\xff/bin/python3.2\n', my locale
encoding is UTF-8 and the file has no encoding cookie (so its encoding
is UTF-8).
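The failure can be reproduced on the shebang bytes alone. This is only
a sketch of the decoding step, not of what the interpreter does
internally:

```python
shebang = b"#!/home/haypo/tmp/py3k\xff/bin/python3.2\n"

try:
    shebang.decode("utf-8")
    bad_bytes = b""
except UnicodeDecodeError as exc:
    # exc.start and exc.end delimit the undecodable bytes.
    bad_bytes = shebang[exc.start:exc.end]

print(bad_bytes)  # b'\xff'
```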
--
copy_script.patch fixes an issue that occurs when the configure prefix
is not ASCII (especially if the prefix is not decodable from UTF-8).
Date | User | Action | Args
2010-12-25 21:46:49 | vstinner | set | recipients: + vstinner, tarek, eric.araujo, zegreek, nils
2010-12-25 21:46:46 | vstinner | link | issue6011 messages
2010-12-25 21:46:46 | vstinner | create |