
Author vstinner
Recipients eric.araujo, nils, tarek, vstinner, zegreek
Date 2010-12-25.21:46:46
Message-id <1293313640.21272.173.camel@marge>
In-reply-to <4D14B22F.8050301@free.fr>
Content
On Friday, 24 December 2010 at 14:46 +0000, Baptiste Carvello wrote:
> the patch solves the bug for me as well (using locale "C", the
> filesystem encoding is utf-8). However, I do not understand why the
> patch checks that the shebang line decodes with both utf-8 and the
> file's encoding. The shebang line is only used by the kernel to locate
> the interpreter, so none of these should matter. Or have I misunderstood
> the patch?

The shebang is read by 3 different functions:

 a) the shell reads the first line: if it starts with "#!", it's a
shebang: the shell reads the command and its options and executes it
 b) Python searches for a "#coding:xxx" cookie in the first or the second
line using a binary parser
 c) Python reads the file using the Python encoding: the encoding written
in the #coding:xxx header, or UTF-8 by default

(a) The shell reads the file as a binary file; it doesn't care about the
encoding. It reads byte strings and passes them to the kernel.

(b) The parser starts with the default encoding, UTF-8. Even if the file
encoding is not UTF-8, all lines before the #coding:xxx cookie are read
as UTF-8 (Python only checks for the cookie in the first or the second
line). The shebang has to be written on the first line, so the cookie
cannot be written before the shebang => the shebang has to be decodable
from UTF-8
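
The behaviour described in (b) can be observed with
tokenize.detect_encoding() from the standard library, which is documented
as detecting the encoding the same way the tokenizer does. This is only a
sketch: the helper name detect_source_encoding and the sample sources are
made up for illustration, and the interpreter's own C parser is a separate
implementation.

```python
import io
import tokenize


def detect_source_encoding(source: bytes) -> str:
    """Return the encoding that tokenize detects for a source file's bytes."""
    # detect_encoding() pulls at most two lines through the readline callable.
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Cookie on the second line, after a pure-ASCII shebang: the cookie is found.
ok = b"#!/usr/bin/python3\n# coding: latin-1\nprint('hi')\n"
print(detect_source_encoding(ok))

# Same cookie, but the shebang contains a byte that is not valid UTF-8.
# The first line is decoded as UTF-8 before the cookie is ever reached.
bad = b"#!/home/haypo/tmp/py3k\xff/bin/python3.2\n# coding: latin-1\n"
try:
    detect_source_encoding(bad)
except SyntaxError as exc:
    print("rejected:", exc)
```

The second case is rejected even though a cookie is present, because the
shebang line comes first and is decoded as UTF-8.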

(c) If the file encoding is not UTF-8, a #coding:xxx cookie is used and
the whole file (including the shebang) has to be decodable from this
encoding => the shebang has to be decodable from the file encoding

So the shebang has to be decodable from both UTF-8 and the file
encoding.
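
A minimal sketch of that double check, assuming the shebang is available
as bytes and the file encoding is already known. The function name
shebang_is_decodable is hypothetical and this is not the code from
copy_script.patch; it only illustrates the rule stated above.

```python
def shebang_is_decodable(shebang: bytes, file_encoding: str = "utf-8") -> bool:
    """Check that a shebang line decodes from UTF-8 and from the file encoding.

    Hypothetical helper: the name and signature are illustrative only.
    """
    for encoding in ("utf-8", file_encoding):
        try:
            shebang.decode(encoding)
        except UnicodeDecodeError:
            return False
    return True


# A pure-ASCII shebang passes for any ASCII-compatible file encoding.
print(shebang_is_decodable(b"#!/usr/bin/python3\n", "latin-1"))

# A prefix containing the byte 0xff decodes from Latin-1 but not from UTF-8.
print(shebang_is_decodable(b"#!/home/haypo/tmp/py3k\xff/bin/python3.2\n", "latin-1"))
```

Checking UTF-8 first covers the cookie search; checking the file encoding
covers the later decoding of the whole file.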

Maybe I should add a comment about that to the patch.

Example of the (b) issue:
---
$ ./build/scripts-3.2/2to3
  File "./build/scripts-3.2/2to3", line 1
SyntaxError: Non-UTF-8 code starting with '\xff' in
file ./build/scripts-3.2/2to3 on line 1, but no encoding declared; see
http://python.org/dev/peps/pep-0263/ for details
---
The shebang is b'#!/home/haypo/tmp/py3k\xff/bin/python3.2\n', my locale
encoding is UTF-8 and the file has no encoding cookie (it is encoded
in UTF-8).

--

copy_script.patch fixes an issue if the configure prefix is not ASCII
(especially if the prefix is not decodable from UTF-8).
History
Date                 User      Action  Args
2010-12-25 21:46:49  vstinner  set     recipients: + vstinner, tarek, eric.araujo, zegreek, nils
2010-12-25 21:46:46  vstinner  link    issue6011 messages
2010-12-25 21:46:46  vstinner  create