Message118604
It looks like the parser API (e.g. PyParser_ParseFileFlagsEx, PyParser_ASTFromFile) expects a utf-8 filename: err_input() decodes the filename from utf-8.
Example in a non-ascii directory (/home/SHARE/SVN/py3kéŁ) and an ascii locale:
----
$ LANG= ./python -c "import inspect"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/SHARE/SVN/py3k\xe9\u0141/Lib/inspect.py", line 1
SyntaxError: encoding problem: with BOM
----
The problem occurs in fp_setreadl(), which reopens the file with the right encoding. To open the file, the bytes filename is decoded from utf-8 in strict mode, but the filename in my example contains surrogates, and strict utf-8 rejects surrogates.
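To illustrate the failure mode at the Python level, here is a minimal sketch (not the parser's C code). It assumes the directory name from the example is stored on disk as utf-8 bytes and is read under an ascii locale with surrogateescape; the variable names are only for illustration.

```python
# Bytes of the directory name as stored on disk (assumed utf-8 here).
raw = "py3k\u00e9\u0141".encode("utf-8")

# Under an ascii locale, each undecodable byte becomes a lone surrogate
# in the range U+DC80..U+DCFF.
name = raw.decode("ascii", "surrogateescape")
has_surrogates = any(0xDC80 <= ord(c) <= 0xDCFF for c in name)

# Strict utf-8 refuses to encode a string that contains lone surrogates...
try:
    name.encode("utf-8")
    strict_encode_ok = True
except UnicodeEncodeError:
    strict_encode_ok = False

# ...and refuses to decode the byte sequence of an encoded surrogate (U+DCC3).
try:
    b"\xed\xb3\x83".decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

print(has_surrogates, strict_encode_ok, strict_decode_ok)
```

Either direction fails in strict mode, so a filename carrying surrogates cannot pass through a strict utf-8 step unchanged.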
To support undecodable filenames in the parser API, we have two solutions:
* Use the filesystem encoding with surrogateescape (PyUnicode_EncodeFSDefault, PyUnicode_DecodeFSDefault)
* Use utf-8 in another mode: surrogateescape or surrogatepass
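The two options can be compared from Python. This is a rough sketch, assuming os.fsdecode()/os.fsencode() behave like the Python-level counterparts of PyUnicode_DecodeFSDefault/PyUnicode_EncodeFSDefault; the sample bytes are hypothetical.

```python
import os

# Hypothetical filename bytes (valid utf-8, but not ascii).
raw = "py3k\u00e9\u0141".encode("utf-8")

# Option 1: filesystem encoding with its default error handler
# (surrogateescape on POSIX). The bytes round-trip unchanged.
fs_name = os.fsdecode(raw)
fs_roundtrip = os.fsencode(fs_name) == raw

# A str holding lone surrogates, as produced under an ascii locale.
name = raw.decode("ascii", "surrogateescape")

# Option 2a: utf-8 + surrogateescape maps each surrogate back to its
# original byte, so the result equals the on-disk bytes.
escape_bytes = name.encode("utf-8", "surrogateescape")

# Option 2b: utf-8 + surrogatepass encodes each surrogate as a 3-byte
# sequence. The str round-trips, but the bytes differ from the on-disk name.
pass_bytes = name.encode("utf-8", "surrogatepass")
pass_roundtrip = pass_bytes.decode("utf-8", "surrogatepass") == name

print(fs_roundtrip, escape_bytes == raw, pass_bytes == raw, pass_roundtrip)
```

With surrogatepass the intermediate bytes are only meaningful to code that decodes them with the same handler, which matters if the char* filename is later passed to fopen() or a similar call.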
The parser API has many public functions, and we have to consider the compatibility with Python 3.1.
See also #9713 and #8611.

Date | User | Action | Args
2010-10-13 23:52:58 | vstinner | set | recipients: + vstinner
2010-10-13 23:52:58 | vstinner | set | messageid: <1287013978.13.0.815823866511.issue10095@psf.upfronthosting.co.za>
2010-10-13 23:52:56 | vstinner | link | issue10095 messages
2010-10-13 23:52:56 | vstinner | create |