Message406427
Py_CompileString() in Python 3.9 and later, which use the PEG parser, appears to no longer honour source encoding cookies. A reduced test case:
#include "Python.h"
#include <stdio.h>

const char *src = (
    "# -*- coding: Latin-1 -*-\n"
    "'''\xc3'''\n");

int main(int argc, char **argv)
{
    Py_Initialize();
    PyObject *res = Py_CompileString(src, "some_path", Py_file_input);
    if (res) {
        fprintf(stderr, "Compile succeeded.\n");
        return 0;
    } else {
        fprintf(stderr, "Compile failed.\n");
        PyErr_Print();
        return 1;
    }
}
Compiling and running the resulting binary with Python 3.8 (or earlier):
% ./encoding_bug
Compile succeeded.
With 3.9 and PYTHONOLDPARSER=1:
% PYTHONOLDPARSER=1 ./encoding_bug
Compile succeeded.
With 3.9 (without the env var) or 3.10:
% ./encoding_bug
Compile failed.
File "some_path", line 2
'''�'''
^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xc3 in position 0: unexpected end of data
Writing the same bytes to a file and making python3.9 or python3.10 import them works fine, as does passing the bytes to compile():
Python 3.10.0+ (heads/3.10-dirty:7bac598819, Nov 16 2021, 20:35:12) [GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b = open('encoding_bug.py', 'rb').read()
>>> b
b"# -*- coding: Latin-1 -*-\n'''\xc3'''\n"
>>> import encoding_bug
>>> encoding_bug.__doc__
'Ã'
>>> co = compile(b, 'some_path', 'exec')
>>> co
<code object <module> at 0x7f447e1b0c90, file "some_path", line 1>
>>> co.co_consts[0]
'Ã'
It's just Py_CompileString() that fails. I don't understand why, and I do believe it's a regression.
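The two behaviours can also be contrasted from Python alone. The following is a minimal sketch, not part of the original test case. The variable names `namespace`, `doc` and `reason` are illustrative. It shows that compile() on the bytes honours the Latin-1 cookie, while decoding the lone byte 0xc3 as UTF-8 fails with the same reason as the SyntaxError above. That the failing Py_CompileString() path decodes the source as UTF-8 regardless of the cookie is only an inference from the error message, not something confirmed here.

```python
# Same bytes as the C test case: a Latin-1 cookie, then a docstring holding byte 0xc3.
src = b"# -*- coding: Latin-1 -*-\n'''\xc3'''\n"

# compile() on a bytes object honours the cookie, as in the interactive session above.
namespace = {}
exec(compile(src, "some_path", "exec"), namespace)
doc = namespace["__doc__"]
print(repr(doc))

# Decoding the same byte as UTF-8 fails: 0xc3 starts a two-byte sequence
# and no continuation byte follows it.
try:
    b"\xc3".decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)
```

In Latin-1 the byte 0xc3 maps directly to U+00C3, which is why the import and compile() paths produce a one-character docstring.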
Date | User | Action | Args
2021-11-16 19:45:26 | twouters | set | recipients: + twouters, gregory.p.smith, lys.nikolaou, pablogsal
2021-11-16 19:45:26 | twouters | set | messageid: <1637091926.91.0.847266821878.issue45822@roundup.psfhosted.org>
2021-11-16 19:45:26 | twouters | link | issue45822 messages
2021-11-16 19:45:26 | twouters | create |