compile() doesn't ignore the source encoding when a string is passed in #48876
When compile() is called with a string it is a reasonable assumption >>> source = "# coding=latin-1\n\u00c6 = '\u00c6'"
>>> compile(source, '<test>', 'exec')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<test>", line 2
� = '�'
^
SyntaxError: invalid character in identifier
>>> compile(source.encode('latin-1'), '<test>', 'exec')
<code object <module> at 0x389cc8, file "<test>", line 2> |
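For readers on a current Python 3, here is a minimal sketch of the behaviour this report asks for. It assumes an interpreter that includes the fix discussed in this thread: a `str` source is already decoded, so its coding cookie is not applied, while a `bytes` source is still decoded according to the cookie. The names `ns_str` and `ns_bytes` are illustrative only.

```python
# Same source as in the report: a latin-1 coding cookie and a non-ASCII identifier.
source = "# coding=latin-1\n\u00c6 = '\u00c6'"

# str input: the text is already decoded, so the cookie is not applied.
ns_str = {}
exec(compile(source, '<test>', 'exec'), ns_str)

# bytes input: the cookie tells the tokenizer how to decode the bytes.
ns_bytes = {}
exec(compile(source.encode('latin-1'), '<test>', 'exec'), ns_bytes)
```

Both namespaces should end up with the same binding, which is the symmetry the report expects.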
bpo-4742 is a similar issue: >>> source = b"# coding=cp1252\n\x94 = '\x94'".decode('cp1252')
>>> compile(source, '<test>', 'exec')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<test>", line 0
SyntaxError: unknown encoding: cp1252 The real error here is masked; just before the exception is set, there It seems that the source internal representation is correct utf-8, but |
Here is what I have found out so far. I just tried setting a PyCF flag to denote that the char* data is I'm going to try to explicitly convert to UTF-8 and see if that works. |
So explicitly converting to UTF-8 didn't work, or at least as simply as |
The function decode_str() (Parser/tokenizer.c) is responsible to The patch introduces a new compiler flag (PyCF_IGNORE_COOKIE) and a With my patch, Brett's first example displays:
$ ./python com2.py
Traceback (most recent call last):
File "com2.py", line 3, in <module>
compile(source, '<test>', 'exec')
File "<test>", line 2
” = '”'
^
SyntaxError: invalid character in identifier The error cursor is not at the right column (bug related to the issue The patch changes the public API: PyTokenizer_FromString() prototype There are some old PyPARSE_xxx constants in Include/parsetok.h that |
Oops, I attached the wrong file :-p |
I tried py3k_adjust_cursor_at_syntax_error_v2.patch (issue bpo-2382) and the $ ./python com2.py
Traceback (most recent call last):
...
File "<test>", line 2
” = '”'
^ So it's not a new bug ;-) |
New version of my patch: add a regression test. @brett.cannon: Could you review my patch? |
I will see when I can get to it (other stuff is taking priority). Not |
Ping! Can anyone review my patch? |
I don't like the change of API to PyTokenizer_FromString. I would prefer The (char *) cast in PyTokenizer_FromString is unneeded. You need to indent the "else" clause after you test for ignore_cookie. I'd like to see a test that shows that byte strings still have their |
Note that PyTokenizer_FromString is not an API function: it's not marked |
Ok, I created a new function PyTokenizer_FromUnicode(). I
The cast on the decode_str() result? It was already present in the
Oops, I always have problems generating a diff because my editor
test_pep263 has already two tests using a "#coding:" header. |
On Thu, Jan 29, 2009 at 5:13 PM, STINNER Victor <report@bugs.python.org> wrote:
How about PyTokenizer_FromUTF8() then?
No, I was referring to this line: tok->encoding = (char *)PyMem_MALLOC |
Here's another patch. The one problem is that it causes test_coding to |
@benjamin.peterson: I don't see your changes... I read both patches
I don't understand the change in source_as_string(). Except for that,
The test has to fail, but the error is not the compile() patch, Index: Lib/test/test_coding.py path = os.path.dirname(__file__)
filename = os.path.join(path, module_name + '.py')
- fp = open(filename, encoding='utf-8')
+ fp = open(filename, 'rb')
text = fp.read()
fp.close()
self.assertRaises(SyntaxError, compile, text,
filename, 'exec') |
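To illustrate why the test has to read the file as bytes, here is a small sketch. It assumes a Python 3 with the fix applied; `bad_bytes` is a hypothetical stand-in for the test file's contents, not the actual file from the test suite. With `bytes` the misspelled cookie is honoured and rejected; with `str` the cookie is ignored, so no error can be raised.

```python
# Hypothetical stand-in for a source file with a misspelled coding cookie.
bad_bytes = b"# -*- coding: uft-8 -*-\nx = 1\n"

# bytes input: the cookie is looked up, the codec does not exist, compile() fails.
try:
    compile(bad_bytes, '<bad_coding>', 'exec')
    bytes_raised = False
except SyntaxError:
    bytes_raised = True

# str input: the text is already decoded, so the bogus cookie is just a comment.
code = compile(bad_bytes.decode('ascii'), '<bad_coding>', 'exec')
```

This is why opening the file with `encoding='utf-8'` would make `assertRaises(SyntaxError, ...)` fail once `compile()` ignores the cookie for strings.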
On Tue, Feb 17, 2009 at 6:22 PM, STINNER Victor <report@bugs.python.org> wrote:
Py_CFFLAGS_SOURCE_IS_UTF8 is already set in compile().
That fix is correct, but I think it avoids what the test is meant to |
Yeah! Anyone to review and/or commit the last patch? |
I'll deal with it eventually. |
Fixed in r70112. |
Should this be backported? |
It's r70113 (not r70112). I see that pitrou backported the fix |
I'm glad to have discovered this topic. I bumped into something similar from code import InteractiveInterpreter
ii = InteractiveInterpreter()
source = ...
ii.runsource(source) What should be the encoding and/or the type (str, bytes) of the "source" IDLE is not suffering from this. Its interactive interpreter is somehow I'm a little bit confused here (win2k, winXP sp2, Python 3.0.1). |
On Tue, Mar 24, 2009 at 09:24, Jean-Michel Fauth <report@bugs.python.org>wrote:
Off the top of my head it should be UTF-8. Otherwise it can probably be |
When I was preparing some test examples to be submitted here. IDLE, Python 3.0.1, winxp sp2 >>> source = b'print(999)'
>>> compile(source, '<in>', 'exec')
<code object <module> at 0x00AA5CC8, file "<in>", line 1>
>>> r = compile(source, '<in>', 'exec')
>>> exec(r)
999
>>> from code import InteractiveInterpreter
>>> ii = InteractiveInterpreter()
>>> ii.runsource(source)
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
ii.runsource(source)
File "C:\Python30\lib\code.py", line 63, in runsource
code = self.compile(source, filename, symbol)
File "C:\Python30\lib\codeop.py", line 168, in __call__
return _maybe_compile(self.compiler, source, filename, symbol)
File "C:\Python30\lib\codeop.py", line 70, in _maybe_compile
for line in source.split("\n"):
TypeError: Type str doesn't support the buffer API
>>>
>>> source = 'print(999)'
>>> ii.runsource(source)
999
False |
@jmfauth: Can you open a different issue for the IDLE issue? |
Yes, I could, but I think this is not an IDLE issue, I'm just Code in the editor: # -- coding: cp1252 -- from code import InteractiveInterpreter
ii = InteractiveInterpreter()
source = b'print(999)'
ii.runsource(source) Output: >c:\python30\pythonw -u "uuu.py"
Traceback (most recent call last):
File "uuu.py", line 8, in <module>
ii.runsource(source)
File "c:\python30\lib\code.py", line 63, in runsource
code = self.compile(source, filename, symbol)
File "c:\python30\lib\codeop.py", line 168, in __call__
return _maybe_compile(self.compiler, source, filename, symbol)
File "c:\python30\lib\codeop.py", line 70, in _maybe_compile
for line in source.split("\n"):
TypeError: Type str doesn't support the buffer API
>Exit code: 1 My interactive interpreter
from code import InteractiveInterpreter
>>> ---
ii = InteractiveInterpreter()
>>> ---
source = b'print(999)'
>>> ---
ii.runsource(source)
Traceback (most recent call last):
File "<smid last command>", line 1, in <module>
File "c:\Python30\lib\code.py", line 63, in runsource
code = self.compile(source, filename, symbol)
File "c:\Python30\lib\codeop.py", line 168, in __call__
return _maybe_compile(self.compiler, source, filename, symbol)
File "c:\Python30\lib\codeop.py", line 70, in _maybe_compile
for line in source.split("\n"):
TypeError: Type str doesn't support the buffer API
>>> ---
======================= I realised and missed the fact the str() function is now accepting Code in the editor: # -- coding: cp1252 -- from code import InteractiveInterpreter
ii = InteractiveInterpreter()
source = b'print(999)'
source = str(source, 'cp1252') #<<<<<<<<<<
ii.runsource(source) Output: (ok)
======================= In a few words, my empirical understanding of the story.
compile(source, filename, encodings='utf-8', ...)
(Problem: BOM, coding cookies?). I suspect the miscellaneous discussions one finds from people attempting Regards. |
compile() works as expected. Your problem is related to The error comes from bytes.split(str): _maybe_compile() should use
Yes, runsource() (only) works with the str type.
Please see issues: |
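A minimal sketch of the workaround described above, assuming the caller knows the encoding of the bytes: decode first, then hand the `str` to `runsource()`. The helper name `run_bytes_source` and its `encoding` parameter are hypothetical, not part of the `code` module.

```python
from code import InteractiveInterpreter


def run_bytes_source(interp, source, encoding="utf-8"):
    """Decode bytes (if needed) before calling runsource(), which expects str."""
    if isinstance(source, (bytes, bytearray)):
        source = source.decode(encoding)
    # runsource() returns False when the source was complete and has been run.
    return interp.runsource(source)


ii = InteractiveInterpreter()
needs_more_input = run_bytes_source(ii, b"x = 41 + 1", encoding="cp1252")
```

Here `ii.locals` holds the namespace the statement ran in, so the result can be inspected afterwards.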
Quick feedback from a Windows user. I made a few more tests with the freshly installed Python 3.1a1. The As a side effect, it is now possible to write an "execfile()" without (Of course, taking into account and managing universal newlines). |
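One way such an "execfile()" might look, as a sketch only: read the file as bytes and let `compile()` honour any coding cookie or BOM. The name `execfile_sketch` and the default namespace are assumptions for illustration. On Python 3.2 and later `compile()` accepts Windows and Mac newlines, so no manual newline translation is done here.

```python
def execfile_sketch(path, globals=None, locals=None):
    """Run the Python file at `path`; the name and defaults are illustrative."""
    if globals is None:
        globals = {"__name__": "__main__"}
    # Read raw bytes so compile() can apply the file's own coding cookie.
    with open(path, "rb") as f:
        source = f.read()
    code = compile(source, path, "exec")
    exec(code, globals, locals)
    return globals
```

Passing bytes rather than decoded text keeps the encoding decision inside `compile()`, which is the behaviour this issue settled on.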
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.