classification
Title: Built-in compile function with PEP 0263 encoding bug
Type:
Stage:
Components: Interpreter Core
Versions: Python 2.4
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: nnorwitz
Nosy List: cito, georg.brandl, loewis, nnorwitz, wigy
Priority: high
Keywords:

Created on 2005-02-03 13:11 by cito, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
ast-2.5.diff nnorwitz, 2006-03-20 08:28
compile-2.4.diff nnorwitz, 2006-03-20 08:29
c.diff loewis, 2006-03-20 09:03
Messages (11)
msg24138 - (view) Author: Christoph Zwerschke (cito) * Date: 2005-02-03 13:11
a = 'print "Hello, World"'
u = '# -*- coding: utf-8 -*-\n' + a

print compile(a, '<string>', 'exec') # ok
print compile(u, '<string>', 'exec') # ok
print compile(unicode(a), '<string>', 'exec') # ok
print compile(unicode(u), '<string>', 'exec') # error

# The last line raises a SystemError.
# I think this is a bug.
msg24139 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-02-10 00:37
There is a bug somewhere, certainly. However, I believe it
is in PEP 263, which should point out that unicode strings
in compile are only legal if they do *not* contain an
encoding declaration, as such strings are implicitly encoded
as UTF-8.
msg24140 - (view) Author: Vágvölgyi Attila (wigy) Date: 2005-09-28 04:20
If this special case is a feature, not a bug, then it breaks
some symmetry for sure.

If I run a script having utf-8 encoding from a file with

  python script.py

then it has to have an encoding declaration. If I instead
load the same file manually and decode it to a unicode
object, I also have to remove the encoding declaration at
the beginning of the file before I can pass it to the
compile() function.

What special advantage comes from the fact that the compiler
does not simply ignore encoding declaration nodes from
unicode objects? Does this error message catch some possible
errors or does it make the compiler code simpler?
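
The workaround described above (removing the declaration before calling compile()) can be sketched roughly as follows. This is only an illustration, written so that it runs on a current Python 3 interpreter; on the 2.4 discussed here the text would be a unicode object. The helper name blank_coding_declaration and the sample source are hypothetical, and the pattern follows the regular expression given in PEP 263.

```python
import re

# Pattern for an encoding declaration, following the regular expression
# given in PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def blank_coding_declaration(text):
    """Hypothetical helper: blank out a PEP 263 declaration in decoded source.

    Only the first two lines are checked, as PEP 263 requires, and the
    matching line is replaced by a bare comment so that line numbers in
    tracebacks stay unchanged.
    """
    lines = text.split("\n")
    for i in range(min(2, len(lines))):
        if CODING_RE.match(lines[i]):
            lines[i] = "#"
            break
    return "\n".join(lines)


# Already-decoded source text that still carries its declaration.
source = '# -*- coding: utf-8 -*-\nresult = "Hello, World"\n'
cleaned = blank_coding_declaration(source)

namespace = {}
exec(compile(cleaned, "<string>", "exec"), namespace)
```

Replacing the line instead of deleting it keeps reported line numbers aligned with the file on disk. Current Python 3 versions ignore the declaration in str input, so this step should only matter on interpreters that reject it.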
msg24142 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2005-09-28 05:48
If you load the files manually, why is it that you want to
decode them to Unicode before compile()ing them? Couldn't
you just pass the bytes you read from the file to compile()?
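
A minimal sketch of that suggestion, assuming a current Python 3 interpreter rather than the 2.4 under discussion: compile() accepts the undecoded source bytes and applies the PEP 263 declaration itself. The sample source and the variable names are made up for illustration.

```python
# Source bytes as they might be read from a file opened in binary mode.
# The byte 0xE9 is "é" in Latin-1 and is not valid UTF-8 on its own,
# so the declaration is what makes this source decodable.
raw = b'# -*- coding: latin-1 -*-\ngreeting = "caf\xe9"\n'

# Pass the bytes straight to compile(); no manual decoding step.
code = compile(raw, "<string>", "exec")

namespace = {}
exec(code, namespace)
```

With this approach the file content and its declaration stay together, so nothing has to be stripped before compiling.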
msg24143 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-02-20 21:37
This even aborts the interpreter in 2.5 HEAD with a failing
assertion.
msg24144 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-03-20 08:28
Martin, the attached patches (2.4 and 2.5) fix the problem.
However, it seems that the patches would violate the PEP
according to one of your notes. I'm not sure about all the
details, but ISTM based on your comment that if (flags &&
flags->cf_flags & PyCF_SOURCE_IS_UTF8) and (TYPE(n) ==
encoding_decl) this is an error that should be returned?

I would like to get this fixed for 2.4.3, so we need to move
fast for it.  2.5 can wait and is trivial to fix once we
know what this is supposed to do.
msg24145 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-03-20 09:03
I still wonder why anybody would want to do that, so I don't
see it as a big problem that it gives an error in 2.4: it
*should* give an error, although not the one it currently gives.

It seems that wigy would expect that the encoding
declaration is ignored, whereas you (nnorwitz) are
suggesting that the UTF-8 default should be ignored. In the
face of ambiguity, refuse the temptation to guess.

So I still think it should give a SyntaxError instead. I'll
attach an alternative patch.
msg24146 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-03-20 21:14
Actually, I don't much care about the answer as long as it
isn't a core dump/assert or a SystemError.  I'm fine with a
syntax error.
msg24147 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-03-22 13:56
I've committed this patch (along with a test case) as 43227
into the 2.4 branch; the trunk still needs fixing.
msg24148 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-03-23 05:40
Updated PEP.

Committed revision 43243.
History
Date User Action Args
2022-04-11 14:56:09 admin set github: 41520
2005-02-03 13:11:42 cito create