
classification
Title: IDLE 3.0a5 cannot handle UTF-8
Type: compile error
Stage:
Components: IDLE
Versions: Python 3.0
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: amaury.forgeotdarc, benjamin.peterson, geon, loewis, orsenthil, sven.siegmund, terry.reedy
Priority: normal
Keywords:

Created on 2008-05-11 22:49 by sven.siegmund, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
czech-it.py sven.siegmund, 2008-05-11 22:49 sample source code
q.py geon, 2009-01-02 21:04
Messages (12)
msg66689 - (view) Author: Sven Siegmund (sven.siegmund) Date: 2008-05-11 22:49
I have source code that IDLE 3.0a5 cannot parse but Python 3.0a5 
can (also attached): 

#!/usr/bin/python
# -*- coding: utf-8 -*-

def načtiSlovník(zdroj='slovník.txt'):
    soubor = open(zdroj, mode='r', encoding='utf_8')
    řádky = soubor.readlines()
    for řádek in řádky:
        print(řádek, end='')

načtiSlovník()
# End of source code


I have set Default Source Encoding to UTF-8 in IDLE's general 
configuration. Still, when I open that source file and try to run it, 
IDLE complains about "invalid character in identifier" and highlights 
"zdroj" in red in the first line (sic!). 

However, when I run the source code from the command line (with "python 
<filename>"), it executes correctly and does what it should. 

I should probably add that I have installed py3k:62932M, May 9 2008, 
16:23:11 [MSC v.1500 32 bit (Intel)] on win32. I use Windows XP SP 3. 
IDLE uses Tk version 8.4.
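
The claim that the interpreter itself accepts these identifiers can be checked without IDLE. The following is a minimal sketch, not part of the original report: it assumes a shortened, hypothetical variant of czech-it.py that returns the file name instead of opening the file, and it hands the UTF-8 bytes to the built-in compile(), which reads the coding declaration on its own.

```python
# Shortened, hypothetical variant of the attached czech-it.py: the function
# returns its argument instead of opening the file, so slovník.txt is not needed.
source_text = (
    "# -*- coding: utf-8 -*-\n"
    "def načtiSlovník(zdroj='slovník.txt'):\n"
    "    return zdroj\n"
    "result = načtiSlovník()\n"
)

# Encode to bytes, the way the file would be stored on disk.
source_bytes = source_text.encode("utf-8")

# compile() accepts bytes and honours the coding declaration in the first line.
code = compile(source_bytes, "czech-it.py", "exec")

namespace = {}
exec(code, namespace)
print(namespace["result"])
```

If this runs cleanly, the identifiers are valid for the compiler, which suggests the failure lies in how the editor reads the file rather than in the source itself.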
msg70918 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-08-08 22:17
3.0b2, WinXP
I cut and pasted the text above into an empty IDLE edit window, hit F5,
and in the blink of an eye, both IDLE windows disappeared.  No error
message, no Windows error box, just gone.

The pasted text was saved to the file.  When I added input() statements,
and ran with CPython directly, it got to the function call and then
crashed.  Rerunning with IDLE with input() at the top, it still crashed,
indicating that it crashed during compilation and never started execution.
msg70960 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2008-08-10 06:10
I was NOT able to reproduce it in IDLE 3.0b2 running on Linux. Would you
like to try with 3.0b2 as well?

tjreedy: I did not fully understand your comment. When you open an IDLE
instance, create a new document, paste the code, and run it, the
execution happens in the IDLE instance that is already running. There is
no need for an input() call.
msg70989 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2008-08-10 18:05
When one runs a file with Python30.exe, it opens a window, runs the
file, and closes the window too fast to see what happened.  The point of
the input() statements is to 'pause' execution.  This is standard
debugging along with print()/write() statements.  With three input()s, I
determined that CPython compiled the file, executed the def statement,
and failed the function call (due, I presume, to the requested disk file
not being present).  IDLE, on the other hand, crashed before getting to
the first input() before the function def.  So it crashed while
compiling the file -- or as a result of trying to execute input().

I just tried cut and paste into the IDLE shell window (without the
encoding cookie) and it runs as expected, giving
IOError: [Errno 2] No such file or directory: 'slovník.txt'
Retrying with the cookie gives the same.  I have no idea if it is
recognized in interactive mode or if interactive mode is utf8 by default.

I just tried running from a file without the coding line and IDLE
crashed again.  So the problem is reading from a file on Windows.

IDLE is doing *something* different than bare CPython.  Actually, it
uses pythonw30.exe rather than python.exe, but when I replace the input
statements with file write statements (input raises an error with pythonw),
pythonw also executes through to the def statement. But I still suspect
something in the interaction between IDLE and pythonw.  There was
another problem with IDLE and pythonw in 3.0a5,
http://bugs.python.org/issue2841 
which seems to have disappeared without being officially fixed.
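
The "file write statements" technique described above can be sketched as follows. This is only an illustration under assumptions: the checkpoint() helper and the log path are hypothetical names, not what was actually used in the test.

```python
import os
import tempfile

# Hypothetical log location; any writable path would do.
LOG_PATH = os.path.join(tempfile.gettempdir(), "checkpoints.log")


def checkpoint(label):
    """Append a marker line so progress is visible without a console."""
    with open(LOG_PATH, "a", encoding="utf-8") as log:
        log.write(label + "\n")


checkpoint("before def")


def načtiSlovník(zdroj='slovník.txt'):
    with open(zdroj, mode='r', encoding='utf_8') as soubor:
        for řádek in soubor:
            print(řádek, end='')


checkpoint("after def")
# A call to načtiSlovník() would follow here, then checkpoint("after call").
```

The last marker in the log shows how far execution got, which plays the same role as the input() pauses when no console window is available.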
msg71347 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2008-08-18 16:16
I managed to get a proper traceback:

Exception in Tkinter callback
Traceback (most recent call last):
  File "c:\afa\python\py3k\lib\tkinter\__init__.py", line 1405, in __call__
    return self.func(*args)
  File "c:\afa\python\py3k\lib\idlelib\MultiCall.py", line 165, in handler
    r = l[i](event)
  File "c:\afa\python\py3k\lib\idlelib\ScriptBinding.py", line 124, in
run_module_event
    code = self.checksyntax(filename)
  File "c:\afa\python\py3k\lib\idlelib\ScriptBinding.py", line 86, in
checksyntax
    source = f.read()
  File "C:\afa\python\py3k\lib\io.py", line 1692, in read
    decoder.decode(self.buffer.read(), final=True))
  File "C:\afa\python\py3k\lib\io.py", line 1267, in decode
    output = self.decoder.decode(input, final=final)
  File "C:\afa\python\py3k\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position
34: character maps to <undefined>

This is because the file is opened in text mode, using the default
encoding (cp1252). It should instead open it in binary mode, and look
for a -*-coding-*- directive.
There is an attempt at this in linecache.py, but the logic there is wrong.
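
The failure mode in the traceback can be reproduced in isolation. The following is a minimal sketch using a single character chosen for illustration (it is not the test file from the traceback): UTF-8 bytes are decoded with cp1252, the default encoding named above.

```python
# "č" is U+010D; its UTF-8 encoding is the two bytes C4 8D.
data = "č".encode("utf-8")

try:
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    # Python's cp1252 codec leaves byte 0x8d unmapped, hence the failure.
    failed_byte = data[exc.start]
    print(exc)
else:
    failed_byte = None

# Decoding with the encoding the bytes were actually written in succeeds.
print(data.decode("utf-8"))
```

Any UTF-8 file containing a character whose encoding includes one of the unmapped cp1252 bytes would fail the same way when opened in text mode with that default.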

Is there already a function to properly open a python source file?
tokenize.detect_encoding could be used, but is there a way to open a
file in binary & universal mode?
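
One way the approach asked about here could look is sketched below. It is not the fix that was eventually committed, and read_python_source() is a hypothetical helper name: the file is read in binary mode, tokenize.detect_encoding() determines the encoding from a BOM or coding declaration (falling back to UTF-8), and newlines are normalized by hand in place of universal newline mode.

```python
import io
import tokenize


def read_python_source(path):
    """Hypothetical helper: return the decoded text of a Python source file."""
    # Read raw bytes so no default text encoding is applied.
    with open(path, "rb") as f:
        raw = f.read()

    # detect_encoding() expects a readline callable that yields bytes.
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(raw).readline)

    text = raw.decode(encoding)

    # Emulate universal newlines: map \r\n and lone \r to \n.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

Python 3.2 and later also provide tokenize.open(), which opens a source file in read-only text mode using the encoding found by detect_encoding().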
msg78645 - (view) Author: Pavel Kosina (geon) Date: 2008-12-31 19:31
The following very simple example might be the same issue:

x="ěščřžýáíé"
print (x)

It reliably brings IDLE down entirely, without any error message. The
file is saved in UTF-8. 
Python + IDLE 3.0, Windows XP SP3
msg78867 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-02 20:16
This is now fixed in r68022.
msg78877 - (view) Author: Pavel Kosina (geon) Date: 2009-01-02 21:04
Thank you. I am not sure what to do now: the IDLE crash is
fixed, but within IDLE I still get wrong output:

x="ěščřžýáíé"
print (x)

>>> 
ěščřžýáíé

When I run this script with the python command line from another editor,
the output is readable, as expected. Shall I open another issue?
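
The garbled output shown above is consistent with UTF-8 bytes being decoded as cp1252. That is an inference from the characters displayed, not a diagnosis confirmed in this thread. A minimal sketch, using only the subset of the string whose bytes cp1252 can decode:

```python
# Subset of the characters in the report; "č" is left out because one of its
# UTF-8 bytes (0x8d) is unmapped in Python's cp1252 codec.
text = "ěšřžýáé"

# Encode as UTF-8, then misread the bytes as cp1252.
garbled = text.encode("utf-8").decode("cp1252")
print(garbled)

# Reversing the two steps recovers the original string.
restored = garbled.encode("cp1252").decode("utf-8")
print(restored)
```

Each two-byte UTF-8 sequence turns into two cp1252 characters, which matches the pattern of pairs in the output above.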
msg78885 - (view) Author: Pavel Kosina (geon) Date: 2009-01-02 21:51
Moreover, do you think it's a good idea to change the encoding of a file
that is opened and then saved, without asking, when there is no
encoding declaration? :-( Users do not edit just Python programs; they
also edit config files, text files, etc.

On the first save, we could be asked whether to use:
* utf8
* the encoding already used
msg78887 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2009-01-02 21:55
Well, if you're still using 3.0, it's not patched yet. :)
msg78888 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-02 21:57
> Thank you. I am not sure what to do now: the IDLE crash is
> fixed, but within IDLE I still get wrong output:

In general, when a reported issue is closed and you have a further
issue, the right thing is to report it as a new issue rather than
following up on a closed one.

In this specific case, I believe you are referring to issue 4008 and
its duplicates, #4410 and #4623. There might be more duplicate reports,
so there is no need to report yet another one.
msg78891 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-02 22:08
> Moreover, do you think it's a good idea to change the encoding of a file
> that is opened and then saved, without asking, when there is no
> encoding declaration?

See my previous comment: one issue at a time. This issue was about
code that IDLE 3.0a5 cannot parse, but IDLE 3.0 can. If you have
a further issue with the proposed solution, please report it separately.

> Users do not edit just Python programs; they
> also edit config files, text files, etc.

I don't think IDLE was designed to support editing any other files.
I would be opposed to adding user interface that is relevant only for
editing non-Python text files. If IDLE assumed a different encoding at
save time than at load time, this might still be considered
a bug; you would need to provide detailed instructions to reproduce
such a bug.
History
Date User Action Args
2022-04-11 14:56:34 admin set github: 47076
2009-01-02 22:08:12 loewis set messages: + msg78891
2009-01-02 21:57:43 loewis set messages: + msg78888
2009-01-02 21:55:33 benjamin.peterson set nosy: + benjamin.peterson
messages: + msg78887
2009-01-02 21:51:16 geon set messages: + msg78885
2009-01-02 21:04:18 geon set files: + q.py
messages: + msg78877
2009-01-02 20:16:42 loewis set status: open -> closed
nosy: + loewis
resolution: fixed
messages: + msg78867
2008-12-31 19:31:43 geon set nosy: + geon
messages: + msg78645
2008-08-18 16:16:01 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
messages: + msg71347
2008-08-10 18:05:56 terry.reedy set messages: + msg70989
2008-08-10 06:10:24 orsenthil set nosy: + orsenthil
messages: + msg70960
2008-08-08 22:17:03 terry.reedy set nosy: + terry.reedy
messages: + msg70918
2008-05-11 22:49:47 sven.siegmund create