
classification
Title: idle 3.1a1 utf8
Type:
Stage:
Components: IDLE
Versions: Python 3.1
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: benjamin.peterson, geon, loewis
Priority: release blocker
Keywords: needs review, patch

Created on 2009-01-03 04:56 by geon, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
cp1250.py geon, 2009-01-03 04:56
conv.diff loewis, 2009-01-03 12:06 Convert from source encoding on opening (v2)
idleunicode1.jpg geon, 2009-01-03 21:09
conv.diff loewis, 2009-01-03 22:07 Convert from source encoding on opening (v3)
hello.py geon, 2009-01-04 13:17
Messages (24)
msg78932 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 04:56
When you open a file without an encoding declaration, make changes, and
save, IDLE silently changes the encoding to utf8. You can try it on the
attached file, which is currently cp1250.

Perhaps on the first save we could be asked whether to use
* utf8
* the current encoding.
msg78936 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-03 07:45
IDLE is right to save the file as UTF-8; the file is invalid Python 3.0
code. In Python 3.0, the source encoding *is* UTF-8; nothing else is
allowed unless you have an encoding declaration.

Perhaps IDLE should offer to convert it on opening.
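
To illustrate why such a file is rejected, here is a minimal sketch. The sample text and the variable names are illustrative assumptions, not the contents of the attached cp1250.py. Bytes written as cp1250 generally do not decode as UTF-8, and converting means decoding with the original encoding and re-encoding as UTF-8.

```python
# Illustrative sample text standing in for non-ASCII content in a source file.
text = "ěščřžýáíé"

# What an editor configured for cp1250 would write to disk.
cp1250_bytes = text.encode("cp1250")

# Without a declaration, Python 3 reads source as UTF-8; these bytes are not valid UTF-8.
try:
    cp1250_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Conversion: decode with the real source encoding, then re-encode as UTF-8.
utf8_bytes = cp1250_bytes.decode("cp1250").encode("utf-8")

print(decodes_as_utf8)
print(utf8_bytes.decode("utf-8") == text)
```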
msg78937 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 08:29
You can open a script written for Python 2.x, and it stops working
immediately after saving if it is coding-aware. You may also have a
bigger project and use IDLE to edit that project's config and text
files. It is "unfair" to change the encoding without notification. Or do
you consider IDLE a tool only for beginners who are learning?
msg78938 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 08:36
I forgot about "Perhaps IDLE should offer to convert it on opening."
That would be nice, too.
msg78939 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-03 08:39
Here is a patch to provide an explicit message that the file will be
converted when the file is opened (also querying what encoding should be
converted from), answering the complaint that the conversion is without
notice.

If you want to edit Python 2.x scripts with IDLE 3, you need to add
encoding declarations to the files. For bigger projects, it is
reasonable to expect that they do add these declarations.

Again, please only report one issue at a time. You chose to make
Python source files the topic of this issue, so please don't divert into
config or text files.
msg78940 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 08:50
Sorry, where is the patch?
msg78941 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 10:36
OK, I got it.

In my opinion it would be nice if the user could either convert the file
to utf8, or do nothing and add a new encoding declaration, or cancel.
Currently "Cancel" gives a decoding error. If you give an encoding that
doesn't exist, it shouldn't crash IDLE. I hope it's not my mistake,
because I do not have all the files from 3.1a, just those from idlelib.
msg78949 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-03 12:06
> In my opinion it would be nice if the user could either convert the file
> to utf8, or do nothing and add a new encoding declaration, or cancel.

You can still add an encoding declaration after the file got converted.
Cancelling is also possible.

> If you give an encoding that doesn't exist, it shouldn't crash IDLE.

Right. Here is a patch that fixes that.
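
As a rough illustration of the kind of check involved (this is not the code in conv.diff, and the helper name `is_known_encoding` is hypothetical), an encoding name typed by the user can be validated with the standard `codecs.lookup` before any decoding is attempted:

```python
import codecs

def is_known_encoding(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        # Unknown encoding: report it to the user instead of failing later.
        return False
    return True

print(is_known_encoding("cp1250"))
print(is_known_encoding("no-such-encoding"))
```

Checking the name up front lets the dialog be shown again rather than letting the error propagate out of the file-open code.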
msg78966 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 15:08
Seems to be working.

I think I get it now. The file encoding is governed by the encoding
declaration. If I state

# -*- coding: cp1250 -*-

then the file is saved in cp1250.

I hope I can keep this under the same issue, because it comes with these
patches: when I open a file *with*, say, # -*- coding: cp1250 -*-, I am
asked to convert to utf8. This behaviour was not there before and is
probably unwanted.
msg78967 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-03 15:22
> I hope I can keep this under the same issue, because it comes with these
> patches: when I open a file *with*, say, # -*- coding: cp1250 -*-, I am
> asked to convert to utf8. This behaviour was not there before and is
> probably unwanted.

Actually, the behavior was there before - it's just that conversion was
silent. If you put an empty line after the coding declaration, it should
work well. issue 4008 contains a patch for that problem.
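
For reference, the encoding that Python itself would pick for a source file can be inspected with the standard library's `tokenize.detect_encoding`. The sketch below is only an illustration (the helper name and sample bytes are assumptions) and is not necessarily how IDLE's IOBinding.py performs the check:

```python
import io
import tokenize

def declared_encoding(source_bytes):
    """Return the encoding Python would use for these source bytes."""
    # detect_encoding reads at most two lines, looking for a BOM or a coding cookie.
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding

with_cookie = b"# -*- coding: cp1250 -*-\nprint('hello')\n"
without_cookie = b"print('hello')\n"

print(declared_encoding(with_cookie))
print(declared_encoding(without_cookie))
```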
msg78970 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 15:37
Well, thanks a lot.

(I am aware this is really off-topic for this issue.) Now I even
understand the system of patches: issue 4008 solved the problem with
printing Unicode characters inside IDLE. I am still not sure how patches
work across Python versions. I vote for including both this patch and
that IDLE patch in some 3.0.1 as well, not only in the 3.1 branch. In
non-English-speaking countries, trouble with encodings is more common,
and more frustrating for beginners, than you might believe.

HNY 2009
Pavel Kosina
msg78972 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-03 15:53
> I vote for including both this patch and that IDLE patch in some
> 3.0.1 as well, not only in the 3.1 branch.

This is my plan, yes - hence I marked them all release-critical. They
still need review. I agree that IDLE in 3.0 is fairly broken wrt.
non-ASCII characters. I knew that before the release, but there is so
much work and only so little time. FWIW, my own language (German) also
uses (some) non-ASCII characters.
msg79004 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 21:09
I might have another problem with this patch, and maybe also with the
one in issue 4008. Take a file containing

print ("ěščřžýáíé")
# saved in cp1250

Open the file, confirm converting to utf8, press F5. Result: an error;
see the attached file idleunicode1.jpg
msg79006 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-03 21:24
> print ("ěščřžýáíé")
> # saved in cp1250

I can't reproduce the problem. Can you please attach the
exact file that failed to work?
msg79009 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 21:33
Martin v. Löwis wrote on 3.1.2009 22:24:
> I can't reproduce the problem. Can you please attach the
> exact file that failed to work?

You can use the one that is already attached here: cp1250.py. I get the
same error with it.
msg79011 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-03 21:40
> You can use the one that is already attached here: cp1250.py. I get the
> same error with it.

Ok, then what are the exact steps to reproduce? What code base, what
patches applied, what user interaction in what order?
msg79012 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 21:49
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\prg\Python30\Lib\idlelib>svn update    # from http://svn.python.org/projects/python/branches/py3k/Lib/idlelib
Restored 'AutoCompleteWindow.py'
Restored 'ToolTip.py'
Restored 'UndoDelegator.py'
Restored 'Bindings.py'
Restored '__init__.py'
Restored 'AutoComplete.py'
Restored 'configHandler.py'
Restored 'HyperParser.py'
Restored 'ColorDelegator.py'
Restored 'Delegator.py'
Restored 'ObjectBrowser.py'
Restored 'testcode.py'
Restored 'configSectionNameDialog.py'
Restored 'ZoomHeight.py'
Restored 'PyShell.py'
Restored 'ParenMatch.py'
Restored 'config-keys.def'
Restored 'Debugger.py'
Restored 'CREDITS.txt'
Restored 'configDialog.py'
Restored 'StackViewer.py'
Restored 'HISTORY.txt'
Restored 'SearchEngine.py'
Restored 'ReplaceDialog.py'
Restored 'ScriptBinding.py'
Restored 'ChangeLog'
Restored 'tabbedpages.py'
Restored 'keybindingDialog.py'
Restored 'configHelpSourceEdit.py'
Restored 'WidgetRedirector.py'
Restored 'GrepDialog.py'
Restored 'FormatParagraph.py'
Restored 'EditorWindow.py'
Restored 'help.txt'
Restored 'config-highlight.def'
Restored 'PyParse.py'
Restored 'README.txt'
Restored 'rpc.py'
Restored 'OutputWindow.py'
Restored 'aboutDialog.py'
Restored 'idle.bat'
Restored 'TODO.txt'
Restored 'config-main.def'
Restored 'IdleHistory.py'
Restored 'PathBrowser.py'
Restored 'IOBinding.py'
Restored 'WindowList.py'
Restored 'ScrolledList.py'
Restored 'ClassBrowser.py'
Restored 'FileList.py'
Restored 'CallTips.py'
Restored 'idle.py'
Restored 'CodeContext.py'
Restored 'textView.py'
Restored 'SearchDialogBase.py'
Restored 'CallTipWindow.py'
Restored 'SearchDialog.py'
Restored 'RemoteObjectBrowser.py'
Restored 'idlever.py'
Restored 'RemoteDebugger.py'
Restored 'TreeWidget.py'
Restored 'NEWS.txt'
Restored 'idle.pyw'
Restored 'run.py'
Restored 'config-extensions.def'
Restored 'AutoExpand.py'
Restored 'Percolator.py'
Restored 'dynOptionMenuWidget.py'
Restored 'extend.txt'
Restored 'MultiStatusBar.py'
Restored 'MultiCall.py'
Restored 'macosxSupport.py'
At revision 68230.

C:\prg\Python30\Lib\idlelib>patch < conv.diff
patching file IOBinding.py

C:\prg\Python30\Lib\idlelib>patch < idle_encoding_4.patch
patching file ScriptBinding.py
patching file IOBinding.py

-------------

Run IDLE - Open cp1250.py - confirm converting to utf8 - F5 (immediately, no change in code!) - error: see attached file
idleunicode1.jpg

The rest of the Python code base is a clean 3.0 install.
The system is Windows XP SP3.
msg79014 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-03 22:07
I see. Here is a revised patch. This sets the modified flag on the
buffer after conversion, so that you get asked to save it before running it.
msg79016 - (view) Author: Pavel Kosina (geon) Date: 2009-01-03 22:18
Yes. Goooood jooooob. ;-)
msg79052 - (view) Author: Pavel Kosina (geon) Date: 2009-01-04 13:17
With this file, hello.py (attached), I should also be asked about
converting to utf8. When I open it, nothing changes; after making
changes and saving, the encoding is my Windows default, cp1250.
msg79055 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-04 13:38
> With this file, hello.py (attached), I should also be asked about
> converting to utf8.

Why that? This file is already encoded in utf-8 just fine. It is,
simultaneously, also encoded in ASCII, cp1250, cp1252, and nearly
any other encoding in use (as long as it is ASCII-based).

> When I open it, nothing changes; after making changes and saving,
> the encoding is my Windows default, cp1250.

What did you do to find that out?
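
The point about ASCII-only files can be checked directly. In this sketch the byte string is an assumed stand-in for hello.py (the actual attachment is not reproduced here): pure-ASCII bytes decode to the same text under every ASCII-based encoding, so the bytes alone cannot tell you which one was "used".

```python
# Assumed stand-in for an ASCII-only source file.
ascii_source = b'print("hello")\n'

# Decode the same bytes under several ASCII-based encodings.
decoded = {
    enc: ascii_source.decode(enc)
    for enc in ("ascii", "utf-8", "cp1250", "cp1252")
}

# Every decoding yields the identical string.
all_same = len(set(decoded.values())) == 1
print(all_same)
```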
msg79060 - (view) Author: Pavel Kosina (geon) Date: 2009-01-04 14:54
Martin v. Löwis wrote on 4.1.2009 14:39:
> Why that? This file is already encoded in utf-8 just fine. It is,
> simultaneously, also encoded in ASCII, cp1250, cp1252, and nearly
> any other encoding in use (as long as it is ASCII-based).

Well, I am not very experienced, but this file is not real utf8. It is
encoded in ascii, cp1250, cp1252, and many others, but not in utf8. utf8
has a special flag, special bytes inside, as a special mark for
editors. That is what this file doesn't have. Even after making a change
to it in IDLE, it does not become real utf8. I always check it in another
editor that handles encodings very well: PSPad.
msg79068 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-04 16:04
> utf8 has a special flag

No, it doesn't.

> as a special mark for editors.

That's the BOM, or UTF-8 signature. It's optional, and UTF-8-encoded
files typically do *not* have the UTF-8 signature.

> Even after making a change to it in IDLE, it does not become real utf8.

Your understanding of UTF-8 is incorrect.
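
To make the point about the signature concrete, here is a small sketch using Python's standard codecs (the sample text is arbitrary). The `utf-8-sig` codec writes the optional three-byte signature, plain `utf-8` does not, and both outputs are valid UTF-8 for the same text:

```python
import codecs

text = "hello"

plain = text.encode("utf-8")        # no signature
signed = text.encode("utf-8-sig")   # signature (BOM) followed by the same bytes

# The only difference between the two is the optional leading signature.
has_same_payload = signed == codecs.BOM_UTF8 + plain

# The utf-8-sig decoder accepts input with or without the signature.
roundtrip_plain = plain.decode("utf-8-sig")
roundtrip_signed = signed.decode("utf-8-sig")

print(has_same_payload, roundtrip_plain == roundtrip_signed == text)
```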
msg80120 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-18 20:26
Committed as r68732, r68733.
History
Date User Action Args
2022-04-11 14:56:43 admin set nosy: + benjamin.peterson; github: 49065
2009-01-18 20:26:16 loewis set status: open -> closed; resolution: fixed; messages: + msg80120
2009-01-04 16:04:32 loewis set messages: + msg79068
2009-01-04 14:54:17 geon set messages: + msg79060
2009-01-04 13:38:58 loewis set messages: + msg79055
2009-01-04 13:17:04 geon set files: + hello.py; messages: + msg79052
2009-01-03 22:18:10 geon set messages: + msg79016
2009-01-03 22:07:36 loewis set files: + conv.diff; messages: + msg79014
2009-01-03 21:49:48 geon set messages: + msg79012
2009-01-03 21:40:03 loewis set messages: + msg79011
2009-01-03 21:33:37 geon set messages: + msg79009
2009-01-03 21:24:14 loewis set messages: + msg79006
2009-01-03 21:09:05 geon set files: + idleunicode1.jpg; messages: + msg79004
2009-01-03 15:53:31 loewis set messages: + msg78972
2009-01-03 15:37:58 geon set messages: + msg78970
2009-01-03 15:22:12 loewis set messages: + msg78967
2009-01-03 15:08:23 geon set messages: + msg78966
2009-01-03 12:06:55 loewis set files: - conv.diff
2009-01-03 12:06:47 loewis set files: + conv.diff; messages: + msg78949
2009-01-03 10:36:12 geon set messages: + msg78941
2009-01-03 08:55:07 loewis set priority: release blocker; keywords: + needs review
2009-01-03 08:54:39 loewis set files: + conv.diff; keywords: + patch
2009-01-03 08:50:29 geon set messages: + msg78940
2009-01-03 08:39:34 loewis set messages: + msg78939
2009-01-03 08:36:04 geon set messages: + msg78938
2009-01-03 08:29:37 geon set messages: + msg78937
2009-01-03 07:45:36 loewis set nosy: + loewis; messages: + msg78936
2009-01-03 04:56:20 geon create