classification
Title: IDLE fails to display the README file
Type: behavior
Stage: resolved
Components: IDLE
Versions: Python 3.6, Python 3.5, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: terry.reedy
Nosy List: Rosuav, kbk, larry, python-dev, roger.serwy, serhiy.storchaka, terry.reedy
Priority: normal
Keywords:

Created on 2015-12-18 12:29 by serhiy.storchaka, last changed 2016-01-17 04:54 by terry.reedy. This issue is now closed.

Messages (6)
msg256679 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-18 12:29
When I open the About IDLE dialog and press the README button on the bottom row, I get a UnicodeDecodeError:

Exception in Tkinter callback
Traceback (most recent call last):
  File "/home/serhiy/py/cpython/Lib/tkinter/__init__.py", line 1548, in __call__
    return self.func(*args)
  File "/home/serhiy/py/cpython/Lib/idlelib/aboutDialog.py", line 127, in ShowIDLEAbout
    self.display_file_text('About - Readme', 'README.txt')
  File "/home/serhiy/py/cpython/Lib/idlelib/aboutDialog.py", line 139, in display_file_text
    textView.view_file(self, title, fn, encoding)
  File "/home/serhiy/py/cpython/Lib/idlelib/textView.py", line 74, in view_file
    contents = file.read()
  File "/home/serhiy/py/cpython/Lib/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 72: invalid start byte

This happens because IDLE opens the Lib/idlelib/README.txt file with the default (locale) encoding, which is UTF-8 here, while the file contains a RIGHT SINGLE QUOTATION MARK character encoded in CP1252 (byte 0x92), which cannot be decoded as UTF-8.

I think IDLE should open all distributed files with UTF-8 encoding. Lib/idlelib/CREDITS.txt and Lib/idlelib/README.txt should be recoded to UTF-8.
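The failure described above can be reproduced without IDLE. The following is a minimal sketch; the sample byte string is made up for illustration and is not the actual README.txt content, and try_decode is a hypothetical helper. It shows that byte 0x92 decodes to U+2019 under CP1252 but is rejected by the UTF-8 codec.

```python
# Byte 0x92 is U+2019 RIGHT SINGLE QUOTATION MARK in CP1252.
# In UTF-8 it is a continuation byte, so it cannot start a character.
raw = b"IDLE\x92s README"  # illustrative bytes, not the real file content


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


cp1252_text = try_decode(raw, "cp1252")
utf8_text = try_decode(raw, "utf-8")

print(repr(cp1252_text))
print(repr(utf8_text))
```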
msg256706 - Author: Roundup Robot (python-dev) (Python triager) Date: 2015-12-18 20:47
New changeset 11c789c034fe by Terry Jan Reedy in branch '2.7':
Issue #25905: Revert unwanted conversion of ' to ’ RIGHT SINGLE QUOTATION MARK.
https://hg.python.org/cpython/rev/11c789c034fe

New changeset 42963dd81600 by Terry Jan Reedy in branch '3.4':
Issue #25905: Revert unwanted conversion of ' to ’ RIGHT SINGLE QUOTATION MARK.
https://hg.python.org/cpython/rev/42963dd81600
msg256707 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-12-18 21:03
Strange and accidental.  This is the one line I kept from the old version, before deleting the old and adding the new.  I believe I edited with either IDLE or Notepad++, and I would be surprised if the latter, an editor for programmers, turned ' into ’.  It does not do so for .py and .rst files.  README.txt should be ASCII-only.

Larry, I applied the README.txt reverse replacement to 3.4 in case you feel like cherry-picking it into 3.4.4, but it is not a blocker.  No NEWS entries would be needed for 3.4, as this would fix an unreleased regression.
msg256710 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-12-18 21:36
For CREDITS.txt, the only issue is 'Löwis' (ö has latin-1 code 246), which was changed from Loewis in 2003.  If it is not changed back to the ASCII form, what matters is that the encoding used for decoding from bytes to unicode matches the file's actual encoding.  AFAIK, it currently does:

def ShowIDLECredits(self):  # aboutDialog.py, line 130
    self.display_file_text('About - Credits', 'CREDITS.txt', 'iso-8859-1')

No encoding is given for README.txt, line 133, because none should be needed.  I am going to leave this alone for now.
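To illustrate why the declared encoding has to match the bytes on disk, here is a small sketch using the 'Löwis' example from above. The variable names are chosen for illustration only; nothing here is taken from idlelib.

```python
name = "L\u00f6wis"  # 'Löwis'; ö is code point 246 (0xF6)

latin1_bytes = name.encode("iso-8859-1")  # ö stored as the single byte 0xF6
utf8_bytes = name.encode("utf-8")         # ö stored as the two bytes 0xC3 0xB6

# Matching codec: the original text comes back unchanged.
matched = latin1_bytes.decode("iso-8859-1")

# Mismatch, case 1: UTF-8 bytes read as latin-1 decode silently into mojibake.
mojibake = utf8_bytes.decode("iso-8859-1")

# Mismatch, case 2: latin-1 bytes read as UTF-8 raise an error,
# because 0xF6 is not a valid UTF-8 start byte.
try:
    latin1_bytes.decode("utf-8")
    latin1_reads_as_utf8 = True
except UnicodeDecodeError:
    latin1_reads_as_utf8 = False

print(matched, mojibake, latin1_reads_as_utf8)
```

Note that the first kind of mismatch produces no exception at all, which is why re-encoding a file and changing the declared encoding need to happen together.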
msg258449 - Author: Roundup Robot (python-dev) (Python triager) Date: 2016-01-17 04:44
New changeset 21356a5b8a5e by Terry Jan Reedy in branch '2.7':
Issue #25905: Specify 'ascii' encoding for README.txt and NEWS.txt.
https://hg.python.org/cpython/rev/21356a5b8a5e

New changeset 59852a79b508 by Terry Jan Reedy in branch '3.5':
Issue #25905: Specify 'ascii' encoding for README.txt and NEWS.txt.
https://hg.python.org/cpython/rev/59852a79b508
msg258451 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2016-01-17 04:54
I re-encoded CREDITS to utf-8 and specified 'ascii' for the others so that *I* will see an error if a non-ASCII char gets into those files again.  Serhiy, I am confident that this will work on all OSes, but feel free to test AboutIDLE again sometime.
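The effect of specifying 'ascii' can be sketched as follows, assuming Python 3 (where open() accepts an encoding argument). The helper names and the temporary files are hypothetical stand-ins, not the real README.txt or NEWS.txt. A strict ASCII read succeeds on a clean file and fails as soon as a stray non-ASCII byte appears.

```python
import os
import tempfile


def decodes_as_ascii(path):
    """Return True if the file at `path` can be read with the strict 'ascii' codec."""
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        return True
    except UnicodeDecodeError:
        return False


def write_temp(data):
    """Write raw bytes to a temporary file and return its path (caller removes it)."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


clean = write_temp(b"IDLE's README\n")      # plain ASCII apostrophe
dirty = write_temp(b"IDLE\x92s README\n")   # CP1252 curly apostrophe slipped in
try:
    clean_ok = decodes_as_ascii(clean)
    dirty_ok = decodes_as_ascii(dirty)
finally:
    os.remove(clean)
    os.remove(dirty)

print(clean_ok, dirty_ok)
```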

(Note to myself) To automate a test that the files will open, change
from idlelib import textView  # to
from idlelib.textView import view_text, view_file
and change the display... and show... methods to module-level functions, so that view_file can be replaced and the show functions can be called without involving tkinter.  This can be part of future refactoring in 3.5+.
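One possible shape for that refactoring is sketched below. All names here (show_readme, show_credits, RecordingViewer) are hypothetical and are not the actual idlelib code. The sketch passes the viewer in as a parameter rather than patching a module global, which is a simplification of the idea in the note, and it passes bare file names where the real code may build a full path.

```python
def show_readme(parent, viewer):
    """Hypothetical module-level replacement for the dialog method.

    `viewer` is any callable with the argument order seen in the traceback:
    viewer(parent, title, filename, encoding).
    """
    return viewer(parent, 'About - Readme', 'README.txt', 'ascii')


def show_credits(parent, viewer):
    """Hypothetical counterpart for CREDITS.txt, assumed to be UTF-8 after re-encoding."""
    return viewer(parent, 'About - Credits', 'CREDITS.txt', 'utf-8')


class RecordingViewer:
    """Test double that records calls instead of opening a tkinter window."""

    def __init__(self):
        self.calls = []

    def __call__(self, parent, title, filename, encoding):
        self.calls.append((title, filename, encoding))


recorder = RecordingViewer()
show_readme(None, recorder)
show_credits(None, recorder)
print(recorder.calls)
```

A real test would additionally open each recorded file with its recorded encoding, so that a decoding problem shows up without any GUI being started.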
History
Date                 User              Action  Args
2016-01-17 04:54:24  terry.reedy       set     messages: + msg258451
2016-01-17 04:44:45  python-dev        set     messages: + msg258449
2015-12-18 21:36:55  terry.reedy       set     status: open -> closed; resolution: fixed; messages: + msg256710; stage: needs patch -> resolved
2015-12-18 21:03:04  terry.reedy       set     nosy: + larry; messages: + msg256707; assignee: terry.reedy; stage: needs patch
2015-12-18 20:47:51  python-dev        set     nosy: + python-dev; messages: + msg256706
2015-12-18 12:29:52  serhiy.storchaka  create