Author terry.reedy
Recipients BreamoreBoy, ezio.melotti, josephoenix, orsenthil, r.david.murray, terry.reedy
Date 2013-03-20.02:57:36
Message-id <1363748257.59.0.189145622848.issue2052@psf.upfronthosting.co.za>
In-reply-to
Content
In 3.2, it is line 1629:
          content="text/html; charset=ISO-8859-1" />

ISO-8859-1 was only ever standard for Western European documents whose characters all fit in that charset. Now, even such limited-character documents often use 'utf-8' (python.org does). Putting an incorrect charset declaration in an HTML file makes the browser decode the bytes with the wrong encoding, so non-ASCII text is not displayed correctly.

For instance, I tried an input sequence containing the line 'c\u3333', which displays in IDLE as 'c㌳'. The string from HtmlDiff.make_file() must be written to a file opened with encoding='utf-8', not the above or an equivalent, since those cannot encode that character. Firefox, following the declared charset, then reads the three bytes of the utf-8 encoding as three separate characters and displays mojibake instead of 'c㌳'. To check:
>>> 'c㌳'.encode().decode(encoding='Latin-1')
'cã\x8c³'
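
The same check can be spelled out at the byte level. This is only a minimal sketch of the misinterpretation described above; the variable names are illustrative and not taken from difflib.

```python
# 'c' followed by U+3333, the sample line from the report.
text = 'c\u3333'

# UTF-8 encodes U+3333 as three bytes: E3 8C B3.
utf8_bytes = text.encode('utf-8')

# A reader that trusts a Latin-1 declaration maps each byte to one character,
# so the single character U+3333 turns into three unrelated ones.
misread = utf8_bytes.decode('latin-1')

# Re-encoding as Latin-1 recovers the original bytes, which confirms that the
# data is intact and only the declared charset is wrong.
recovered = misread.encode('latin-1').decode('utf-8')

print(utf8_bytes)      # b'c\xe3\x8c\xb3'
print(ascii(misread))  # 'c\xe3\x8c\xb3'
print(recovered == text)
```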

To me the clear implication of "returns a string which is a complete HTML file containing a table showing line by line differences with inter-line and intra-line changes highlighted." is that the resulting file will display correctly. The current template charset prevents that; changing it to 'utf-8' results in a file that displays correctly (tested). So the current behavior, and the code that causes it, is to me clearly a bug. I would like to fix it before 2.7.4 comes out.
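
A rough sketch of the kind of test described above, assuming Python 3. The input lines, the temporary directory, and the file name diff.html are made up for illustration. Which charset the generated page declares depends on the Python version in use, so the sketch only verifies what ends up on disk.

```python
import difflib
import os
import tempfile

# Two small inputs that differ on a line containing a non-Latin-1 character.
fromlines = ['a', 'c\u3333']
tolines = ['a', 'c\u3333!']

# make_file() returns the complete HTML page as a str.
html = difflib.HtmlDiff().make_file(fromlines, tolines)

# The page has to be written as UTF-8, because Latin-1 cannot encode U+3333.
path = os.path.join(tempfile.mkdtemp(), 'diff.html')
with open(path, 'w', encoding='utf-8') as f:
    f.write(html)

# Read the raw bytes back to confirm the character was stored as UTF-8.
with open(path, 'rb') as f:
    raw = f.read()

print('\u3333'.encode('utf-8') in raw)
```

Opening the written file in a browser then shows whether the charset declared in the page matches the encoding used on disk.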