Message 184726 - Python tracker


Author	terry.reedy
Recipients	BreamoreBoy, ezio.melotti, josephoenix, orsenthil, r.david.murray, terry.reedy
Date	2013-03-20.02:57:36
Message-id	<1363748257.59.0.189145622848.issue2052@psf.upfronthosting.co.za>
In-reply-to

Content
In 3.2, it is line 1629:
          content="text/html; charset=ISO-8859-1" />

That charset was only ever standard for Western European documents whose characters all fit in it. Now, even such limited-character docs often use 'utf-8' (python.org does). The result of putting an incorrect charset designation in an HTML file is that the browser will not display the file correctly.

For instance, I tried an input sequence containing the line 'c\u3333', which displays in IDLE as 'c㌳'. The string from HtmlDiff.make_file() must be written to a file opened with encoding='utf-8', not ISO-8859-1 or an equivalent. Firefox then reads the three bytes of the utf-8 encoding of '㌳' as three separate characters and displays 'cãŒ³'. To check:
>>> 'c㌳'.encode().decode(encoding='Latin-1')
'cã\x8c³'
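
The check above shows '\x8c' where Firefox shows 'Œ'. A minimal sketch of why, assuming the browser treats an ISO-8859-1 label as windows-1252 (which is what current browsers generally do); the variable names are only illustrative:

```python
# The sample line from the message: 'c' followed by U+3333.
text = 'c\u3333'

# UTF-8 encodes U+3333 as three bytes: E3 8C B3.
utf8_bytes = text.encode('utf-8')

# Strict ISO-8859-1 (Latin-1) maps byte 0x8C to a C1 control character,
# which is why the interactive check prints '\x8c'.
as_latin1 = utf8_bytes.decode('latin-1')

# Assumption: the browser decodes an ISO-8859-1 label as windows-1252,
# where byte 0x8C is the printable character 'Œ'.
as_cp1252 = utf8_bytes.decode('cp1252')

print(ascii(as_latin1))
print(as_cp1252)
```

Either way, the single character '㌳' turns into three unrelated characters, which is the display problem described here.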

To me the clear implication of "returns a string which is a complete HTML file containing a table showing line by line differences with inter-line and intra-line changes highlighted." is that the resulting file will display correctly. The current template charset prevents that; changing it to 'utf-8' results in a file that displays correctly (tested). So the current behavior and the code that causes it are, to me, clearly a bug. I would like to fix it before 2.7.4 comes out.
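
For reference, a rough sketch of the write step described above, using Python 3 syntax. The helper name `write_diff_html`, the sample inputs, and the temporary output path are hypothetical. Which charset the generated HTML declares depends on the template shipped with the Python version in use, so this only shows the file encoding side:

```python
import difflib
import os
import tempfile


def write_diff_html(fromlines, tolines, path):
    """Write an HtmlDiff report to `path` as UTF-8 and return the HTML string."""
    html = difflib.HtmlDiff().make_file(fromlines, tolines)
    # The file must be opened with an encoding that can represent every
    # character in the inputs; ISO-8859-1 cannot encode U+3333.
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)
    return html


# Hypothetical sample input containing the line from the message.
old = ['a', 'c\u3333']
new = ['b', 'c\u3333']
out_path = os.path.join(tempfile.mkdtemp(), 'diff.html')
html = write_diff_html(old, new, out_path)
```

If the template still declares ISO-8859-1, the bytes on disk are correct UTF-8 but the browser is told to decode them as something else, which is the mismatch reported in this message.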
