classification
Title: Non-ASCII punctuation characters are not displayed correctly in python363.chm
Type: behavior Stage: resolved
Components: Documentation, Windows Versions: Python 3.7, Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Ma Lin, Sangbae Nam, docs@python, mcepl, mdk, miss-islington, paul.moore, r.david.murray, steve.dower, tim.golden, wwqgtxx, zaazbb, zach.ware
Priority: normal Keywords: patch

Created on 2017-11-30 03:45 by zaazbb, last changed 2018-10-19 15:04 by mcepl. This issue is now closed.

Files
File name Uploaded Description
1512013191(1).jpg zaazbb, 2017-11-30 03:45
screenshot.PNG Ma Lin, 2018-03-11 11:04
QQ截图20180403085952.png wwqgtxx, 2018-04-03 01:01
QQ截图20180403090715.png wwqgtxx, 2018-04-03 01:08
py37chm.png Sangbae Nam, 2018-10-03 02:53 CHM on Windows 10 Japanese
PR 9758 effects.png Ma Lin, 2018-10-08 12:20
python3-doc-3-7-rc2-log.txt mcepl, 2018-10-18 21:28 build logs of python3-doc package on OpenSUSE/Tumbleweed
not perfect yet.png Ma Lin, 2018-10-19 03:01 navigation bar still has corrupted characters
Pull Requests
URL Status Linked
PR 9758 merged Ma Lin, 2018-10-08 11:21
PR 9762 merged miss-islington, 2018-10-08 21:21
PR 9763 merged miss-islington, 2018-10-08 21:21
Messages (21)
msg307277 - (view) Author: zaazbb (zaazbb) Date: 2017-11-30 03:45
In the .chm documentation (python363.chm), some Unicode (non-ASCII) characters are not displayed correctly.
For example:

asyncio — Asynchronous I/O, event loop, coroutines and tasks

displayed as

asyncio � Asynchronous I/O, event loop, coroutines and tasks

and 

Asynchronous programming is more complex than classical “sequential” programming

is displayed as

Asynchronous programming is more complex than classical 搒equential� programming


Windows 10, Simplified Chinese language.
Python 3.6.3, python363.chm.
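
The garbling reported above is what a code page mismatch typically looks like. The following is a minimal sketch, assuming (the thread does not confirm this) that the page bytes were written as Windows-1252 and the help viewer on a Simplified Chinese system decoded them as GBK. The function name `simulate_codepage_mismatch` and both codec choices are illustrative assumptions.

```python
def simulate_codepage_mismatch(text, written_as="cp1252", read_as="gbk"):
    """Encode text with one legacy codec and decode it with another.

    Both codec names are assumptions made for illustration; the thread
    does not state which encodings were actually involved.
    """
    raw = text.encode(written_as)
    # errors="replace" substitutes U+FFFD for bytes the reader cannot decode.
    return raw.decode(read_as, errors="replace")


original = "classical “sequential” programming"
garbled = simulate_codepage_mismatch(original)
print(garbled)
```

Under these assumptions the opening quote byte pairs with the following letter to form a single CJK character, while the closing quote byte has no valid partner and becomes a replacement character, which resembles the output in the report.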
msg307438 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2017-12-02 16:37
I'm not sure there will be any good fix for this. We might be able to coerce proper utf-8 output from Sphinx, and if it also adds the encoding tags required by whatever ancient version of Internet Explorer is used then it should be fine.

It's likely just best to avoid special punctuation in doc source files though.
msg307460 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-12-02 21:26
The doc source files do not contain smart quotes, and as far as I know, sphinx does produce correct utf-8.

Recently there was a bug where incorrect smart quotes were leaking out of the internationalization of the docs, so this might be a problem that is already fixed.  On the other hand, there might be something broken about the chm production process.  I have no idea who would be the right person to investigate that, since I think Steve just spins the wheel on existing tools to get them generated :)

On the gripping hand, could there be something broken about your local charset configuration?  Does anyone else see this problem?
msg309671 - (view) Author: wwq (wwqgtxx) Date: 2018-01-08 15:56
The problem is still not fixed in python364.chm, although python362.chm displays correctly. Perhaps a change in the official python.org build configuration introduced the encoding error.
msg313596 - (view) Author: Ma Lin (Ma Lin) * Date: 2018-03-11 10:59
Here is a workaround:
1. Open any page with Internet Explorer.
2. Right-click the page -> Encoding -> check "Auto-Select".
Then the wrong characters (�/抯) will disappear permanently.

> Does anyone else see this problem?
Probably a lot of people have this problem.
I recently did a clean install of Windows 10, so I believe this is how the Python .chm document looks by default.
BTW, my locale is Simplified Chinese.
msg313637 - (view) Author: Ma Lin (Ma Lin) * Date: 2018-03-12 09:29
The HTML source inside the .chm changed between 3.6.2 and 3.6.3; the former uses escaped HTML entities.
I couldn't find which commit caused this change.

3.6.2 chm: <h1>What&#8217;s New In Python 3.6</h1>
3.6.3 chm: <h1>What抯 New In Python 3.6</h1>

3.6.2 chm: <h2>Summary &#8211; Release highlights</h2>
3.6.3 chm: <h2>Summary ?Release highlights</h2>

Release date:
3.6.2 final: 2017-07-17
3.6.3 final: 2017-10-03
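
To illustrate why the 3.6.2 form is robust, here is a small sketch using only the standard library. The helper names `is_codepage_proof` and `decode_under` are hypothetical; the heading string is the 3.6.2 markup quoted above. Because the escaped markup is pure 7-bit ASCII, it decodes to the same text under the legacy code pages tried here, and the entity still resolves to the intended character.

```python
import html

# Heading markup as quoted from the 3.6.2 .chm in the message above.
ESCAPED_HEADING = "<h1>What&#8217;s New In Python 3.6</h1>"


def is_codepage_proof(markup):
    """Return True if every character is 7-bit ASCII."""
    return all(ord(ch) < 0x80 for ch in markup)


def decode_under(markup, codec):
    """Round-trip ASCII markup through a legacy codec (illustrative helper)."""
    return markup.encode("ascii").decode(codec)


# The numeric character reference resolves to U+2019 (right single quote).
rendered = html.unescape(ESCAPED_HEADING)
print(is_codepage_proof(ESCAPED_HEADING), rendered)
```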
msg313763 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-03-13 16:33
We should probably prefer to force ASCII with explicit escapes (ideally named escapes, rather than codepoints). I'm not sure how to make Sphinx/docutils do that, but presumably it could be our own extension that handles the problematic characters people add to our docs.
msg314845 - (view) Author: wwq (wwqgtxx) Date: 2018-04-03 01:01
In python365.chm, every page loses its styling, but the pages display correctly in IE.
msg314846 - (view) Author: wwq (wwqgtxx) Date: 2018-04-03 01:08
Also, python365.chm still has some non-ASCII characters that are not displayed correctly.
msg326931 - (view) Author: Sangbae Nam (Sangbae Nam) Date: 2018-10-03 02:53
This issue still persists in 3.6 and 3.7.
msg327247 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-06 14:30
Until someone creates and enables a Sphinx extension/option to only generate ASCII output, it will remain. Volunteers are welcome.
msg327335 - (view) Author: Ma Lin (Ma Lin) * Date: 2018-10-08 09:38
I will create a PR to fix this within a day.
msg327344 - (view) Author: Ma Lin (Ma Lin) * Date: 2018-10-08 11:43
It seems impossible to set the encoding of the .chm to ASCII [1]; the available .chm encodings are limited to a fixed list [2].

So I wrote a Sphinx extension for .chm output that escapes characters with codepoint > 0x7F to 7-bit ASCII. The most commonly escaped characters are: “”’–…—

[1] https://github.com/sphinx-doc/sphinx/blob/master/sphinx/builders/htmlhelp.py#L203-L206
[2] https://github.com/sphinx-doc/sphinx/blob/master/sphinx/builders/htmlhelp.py#L136-L170
msg327371 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-08 21:21
New changeset 6261ae9b01fb8429b779169f8de37ff567c144e8 by Steve Dower (animalize) in branch 'master':
bpo-32174: Let .chm document display non-ASCII characters properly (GH-9758)
https://github.com/python/cpython/commit/6261ae9b01fb8429b779169f8de37ff567c144e8
msg327372 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2018-10-08 21:21
Thanks, that looks perfect!
msg327373 - (view) Author: miss-islington (miss-islington) Date: 2018-10-08 21:26
New changeset 64bcedce8d61e1daa9ff7980cc07988574049b1f by Miss Islington (bot) in branch '3.6':
bpo-32174: Let .chm document display non-ASCII characters properly (GH-9758)
https://github.com/python/cpython/commit/64bcedce8d61e1daa9ff7980cc07988574049b1f
msg327374 - (view) Author: miss-islington (miss-islington) Date: 2018-10-08 21:26
New changeset c4c86fad8024dc91af8d785c33187c092b4e49d9 by Miss Islington (bot) in branch '3.7':
bpo-32174: Let .chm document display non-ASCII characters properly (GH-9758)
https://github.com/python/cpython/commit/c4c86fad8024dc91af8d785c33187c092b4e49d9
msg328007 - (view) Author: Matej Cepl (mcepl) * Date: 2018-10-18 21:28
It seems to me that this adds escape4chm as unconditional dependency on all platforms, which seems like a bad idea to me; I don't think users on Linux or Mac OS X are that keen on *.chm files.

I think this change broke my build of the python3-doc package on openSUSE (which built perfectly until now).

I don't even know where the escape4chm plugin comes from.
msg328009 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2018-10-18 22:05
What version of Python are you running Sphinx with?  Your error is that `html.entities` does not exist, which makes it sound like Python 2; bump it to Python 3 and you'll be fine.
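
For context, the `html.entities` module exists only in Python 3; in Python 2 the same tables live in `htmlentitydefs`. A minimal sketch of an early interpreter check follows, assuming one wanted a clearer failure than an import error. The function name `load_entity_table` is hypothetical and not part of the docs build.

```python
import sys


def load_entity_table(version_info=sys.version_info):
    """Return the codepoint-to-entity-name table, or fail with a clear message."""
    if version_info[0] < 3:
        # Python 2 ships this table as `htmlentitydefs`, not `html.entities`.
        raise RuntimeError("Run Sphinx with Python 3 to build these docs")
    from html.entities import codepoint2name
    return codepoint2name


table = load_entity_table()
print(table[0x2014])
```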
msg328018 - (view) Author: Ma Lin (Ma Lin) * Date: 2018-10-19 03:01
> It seems to me that this adds escape4chm as unconditional dependency on all platforms.

`Doc/Makefile` also includes an `htmlhelp` target, which generates the .chm materials.

BTW, the related navigation bar still has corrupted characters, see the attached file. This can't be fixed via a Sphinx extension.
I'm writing a Sphinx patch; if they accept it, we can revert this commit, and a new option will solve this perfectly: htmlhelp_ascii_output = True
msg328046 - (view) Author: Matej Cepl (mcepl) * Date: 2018-10-19 15:04
Sorry, my mistake, it seems I was using python2 Sphinx even for building python3 documentation, which is a bad idea, I guess.
History
Date User Action Args
2018-10-19 15:04:03 mcepl set messages: + msg328046
2018-10-19 03:01:12 Ma Lin set files: + not perfect yet.png; messages: + msg328018
2018-10-18 22:05:30 zach.ware set messages: + msg328009
2018-10-18 21:28:20 mcepl set files: + python3-doc-3-7-rc2-log.txt; nosy: + mcepl; messages: + msg328007
2018-10-08 23:18:03 steve.dower set status: open -> closed; resolution: fixed; stage: patch review -> resolved
2018-10-08 21:26:58 miss-islington set messages: + msg327374
2018-10-08 21:26:49 miss-islington set nosy: + miss-islington; messages: + msg327373
2018-10-08 21:21:26 steve.dower set messages: + msg327372
2018-10-08 21:21:26 miss-islington set pull_requests: + pull_request9149
2018-10-08 21:21:19 miss-islington set pull_requests: + pull_request9148
2018-10-08 21:21:04 steve.dower set messages: + msg327371
2018-10-08 12:20:00 Ma Lin set files: + PR 9758 effects.png
2018-10-08 11:43:56 Ma Lin set messages: + msg327344
2018-10-08 11:21:20 Ma Lin set keywords: + patch; stage: patch review; pull_requests: + pull_request9143
2018-10-08 09:38:43 Ma Lin set messages: + msg327335
2018-10-06 14:30:54 steve.dower set messages: + msg327247
2018-10-03 02:53:19 Sangbae Nam set files: + py37chm.png; assignee: docs@python; components: + Documentation; versions: + Python 3.7; nosy: + Sangbae Nam, docs@python; messages: + msg326931
2018-04-03 01:08:09 wwqgtxx set files: + QQ截图20180403090715.png; messages: + msg314846
2018-04-03 01:01:36 wwqgtxx set files: + QQ截图20180403085952.png; messages: + msg314845
2018-03-13 16:33:47 steve.dower set messages: + msg313763
2018-03-12 09:29:17 Ma Lin set messages: + msg313637
2018-03-11 11:04:58 Ma Lin set files: + screenshot.PNG
2018-03-11 10:59:32 Ma Lin set nosy: + Ma Lin; messages: + msg313596
2018-01-08 15:56:20 wwqgtxx set nosy: + wwqgtxx; messages: + msg309671
2017-12-02 21:31:29 ned.deily set nosy: + mdk
2017-12-02 21:26:00 r.david.murray set nosy: + r.david.murray; messages: + msg307460
2017-12-02 16:37:37 steve.dower set messages: + msg307438
2017-11-30 03:45:30 zaazbb create