nonASCII punctuation characters can not display in python363.chm. #76355

Closed
zaazbb mannequin opened this issue Nov 30, 2017 · 21 comments
Labels
3.7 (EOL) end of life docs Documentation in the Doc dir OS-windows type-bug An unexpected behavior, bug, or error

Comments

@zaazbb
Mannequin

zaazbb mannequin commented Nov 30, 2017

BPO 32174
Nosy @pfmoore, @tjguk, @mcepl, @bitdancer, @zware, @zooba, @zaazbb, @animalize, @JulienPalard, @wwqgtxx, @miss-islington
PRs
  • bpo-32174: Let .chm document display non-ASCII characters properly #9758
  • [3.7] bpo-32174: Let .chm document display non-ASCII characters properly (GH-9758) #9762
  • [3.6] bpo-32174: Let .chm document display non-ASCII characters properly (GH-9758) #9763
  • Files
  • 1512013191(1).jpg
  • screenshot.PNG
  • QQ截图20180403085952.png
  • QQ截图20180403090715.png
  • py37chm.png: CHM on Windows 10 Japanese
  • PR 9758 effects.png
  • python3-doc-3-7-rc2-log.txt: build logs of python3-doc package on OpenSUSE/Tumbleweed
  • not perfect yet.png: navigation bar still has corrupted characters
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-10-08.23:18:03.609>
    created_at = <Date 2017-11-30.03:45:30.123>
    labels = ['type-bug', '3.7', 'OS-windows', 'docs']
    title = 'nonASCII punctuation characters can not display in python363.chm.'
    updated_at = <Date 2018-10-19.15:04:02.999>
    user = 'https://github.com/zaazbb'

    bugs.python.org fields:

    activity = <Date 2018-10-19.15:04:02.999>
    actor = 'mcepl'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2018-10-08.23:18:03.609>
    closer = 'steve.dower'
    components = ['Documentation', 'Windows']
    creation = <Date 2017-11-30.03:45:30.123>
    creator = 'zaazbb'
    dependencies = []
    files = ['47304', '47476', '47514', '47515', '47844', '47858', '47881', '47883']
    hgrepos = []
    issue_num = 32174
    keywords = ['patch']
    message_count = 21.0
    messages = ['307277', '307438', '307460', '309671', '313596', '313637', '313763', '314845', '314846', '326931', '327247', '327335', '327344', '327371', '327372', '327373', '327374', '328007', '328009', '328018', '328046']
    nosy_count = 13.0
    nosy_names = ['paul.moore', 'tim.golden', 'mcepl', 'r.david.murray', 'docs@python', 'zach.ware', 'steve.dower', 'zaazbb', 'malin', 'mdk', 'wwqgtxx', 'miss-islington', 'Sangbae Nam']
    pr_nums = ['9758', '9762', '9763']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue32174'
    versions = ['Python 3.6', 'Python 3.7']

    @zaazbb
    Mannequin Author

    zaazbb mannequin commented Nov 30, 2017

    In the CHM documentation (python363.chm), some non-ASCII Unicode characters are not displayed correctly.
    For example:

    asyncio — Asynchronous I/O, event loop, coroutines and tasks

    displayed as

    asyncio � Asynchronous I/O, event loop, coroutines and tasks

    and

    Asynchronous programming is more complex than classical “sequential” programming

    is displayed as

    Asynchronous programming is more complex than classical 搒equential� programming

    Environment: Windows 10, Simplified Chinese language.
    Python 3.6.3, python363.chm.
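One plausible reading of the garbled output above, offered as an assumption rather than a confirmed diagnosis: the page was stored as cp1252 bytes and then rendered under the GBK code page used by Simplified Chinese Windows. The sketch below reproduces that mismatch; `view_as_gbk` is a hypothetical helper name.

```python
def view_as_gbk(text):
    """Encode text as cp1252, then decode the same bytes as GBK.

    Assumption: this mirrors a viewer that ignores the page's real
    encoding and falls back to the system code page.
    """
    raw = text.encode("cp1252")
    # A curly quote is a single byte >= 0x80 in cp1252. GBK treats such a
    # byte as the lead byte of a two-byte character, so it either merges
    # with the next letter or becomes U+FFFD when the pair is invalid.
    return raw.decode("gbk", errors="replace")


original = "classical “sequential” programming"
garbled = view_as_gbk(original)
print(garbled)
```

Pure ASCII text passes through unchanged, which is consistent with only the punctuation being affected in the report.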

    @zaazbb zaazbb mannequin added OS-windows type-bug An unexpected behavior, bug, or error labels Nov 30, 2017
    @zooba
    Member

    zooba commented Dec 2, 2017

    I'm not sure there will be any good fix for this. We might be able to coerce proper UTF-8 output from Sphinx, and if it also adds the encoding tags required by whatever ancient version of Internet Explorer is used, then it should be fine.

    It's likely just best to avoid special punctuation in doc source files though.

    @bitdancer
    Member

    The doc source files do not contain smart quotes, and as far as I know, sphinx does produce correct utf-8.

    Recently there was a bug where incorrect smart quotes were leaking out of the internationalization of the docs, so this might be a problem that is already fixed. On the other hand, there might be something broken about the chm production process. I have no idea who would be the right person to investigate that, since I think Steve just spins the wheel on existing tools to get them generated :)

    On the gripping hand, could there be something broken about your local charset configuration? Does anyone else see this problem?

    @wwqgtxx
    Mannequin

    wwqgtxx mannequin commented Jan 8, 2018

    The problem is not fixed in python364.chm, although python362.chm displays correctly. Perhaps a change in the official python.org build configuration introduced the encoding error.

    @animalize
    Mannequin

    animalize mannequin commented Mar 11, 2018

    Here is a workaround:
    1. Open any page with Internet Explorer.
    2. Right-click the page -> Encoding -> check "Auto-Select".
    After that, the wrong characters (�/抯) disappear permanently.

    > Does anyone else see this problem?
    Probably a lot of people have this problem.
    I installed a clean Windows 10 recently, so I believe this is how the Python .chm documentation looks by default.
    BTW, my locale is Simplified Chinese.

    @animalize
    Mannequin

    animalize mannequin commented Mar 12, 2018

    The HTML source inside the .chm changed between 3.6.2 and 3.6.3; the former uses escaped HTML entities.
    I couldn't find which commit caused this change.

    3.6.2 chm: <h1>What&#8217;s New In Python 3.6</h1>
    3.6.3 chm: <h1>What抯 New In Python 3.6</h1>

    3.6.2 chm: <h2>Summary &#8211; Release highlights</h2>
    3.6.3 chm: <h2>Summary ?Release highlights</h2>

    Release date:
    3.6.2 final: 2017-07-17
    3.6.3 final: 2017-10-03
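To illustrate what escaped output of the 3.6.2 kind looks like, here is a minimal sketch using Python's built-in `xmlcharrefreplace` error handler. It is not a claim about how Sphinx produced the 3.6.2 pages; `to_ascii_html` is a hypothetical helper name.

```python
def to_ascii_html(text):
    """Replace every non-ASCII character with a numeric character reference."""
    # Encoding to ASCII with "xmlcharrefreplace" turns each character that
    # ASCII cannot represent into "&#<decimal code point>;".
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_html("What’s New In Python 3.6"))
print(to_ascii_html("Summary – Release highlights"))
```

Because the result is pure ASCII, it renders the same way regardless of which code page the viewer assumes.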

    @zooba
    Member

    zooba commented Mar 13, 2018

    We should probably prefer to force ASCII with explicit escapes (ideally named escapes, rather than codepoints). I'm not sure how to make Sphinx/docutils do that, but presumably it could be our own extension that handles the problematic characters people add to our docs.

    @wwqgtxx
    Mannequin

    wwqgtxx mannequin commented Apr 3, 2018

    In python365.chm, every page loses its styling, although the same pages display correctly in IE.

    @wwqgtxx
    Mannequin

    wwqgtxx mannequin commented Apr 3, 2018

    Also, in python365.chm some non-ASCII characters still cannot be displayed.

    @SangbaeNam
    Mannequin

    SangbaeNam mannequin commented Oct 3, 2018

    This issue still persists in 3.6 and 3.7.

    @SangbaeNam SangbaeNam mannequin added docs Documentation in the Doc dir 3.7 (EOL) end of life labels Oct 3, 2018
    @SangbaeNam SangbaeNam mannequin assigned docspython Oct 3, 2018
    @zooba
    Member

    zooba commented Oct 6, 2018

    Until someone creates and enables a Sphinx extension/option to only generate ASCII output, it will remain. Volunteers are welcome.

    @animalize
    Mannequin

    animalize mannequin commented Oct 8, 2018

    I will create a PR to fix this within a day.

    @animalize
    Mannequin

    animalize mannequin commented Oct 8, 2018

    It seems impossible to set the encoding of a .chm to ASCII [1]; the available .chm encodings are limited to a fixed list [2].

    So I wrote a Sphinx extension for .chm output. It escapes every character whose code point is above 0x7F to 7-bit ASCII. The most frequently escaped characters are: “”’–…—

    [1] https://github.com/sphinx-doc/sphinx/blob/master/sphinx/builders/htmlhelp.py#L203-L206
    [2] https://github.com/sphinx-doc/sphinx/blob/master/sphinx/builders/htmlhelp.py#L136-L170
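The escaping idea described above can be sketched as follows. This is an illustration under stated assumptions, not the actual extension code from the PR: the helper names are hypothetical, and the choice to prefer named entities (falling back to numeric references) follows the preference for named escapes voiced earlier in the thread.

```python
import re
from html.entities import codepoint2name

# Matches any single character outside the 7-bit ASCII range.
_NON_ASCII = re.compile(r"[^\x00-\x7F]")


def _escape_char(match):
    """Return a named entity if HTML defines one, else a numeric reference."""
    codepoint = ord(match.group(0))
    name = codepoint2name.get(codepoint)
    if name is not None:
        return "&" + name + ";"
    return "&#" + str(codepoint) + ";"


def escape_non_ascii(text):
    """Escape every character above 0x7F so the result is pure ASCII."""
    return _NON_ASCII.sub(_escape_char, text)


print(escape_non_ascii("asyncio — Asynchronous I/O"))
print(escape_non_ascii("What’s New In Python 3.6"))
```

A real Sphinx extension would apply such a function to the rendered page body only when the htmlhelp builder is active; how that hook is wired up is not shown here.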

    @zooba
    Member

    zooba commented Oct 8, 2018

    New changeset 6261ae9 by Steve Dower (animalize) in branch 'master':
    bpo-32174: Let .chm document display non-ASCII characters properly (GH-9758)
    6261ae9

    @zooba
    Member

    zooba commented Oct 8, 2018

    Thanks, that looks perfect!

    @miss-islington
    Contributor

    New changeset 64bcedc by Miss Islington (bot) in branch '3.6':
    bpo-32174: Let .chm document display non-ASCII characters properly (GH-9758)
    64bcedc

    @miss-islington
    Contributor

    New changeset c4c86fa by Miss Islington (bot) in branch '3.7':
    bpo-32174: Let .chm document display non-ASCII characters properly (GH-9758)
    c4c86fa

    @zooba zooba closed this as completed Oct 8, 2018
    @mcepl
    Mannequin

    mcepl mannequin commented Oct 18, 2018

    It seems to me that this adds escape4chm as an unconditional dependency on all platforms. That seems like a bad idea to me; I don't think users on Linux or Mac OS X are that keen on *.chm files.

    I think this change broke my build of the python3-doc package on openSUSE (which had built perfectly until now).

    I don't even know where the escape4chm plugin comes from.

    @zware
    Member

    zware commented Oct 18, 2018

    What version of Python are you running Sphinx with? Your error is that html.entities does not exist, which makes it sound like Python 2; bump it to Python 3 and you'll be fine.
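A quick way to check the diagnosis above is to run a small probe with the same interpreter that runs Sphinx. This is only a sketch; `html_entities_available` is a hypothetical helper name.

```python
import sys


def html_entities_available():
    """Report whether the Python 3 module html.entities can be imported."""
    try:
        import html.entities  # exists in Python 3; Python 2 used htmlentitydefs
    except ImportError:
        return False
    return True


print("Python major version:", sys.version_info[0])
print("html.entities importable:", html_entities_available())
```

If the probe reports that the module is missing, the build is most likely running under Python 2.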

    @animalize
    Mannequin

    animalize mannequin commented Oct 19, 2018

    > It seems to me that this adds escape4chm as unconditional dependency on all platforms.

    [Doc/Makefile](https://github.com/python/cpython/blob/main/Doc/Makefile) also includes an htmlhelp command, which generates the .chm materials.

    BTW, the navigation bar still has corrupted characters; see the attached file. This can't be fixed via a Sphinx extension.
    I'm writing a Sphinx patch. If it is accepted, we can revert this commit, and a new option will solve this perfectly: htmlhelp_ascii_output = True

    @mcepl
    Mannequin

    mcepl mannequin commented Oct 19, 2018

    Sorry, my mistake, it seems I was using python2 Sphinx even for building python3 documentation, which is a bad idea, I guess.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022