
classification
Title: IDLE: Revise html to tkinter converter for help.html
Type: behavior
Stage: needs patch
Components: IDLE
Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: terry.reedy
Nosy List: cheryl.sabella, markroseman, mdk, terry.reedy
Priority: normal
Keywords:

Created on 2019-06-16 01:51 by terry.reedy, last changed 2022-04-11 14:59 by admin.

Messages (4)
msg345722 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-06-16 01:51
Sphinx 2.? generates different html than 1.8, such that the display of Help ==> IDLE Help has extra blank lines. Among possibly other things, the contents of <li>...</li> are wrapped in <p>...</p>, and blank lines appear between the bullet and text.
 
<ul class="simple">
-<li>coded in 100% pure Python, using the <a class="reference internal" href="tkinter.html#module-tkinter" title="tkinter: Interface to Tcl/Tk for graphical user interfaces"><code class="xref py py-mod docutils literal notranslate"><span class="pre">tkinter</span></code></a> GUI toolkit</li>
-<li>cross-platform: works mostly the same on Windows, Unix, and macOS</li>
...
+<li><p>coded in 100% pure Python, using the <a class="reference internal" href="tkinter.html#module-tkinter" title="tkinter: Interface to Tcl/Tk for graphical user interfaces"><code class="xref py py-mod docutils literal notranslate"><span class="pre">tkinter</span></code></a> GUI toolkit</p></li>
+<li><p>cross-platform: works mostly the same on Windows, Unix, and macOS</p></li>
...
 </ul>

A similar issue afflicts the menu, with blank lines between the menu item and the explanation.

The html original, 3x/Doc/build/html/library/idle.html#index-0, looks normal in Firefox.  The html parser class in help.py needs to ignore <p> within <li>.  help.py should also specify which version of Sphinx it is compatible with.
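
A minimal sketch of what ignoring <p> within <li> could look like, assuming a parser built on html.parser.HTMLParser. The names ListAwareParser and to_text and the plain-text '* ' bullet output are hypothetical; this is not the code in help.py, which renders into a tkinter Text widget.

```python
from html.parser import HTMLParser


class ListAwareParser(HTMLParser):
    """Hypothetical sketch: emit a blank line for <p>, except when the
    <p> is the first thing inside an <li>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # collected output chunks
        self.after_li = False  # True between <li> and its first content

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.out.append('\n* ')
            self.after_li = True
        elif tag == 'p':
            # The html5 writer wraps list item text in <p>; skip the
            # paragraph break in that position only.
            if not self.after_li:
                self.out.append('\n\n')
            self.after_li = False
        else:
            self.after_li = False

    def handle_data(self, data):
        # Simplification: whitespace-only runs between tags are dropped.
        if data.strip():
            self.out.append(data)
            self.after_li = False


def to_text(html):
    """Convert an html fragment to plain text with the sketch parser."""
    parser = ListAwareParser()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)
```

With this rule, the html4-style item `<li>text</li>` and the html5-style item `<li><p>text</p></li>` produce the same output, while a <p> outside a list still starts a new paragraph.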

Do any of you have any idea what the html change might be about?  Is there something wrong with idle.rst?
msg346205 - Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-06-21 11:23
tl;dr I think it's a difference in the CSS for the HTML5 writer. 

----------------------------------------

In the HTMLTranslator class for docutils writer [1], I found the following docstring, specifically the line "The html5_polyglot writer solves this using CSS2.".

    """
    The html4css1 writer has been optimized to produce visually compact
    lists (less vertical whitespace).  HTML's mixed content models
    allow list items to contain "<li><p>body elements</p></li>" or
    "<li>just text</li>" or even "<li>text<p>and body
    elements</p>combined</li>", each with different effects.  It would
    be best to stick with strict body elements in list items, but they
    affect vertical spacing in older browsers (although they really
    shouldn't).
    The html5_polyglot writer solves this using CSS2.

    Here is an outline of the optimization:

    - Check for and omit <p> tags in "simple" lists: list items
      contain either a single paragraph, a nested simple list, or a
      paragraph followed by a nested simple list.  This means that
      this list can be compact:

          - Item 1.
          - Item 2.

      But this list cannot be compact:

          - Item 1.

            This second paragraph forces space between list items.

          - Item 2.

    - In non-list contexts, omit <p> tags on a paragraph if that
      paragraph is the only child of its parent (footnotes & citations
      are allowed a label first).

    - Regardless of the above, in definitions, table cells, field bodies,
      option descriptions, and list items, mark the first child with
      'class="first"' and the last child with 'class="last"'.  The stylesheet
      sets the margins (top & bottom respectively) to 0 for these elements.

    The ``no_compact_lists`` setting (``--no-compact-lists`` command-line
    option) disables list whitespace optimization.
    """

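The "simple list" rule in the docstring above can be sketched as a small recursive check. This is only an illustration under assumed data structures, not the docutils implementation: each list item is modeled as a Python list of children, where a child is either the string 'p' (a paragraph) or a nested list of items. The name is_simple_list is hypothetical.

```python
def is_simple_list(items):
    """Return True if every item holds a single paragraph, a nested
    simple list, or a paragraph followed by a nested simple list."""
    for children in items:
        if len(children) == 1:
            child = children[0]
            ok = child == 'p' or (
                isinstance(child, list) and is_simple_list(child))
        elif len(children) == 2:
            first, second = children
            ok = (first == 'p' and isinstance(second, list)
                  and is_simple_list(second))
        else:
            ok = False
        if not ok:
            return False
    return True
```

Under this model, the compact example from the docstring is `[['p'], ['p']]`, and the non-compact one, whose first item has a second paragraph, is `[['p', 'p'], ['p']]`.
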
In the HTMLTranslator class for the base [2], I found this comment:
    # Do not omit <p> tags
    # --------------------
    #
    # The HTML4CSS1 writer does this to "produce
    # visually compact lists (less vertical whitespace)". This writer
    # relies on CSS rules for"visual compactness".
    #
    # * In XHTML 1.1, e.g. a <blockquote> element may not contain
    #   character data, so you cannot drop the <p> tags.
    # * Keeping simple paragraphs in the field_body enables a CSS
    #   rule to start the field-body on a new line if the label is too long
    # * it makes the code simpler.

Since both comments are a few years old, I think it's in the CSS.


[1] https://sourceforge.net/p/docutils/code/HEAD/tree/trunk/docutils/docutils/writers/html4css1/__init__.py
[2] https://sourceforge.net/p/docutils/code/HEAD/tree/trunk/docutils/docutils/writers/_html_base.py
msg346206 - Author: Cheryl Sabella (cheryl.sabella) * (Python committer) Date: 2019-06-21 11:34
Adding on to my last post: the cause is not the CSS. Sphinx 2.0 switched its default from HTML4 to HTML5.  The docutils comments explain the difference between the two.

https://github.com/sphinx-doc/sphinx/commit/a3cdd465ecf018fa5213b6b2c1c4e495973a2896
msg346241 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-06-21 18:26
Thank you for the research, including the crucial commit!  What I understand from the quotes:

1. Sphinx 2 writes HTML5 by default.  The html5 writer always writes paragraphs because they are required by the xhtml used by html5.

2. Firefox, for instance, displays the result the same as before, either because it has the logic to avoid extra blank lines when reading html5 or because this is taken care of by revised css (this is unclear from the quotes).

To deal with html5, our converter would have to ignore the <p>s that the html4 writer omitted, by adding logic for the cases used in idle.rst.  Not fun.

Reading the commit (3rd line) revealed a new Sphinx configuration option: html4_writer, defaulting to False.  When I switched from building html with my 3.6 install with Sphinx 1.8.1 to 3.7 with 2.something, and added "-D html4_writer=1" to a direct call of sphinx-build, I indeed got html without added <p>s.  The only difference was the irrelevant omission of '\n' between list item header and text in the html file.  Example:
  -<dt>New File</dt>
  -<dd>Create a new file editing window.</dd>
  +<dt>New File</dt><dd>Create a new file editing window.</dd>

Setting SPHINXOPTS should work when using 'Doc/make.bat html'.  I will prepare a PR documenting our parser requirement and include the neutral html changes.
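
The direct sphinx-build call described above could be scripted roughly as follows. This is a sketch under assumptions: the function names and the directory arguments are hypothetical, and only the "-D html4_writer=1" override comes from this message.

```python
import subprocess


def html4_build_command(source_dir, output_dir):
    """Return a sphinx-build command line that requests the HTML4 writer."""
    return [
        'sphinx-build',
        '-b', 'html',            # html builder
        '-D', 'html4_writer=1',  # override the config value for this run
        source_dir,
        output_dir,
    ]


def build(source_dir, output_dir):
    """Run the build; raises CalledProcessError if sphinx-build fails."""
    subprocess.run(html4_build_command(source_dir, output_dir), check=True)
```

Calling build() requires sphinx-build on the PATH and a Sphinx version that still accepts the html4_writer option.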
History
Date  User  Action  Args
2022-04-11 14:59:16  admin  set  github: 81479
2019-06-21 18:26:38  terry.reedy  set  messages: + msg346241
2019-06-21 11:34:16  cheryl.sabella  set  messages: + msg346206
2019-06-21 11:23:13  cheryl.sabella  set  messages: + msg346205
2019-06-16 01:51:12  terry.reedy  create