classification
Title: Links for French documentation PDF is broken: LaTeX issue with non-ASCII characters?
Type: behavior
Stage: patch review
Components: Documentation
Versions: Python 2.7
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: benjamin.peterson, docs@python, ezio.melotti, fabrice, jfbu, linkid, mdk, ned.deily, vstinner
Priority: normal
Keywords: patch

Created on 2017-09-26 08:59 by fabrice, last changed 2017-12-03 14:24 by jfbu.

Pull Requests
URL Status Linked
PR 3940 merged python-dev, 2017-10-10 06:14
PR 4069 open jfbu, 2017-10-21 18:07
PR 4683 merged python-dev, 2017-12-02 22:24
Messages (22)
msg303024 - (view) Author: fabrice (fabrice) Date: 2017-09-26 08:59
Hi,

On this page, https://docs.python.org/fr/2/download.html, none of the documentation download links are available, for example:

- https://docs.python.org/fr/2/archives/python-2.7.14-docs-pdf-a4.zip
- https://docs.python.org/fr/2/archives/python-2.7.14-docs-pdf-a4.tar.bz2

So, I can't read the documentation in French :)

Regards,
Fabrice
msg303028 - (view) Author: Julien Palard (mdk) * Date: 2017-09-26 09:58
Hi Fabrice,

Thanks for reporting.

The whole archives/ directory is completely missing for French and Japanese; I'll take a look.

The bug is probably very near https://github.com/python/docsbuild-scripts/blob/master/build_docs.py#L222
msg303498 - (view) Author: Julien Palard (mdk) * Date: 2017-10-01 21:52
The problem happens during the pdflatex run. I tried a local build and got:

! Package hyperref Error: Wrong DVI mode driver option `dvipdfmx',
(hyperref)                because pdfTeX or LuaTeX is running in PDF mode.

See the hyperref package documentation for explanation.
Type  H <return>  for immediate help.
 ...                                              
                                                  
l.4362 \ProcessKeyvalOptions{Hyp}
                                 
? 
! Emergency stop.
msg303501 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2017-10-01 22:36
FWIW, most of the errors I met while trying to build the PDFs of the main docs were caused by the presence of non-latin1 characters. French should be limited to the latin1 range, and the error you pasted doesn't seem to be related; however, that might explain why the Japanese docs are also missing (unless this issue got fixed in the meantime -- I haven't built the PDFs in a while).
msg303502 - (view) Author: Julien Palard (mdk) * Date: 2017-10-01 23:11
After an upgrade of my venv, the error is now:

Latexmk: applying rule 'pdflatex'...
Rule 'pdflatex': File changes, etc:
   Changed files, or newly in use since previous run(s):
      'faq.aux'
      'faq.out'
      'faq.toc'
Latexmk: Maximum runs of pdflatex reached without getting stable files
Latexmk: All targets (faq.pdf) are up-to-date
Latexmk: Did not finish processing file 'faq.tex':
   'pdflatex' needed too many passes
msg303503 - (view) Author: Julien Palard (mdk) * Date: 2017-10-01 23:32
One difference I see between the log of a successful faq.tex build and the log of a failing one is:

   Package hyperref Warning: Token not allowed in a PDF string (Unicode):

It looks like it builds again if I remove all non-ASCII characters from the titles.
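
To list the titles that trigger this, one option is to scan the generated .tex file for sectioning commands whose argument contains non-ASCII characters. This is only a minimal sketch: the function name and the list of sectioning commands are assumptions made for illustration, and the regular expression does not handle nested braces.

```python
import re

# Sectioning commands to inspect (an assumed, non-exhaustive list).
SECTION_RE = re.compile(
    r"\\(part|chapter|section|subsection|subsubsection)\*?\{([^{}]*)\}"
)


def non_ascii_titles(tex_source):
    """Return (command, title) pairs whose title has non-ASCII characters."""
    hits = []
    for match in SECTION_RE.finditer(tex_source):
        command, title = match.group(1), match.group(2)
        if any(ord(ch) > 127 for ch in title):
            hits.append((command, title))
    return hits


# Hypothetical sample input: one accented title, one plain ASCII title.
sample = "\\chapter{FAQ sur Python éh}\n\\section{Plain title}\n"
print(non_ascii_titles(sample))
```

Only the accented chapter title is reported; the plain ASCII section title is skipped.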
msg303562 - (view) Author: Julien Palard (mdk) * Date: 2017-10-02 19:43
The problem seems to be that utf8x does not play well with \tableofcontents:

- https://tex.stackexchange.com/questions/240801/utf8x-character-fails-in-the-table-of-contents-every-second-time-i-compile
- https://tex.stackexchange.com/questions/164458/pleaseinsertintopreamble-in-toc-and-header
msg303586 - (view) Author: Julien Palard (mdk) * Date: 2017-10-03 08:55
For the record, I can reproduce the issue with this minimal test file:

mdk@windhowl$ ls -lah
total 108K
drwxr-xr-x  2 mdk  mdk  4.0K Oct  2 21:15 .
drwxrwxrwt 18 root root  96K Oct  2 21:15 ..
-rw-r--r--  1 mdk  mdk   196 Oct  2 21:13 faq.tex

mdk@windhowl$ cat faq.tex 
\documentclass[a4,10pt,french]{report}

\usepackage[utf8x]{inputenc}
\usepackage[T1,T2A]{fontenc}
\usepackage{babel}

\begin{document}
\tableofcontents
\chapter{FAQ sur Python éh}
\end{document}

mdk@windhowl$ latexmk faq 2>&1 | tail -n 15
(/usr/share/texlive/texmf-dist/tex/latex/ucs/data/uni-0.def) [2] (./faq.aux) )
Output written on faq.dvi (2 pages, 608 bytes).
Transcript written on faq.log.
Latexmk: Log file says output to 'faq.dvi'
Rule 'latex': File changes, etc:
   Changed files, or newly in use since previous run(s):
      'faq.aux'
      'faq.toc'
Latexmk: Maximum runs of latex reached without getting stable files
Latexmk: Did not finish processing file 'faq':
   'latex' needed too many passes
Latexmk: Use the -f option to force complete processing,
 unless error was exceeding maximum runs of latex/pdflatex.
Latexmk: applying rule 'latex'...
Latexmk: All targets (faq.dvi) are up-to-date


Also, according to matrixise, it works with xelatex.
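
To compare inputenc options on the same minimal file, the test document above can be generated from a small template. This is a sketch under stated assumptions: the helper name and its parameters are hypothetical, and no claim is made here about which variants compile.

```python
# Template mirroring the minimal faq.tex shown above; only the inputenc
# option and the chapter title are parameterised.
TEMPLATE = r"""\documentclass[a4,10pt,french]{report}

\usepackage[%(inputenc)s]{inputenc}
\usepackage[T1,T2A]{fontenc}
\usepackage{babel}

\begin{document}
\tableofcontents
\chapter{%(title)s}
\end{document}
"""


def make_test_document(inputenc="utf8x", title="FAQ sur Python éh"):
    """Return the LaTeX source of the minimal test file (hypothetical helper)."""
    return TEMPLATE % {"inputenc": inputenc, "title": title}


if __name__ == "__main__":
    # Write one file per option, then run latexmk on each by hand.
    for option in ("utf8x", "utf8"):
        with open("faq-%s.tex" % option, "w", encoding="utf-8") as handle:
            handle.write(make_test_document(option))
```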
msg303587 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-03 09:34
Julien: Is there any reason why you want to use utf8x specifically?

https://tex.stackexchange.com/questions/13067/utf8x-vs-utf8-inputenc

"""
utf8x vs. utf8 (inputenc)
(...)
The simple answer is that utf8x is to be avoided if possible. It loads the ucs package, which for a long time was unmaintained (although there is now a new maintainer) and breaks various other things.
"""
msg303590 - (view) Author: Julien Palard (mdk) * Date: 2017-10-03 09:47
I personally do not care about using utf8x; it was introduced in:

  r74549 | benjamin.peterson | 2009-08-24 12:42:36 -0500 (Mon, 24 Aug 2009) | 1 line
  fix pdf building by teaching latex the right encoding package

  # Get LaTeX to handle Unicode correctly
  latex_elements = {'inputenc': r'\usepackage[utf8x]{inputenc}'}

I tried with utf8 instead and it yielded a different set of errors, so I did not try much further. But utf8-induced bugs may be easier to fix than utf8x-induced bugs, I don't know.
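
For reference, trying utf8 amounts to changing the same `latex_elements` entry in conf.py. This is a sketch of that one-line change, not a confirmed fix: as said above, on its own it only produced a different set of errors.

```python
# Sketch of the conf.py change: same 'inputenc' key as in r74549, with the
# utf8 option in place of utf8x. Not a confirmed fix.
latex_elements = {
    'inputenc': r'\usepackage[utf8]{inputenc}',
}
```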
msg303592 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2017-10-03 09:54
(Oh sorry, I misunderstood what you wrote. I understood that Python didn't use utf8x yet and you proposed to start using it.)
msg304016 - (view) Author: Julien Palard (mdk) * Date: 2017-10-10 06:10
Today I tried without utf8x and with an up-to-date version of texlive. The errors looked less mysterious, so I was able to open a readable, easy-to-reproduce issue:

https://github.com/sphinx-doc/sphinx/issues/4136

I then followed the idea of trying xetex and was able to build the English and French PDFs (I have not had time to try Japanese yet).
msg304499 - (view) Author: Julien Palard (mdk) * Date: 2017-10-17 11:58
I finally have a combination of LaTeX engines that works in every case, and I opened a PR on docsbuild-scripts:

https://github.com/python/docsbuild-scripts/pull/34/

I already got xelatex installed on docs.iad1.psf.io via https://github.com/python/psf-salt/commit/989a7715c4a452b5af13baf9a33535bab0af822b#diff-6fb01fe8bbc22a54d234a57ad58e291e

I also opened a few related issues on the sphinx-doc side to track my thoughts:

- https://github.com/sphinx-doc/sphinx/issues/4150 (Ask if there is a better way to set the latex engine than giving it from docsbuild-scripts)
- https://github.com/sphinx-doc/sphinx/issues/4159 (Choose unicode-enabled default latex engines)
- https://github.com/sphinx-doc/sphinx/issues/4149 (Be more explicit on documentation about how to choose a latex engine)
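
One way a build script can pass the engine from outside conf.py is the `-D` option of sphinx-build, which overrides a configuration value on the command line. The sketch below only builds the argument list; the helper name and the directory names are hypothetical, and the docsbuild-scripts PR may do this differently.

```python
def latex_build_command(source_dir, output_dir, engine="xelatex"):
    """Build the sphinx-build argument list for a LaTeX build (hypothetical helper)."""
    return [
        "sphinx-build",
        "-b", "latex",                   # use the LaTeX builder
        "-D", "latex_engine=" + engine,  # override the conf.py value
        source_dir,
        output_dir,
    ]


if __name__ == "__main__":
    # Placeholder paths; pass the list to subprocess.check_call() to run it.
    print(" ".join(latex_build_command("Doc", "Doc/build/latex")))
```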
msg304717 - (view) Author: jfbu (jfbu) * Date: 2017-10-21 18:44
I have made a PR at https://github.com/python/cpython/pull/4069 which enhances `conf.py` with some extra Unicode configuration for pdflatex. I tested it by building the English PDF documentation at master (at https://github.com/python/cpython/tree/db60a5bfa5d5f7a6f1538cc1fe76f0fda57b524e) and on the 3.6 branch (at https://github.com/python/cpython/tree/1e78ed6825701029aa45a68f9e62dd3bb8d4e928), and also the French documentation at 3.6 (at https://github.com/python/python-docs-fr/commit/76b522b79c3caa26658920c714acf8fac0c20eeb). The changes only affect ``pdflatex`` builds: if `latex_engine` is set to `xelatex`, `lualatex`, or `platex` (automatic if the language is `ja`), nothing is modified.
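
The "pdflatex only" guard described above could look roughly like the following. This is a sketch, not the content of the PR: the function name is hypothetical, and the single `\DeclareUnicodeCharacter` line (valid with inputenc's utf8 option) is just one illustrative example of extra Unicode configuration.

```python
def pdflatex_unicode_elements(latex_engine, language):
    """Return extra latex_elements entries, for pdflatex builds only."""
    if latex_engine in ("xelatex", "lualatex", "platex") or language == "ja":
        return {}  # leave the other engines untouched
    return {
        # Illustrative only: teach pdflatex one extra Unicode character.
        "preamble": r"\DeclareUnicodeCharacter{00A0}{\nobreakspace}",
    }


# Example: merge the extras into an existing (here empty) latex_elements dict.
latex_elements = {}
latex_elements.update(pdflatex_unicode_elements("pdflatex", "fr"))
```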
msg307469 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2017-12-02 22:24
New changeset 7324b5ce8e7c031a0a3832a6a8d7c639111ae0ff by Ned Deily (Julien Palard) in branch 'master':
bpo-31589 : Build PDF using xelatex for better UTF8 support. (#3940)
https://github.com/python/cpython/commit/7324b5ce8e7c031a0a3832a6a8d7c639111ae0ff
msg307472 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2017-12-02 22:35
New changeset 2ad350a713360e89ae6d264924cd28f519b8b22c by Ned Deily (Miss Islington (bot)) in branch '3.6':
[3.6] bpo-31589 : Build PDF using xelatex for better UTF8 support. (GH-3940) (#4683)
https://github.com/python/cpython/commit/2ad350a713360e89ae6d264924cd28f519b8b22c
msg307490 - (view) Author: Julien Palard (mdk) * Date: 2017-12-03 08:41
Due to issue 32200, we switched to xelatex. However, it did **not** fix the French builds as expected, maybe because we're using an old version of xelatex.

The issue:

    ! Improper discretionary list.
    <recently read> }
                
    l.359 ...{PyObject} \PYG{o}{*}\PYG{n}{t}\PYG{p}{;}
                                                  
    ? 
    ! Emergency stop. 

The version of texlive-xetex on the build server: 2013.20140215-1ubuntu0.1

The version I tried: 2017.20171128-1

I don't have the 2013 version available on Debian. I'm now trying with 2014.20141024-2+deb8u1, and I'm trying to understand the error message too.
msg307493 - (view) Author: jfbu (jfbu) * Date: 2017-12-03 09:09
I can confirm that the "Improper discretionary list" error from the xetex build is a XeTeX bug, which is present in XeTeX 0.99992 and absent from XeTeX 0.99996 and presumably all more recent releases.

It was seen at https://github.com/sphinx-doc/sphinx/issues/3546 and reported to the XeTeX mailing list at http://tug.org/pipermail/xetex/2017-March/027056.html
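
A build script could guard against this by comparing the installed XeTeX version with the first version known to be free of the bug. A minimal sketch, assuming the version is already available as a string such as "0.99992" (the function names are hypothetical); releases between 0.99992 and 0.99996 were not tested in this thread, so treating them as affected is a conservative guess.

```python
def parse_version(text):
    """Turn '0.99992' into (0, 99992) so versions compare as integers."""
    return tuple(int(part) for part in text.split("."))


# First XeTeX version reported above as free of the bug.
FIXED_IN = parse_version("0.99996")


def may_have_discretionary_bug(xetex_version):
    """Return True if this version predates the one known to be fixed."""
    return parse_version(xetex_version) < FIXED_IN


print(may_have_discretionary_bug("0.99992"))
print(may_have_discretionary_bug("0.99996"))
```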
msg307502 - (view) Author: Julien Palard (mdk) * Date: 2017-12-03 10:43
XeTeX 0.99996 was released in March 2016, so it's not even in Ubuntu 16.04 Xenial Xerus. On docs.iad1.psf.io we're running Ubuntu 14.04 (an LTS ending around February 2019).
msg307506 - (view) Author: jfbu (jfbu) * Date: 2017-12-03 11:27
The ongoing discussion at http://tug.org/pipermail/xetex/2017-December/027212.html has brought up a new element: polyglossia's French module has been broken with xetex since TeXLive2016. We had only one problem; we now have two on our hands.

Possibly Sphinx could by default use babel + French rather than polyglossia + French, as the former is maintained but the latter apparently less so.

I tested that with TeXLive 2015 (fully updated), a test document showing the https://github.com/sphinx-doc/sphinx/issues/3546 problem now compiles fine when using

latex_elements = {
    'babel': r'\usepackage{babel}',
}

in the conf.py file, to override polyglossia, which is the default for Sphinx with xelatex.
msg307507 - (view) Author: Julien Palard (mdk) * Date: 2017-12-03 11:42
For me, French compiles correctly with the current state of cpython's conf.py and texlive-xetex 2017.20171128-1, if it helps.

I tried (not hard enough) and failed to test locally with a texlive-xetex from 2013 (I should try with a VM maybe, Debian won't let me install packages from 2013 on my sid...).
msg307515 - (view) Author: jfbu (jfbu) * Date: 2017-12-03 14:24
Related https://github.com/sphinx-doc/sphinx/issues/4272

It is stated there that using babel-french in place of polyglossia-french avoids the "Improper discretionary list" xetex problem starting with xetex 0.99992 (i.e. TeXLive 2015), whereas with polyglossia-french the earliest xetex version I could test with success is 0.99996 (TL2016). But starting with TL2016, polyglossia-french has the issue https://github.com/sphinx-doc/sphinx/issues/4272

With TeXLive 2014, using babel-french does not avoid the "Improper discretionary list" xetex problem. I don't know how this maps to Debian packaging. One needs xetex 0.99992 at minimum.
History
Date  User  Action  Args
2017-12-03 14:24:32  jfbu  set  messages: + msg307515
2017-12-03 11:42:15  mdk  set  messages: + msg307507
2017-12-03 11:27:17  jfbu  set  messages: + msg307506
2017-12-03 10:43:10  mdk  set  messages: + msg307502
2017-12-03 09:09:20  jfbu  set  messages: + msg307493
2017-12-03 08:41:19  mdk  set  messages: + msg307490
2017-12-02 22:35:10  ned.deily  set  messages: + msg307472
2017-12-02 22:24:55  python-dev  set  pull_requests: + pull_request4595
2017-12-02 22:24:41  ned.deily  set  nosy: + ned.deily; messages: + msg307469
2017-10-21 18:44:40  jfbu  set  nosy: + jfbu; messages: + msg304717
2017-10-21 18:07:39  jfbu  set  pull_requests: + pull_request4040
2017-10-17 11:58:28  mdk  set  messages: + msg304499
2017-10-10 06:14:15  python-dev  set  keywords: + patch; stage: patch review; pull_requests: + pull_request3913
2017-10-10 06:10:12  mdk  set  messages: + msg304016
2017-10-03 09:54:44  vstinner  set  messages: + msg303592
2017-10-03 09:47:48  mdk  set  messages: + msg303590
2017-10-03 09:35:08  vstinner  set  title: Links for French documentation pdf is broken -> Links for French documentation PDF is broken: LaTeX issue with non-ASCII characters?
2017-10-03 09:34:49  vstinner  set  nosy: + vstinner; messages: + msg303587
2017-10-03 08:55:57  mdk  set  messages: + msg303586
2017-10-02 19:43:57  mdk  set  nosy: + benjamin.peterson; messages: + msg303562
2017-10-01 23:32:44  mdk  set  messages: + msg303503
2017-10-01 23:11:49  mdk  set  messages: + msg303502
2017-10-01 22:36:48  ezio.melotti  set  type: behavior; messages: + msg303501; nosy: + ezio.melotti
2017-10-01 21:54:28  mdk  set  nosy: + linkid
2017-10-01 21:52:17  mdk  set  messages: + msg303498
2017-09-26 09:58:20  mdk  set  messages: + msg303028
2017-09-26 09:18:47  vstinner  set  nosy: + mdk
2017-09-26 08:59:09  fabrice  create