classification
Title: smart quotes in Lib/pydoc_data/topics.py file
Type: behavior
Stage:
Components: Library (Lib)
Versions: Python 3.9, Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: lukasz.langa, ned.deily, serhiy.storchaka, xtreak
Priority: normal
Keywords:

Created on 2020-08-12 06:31 by xtreak, last changed 2022-04-11 14:59 by admin.

Messages (3)
msg375213 - Author: Karthikeyan Singaravelan (xtreak) (Python committer) Date: 2020-08-12 06:31
Similar to issue41525, the generated file Lib/pydoc_data/topics.py seems to contain smart quotes. This file is used by the help utility of the REPL to explore different topics.

git log -G'“' Lib/pydoc_data/topics.py | cat
commit bc1c8af8ef2563802767404c78c8ec6d6a967897
Author: Łukasz Langa <lukasz@langa.pl>
Date:   Mon Apr 27 22:44:04 2020 +0200

    Python 3.9.0a6

commit fd757083df79c21eee862e8d89aeefefe45f64a0
Author: Łukasz Langa <lukasz@langa.pl>
Date:   Tue Nov 19 12:17:21 2019 +0100

    Python 3.9.0a1

commit aab0e57045f6badaa1404409626545785ef02d62
Author: Łukasz Langa <lukasz@langa.pl>
Date:   Sun Feb 3 14:04:12 2019 +0100

    [pydoc] Regenerate topics for v3.8.0a1
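
For anyone who wants to check a file's contents directly rather than through git history, here is a minimal sketch in Python. The helper name find_smart_punctuation and the set of characters it looks for are my own choices for illustration, not anything that exists in pydoc or the build tooling.

```python
# Characters treated as "smart" punctuation in this sketch (an assumption):
# curly single/double quotes, en dash, em dash.
SMART_PUNCTUATION = "\u2018\u2019\u201c\u201d\u2013\u2014"


def find_smart_punctuation(text):
    """Return (line_number, character) pairs for each hit, in order."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if ch in SMART_PUNCTUATION:
                hits.append((lineno, ch))
    return hits


sample = "plain line\nthe function\u2019s \u201cdoc\u201d\n"
print(find_smart_punctuation(sample))
```

To scan the real file, the text could be read with open("Lib/pydoc_data/topics.py", encoding="utf-8").read() and passed to the same function.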
msg375214 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2020-08-12 07:23
Pydoc uses the backslashreplace error handler for characters not encodable with the output encoding (see issue21398 and issue23374).

$ LC_ALL=uk_UA.koi8-u ./python -c "help('async')"
[...]

[2] A string literal appearing as the first statement in the
    function body is transformed into the function\u2019s "__doc__"
    attribute and therefore the function\u2019s *docstring*.

[3] A string literal appearing as the first statement in the class
    body is transformed into the namespace\u2019s "__doc__" item and
    therefore the class\u2019s *docstring*.
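
The same escaping can be reproduced without pydoc. This is only a sketch of the str.encode behaviour with the backslashreplace handler, not pydoc's actual code path; the sample string is made up.

```python
# U+2019 (right single quotation mark) has no KOI8-U representation,
# so the backslashreplace handler emits it as a literal \u2019 escape.
text = "the function\u2019s docstring"
encoded = text.encode("koi8-u", errors="backslashreplace")
print(encoded.decode("koi8-u"))
```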


It would be better to replace non-ASCII quotation marks and dashes with the corresponding ASCII quotation marks and hyphen-minus when they cannot be encoded. This could be part of a more general feature for transliterating non-ASCII characters to ASCII.
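
A rough sketch of what such a fallback might look like, assuming a small hand-written mapping. The names ASCII_FALLBACKS and encode_with_fallback are hypothetical and do not exist in pydoc; characters without a mapping still go through backslashreplace, as they do today.

```python
# Hypothetical mapping from typographic punctuation to ASCII stand-ins.
ASCII_FALLBACKS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
}


def encode_with_fallback(text, encoding):
    """Encode text, substituting ASCII punctuation only where needed."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Only characters the target encoding cannot represent
            # are replaced; everything else is left untouched.
            ch = ASCII_FALLBACKS.get(ch, ch)
        out.append(ch)
    # Anything still unencodable keeps the existing escaping behaviour.
    return "".join(out).encode(encoding, "backslashreplace")


print(encode_with_fallback("the function\u2019s \u201cdoc\u201d", "koi8-u"))
```

With a UTF-8 output encoding nothing is substituted, so terminals that can display the typographic quotes would still get them.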
msg375429 - Author: Ned Deily (ned.deily) (Python committer) Date: 2020-08-14 19:05
Sorry, my previous response was incomplete and I closed this prematurely. Re-opening.
History
Date                 User              Action  Args
2022-04-11 14:59:34  admin             set     github: 85699
2020-08-14 19:05:29  ned.deily         set     messages: - msg375424
2020-08-14 19:05:18  ned.deily         set     status: closed -> open
                                               superseder: Python '--help' has corrupted text. ->
                                               messages: + msg375429
                                               resolution: duplicate ->
                                               stage: resolved ->
2020-08-14 18:54:30  ned.deily         set     status: open -> closed
                                               superseder: Python '--help' has corrupted text.
                                               nosy: + ned.deily
                                               messages: + msg375424
                                               resolution: duplicate
                                               stage: resolved
2020-08-12 07:23:30  serhiy.storchaka  set     messages: + msg375214
2020-08-12 06:31:06  xtreak            create