pydoc 3.x raises UnicodeEncodeError on sqlite3 package #67563

Closed
smontanaro opened this issue Feb 1, 2015 · 12 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@smontanaro
Contributor

BPO 23374
Nosy @smontanaro, @bitdancer, @vadmium, @serhiy-storchaka
Files
  • pydoc_encoding.patch
  • pydoc_encoding_2.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-02-20.21:49:08.180>
    created_at = <Date 2015-02-01.19:24:18.724>
    labels = ['type-bug', 'library']
    title = 'pydoc 3.x raises UnicodeEncodeError on sqlite3 package'
    updated_at = <Date 2015-02-20.21:49:08.179>
    user = 'https://github.com/smontanaro'

    bugs.python.org fields:

    activity = <Date 2015-02-20.21:49:08.179>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-02-20.21:49:08.180>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2015-02-01.19:24:18.724>
    creator = 'skip.montanaro'
    dependencies = []
    files = ['37981', '38146']
    hgrepos = []
    issue_num = 23374
    keywords = ['patch']
    message_count = 12.0
    messages = ['235200', '235201', '235202', '235203', '235204', '235206', '235208', '235263', '236036', '236076', '236078', '236334']
    nosy_count = 5.0
    nosy_names = ['skip.montanaro', 'r.david.murray', 'python-dev', 'martin.panter', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue23374'
    versions = ['Python 3.4', 'Python 3.5']

    @smontanaro
    Contributor Author

    I'm probably doing something wrong, but I've tried everything I can think of
    without any success.

    In Python 2.7, the pydoc command successfully displays help for the sqlite3
    package, though it muffs the output of Gerhard Häring's name, spitting out
    the original Latin-1 spelling. In Python 3.x, I get a UnicodeEncodeError for
    my trouble, and it hoses my tty settings to boot, requiring a LF reset LF
    sequence to put right unless I set PAGER to "cat".

    Here's a sample run:

    % PAGER=cat pydoc3.5 sqlite3
    Traceback (most recent call last):
      File "/Users/skip/local/bin/pydoc3.5", line 5, in <module>
        pydoc.cli()
      File "/Users/skip/local/lib/python3.5/pydoc.py", line 2591, in cli
        help.help(arg)
      File "/Users/skip/local/lib/python3.5/pydoc.py", line 1874, in help
        elif request: doc(request, 'Help on %s:', output=self._output)
      File "/Users/skip/local/lib/python3.5/pydoc.py", line 1612, in doc
        pager(render_doc(thing, title, forceload))
      File "/Users/skip/local/lib/python3.5/pydoc.py", line 1412, in pager
        pager(text)
      File "/Users/skip/local/lib/python3.5/pydoc.py", line 1428, in <lambda>
        return lambda text: pipepager(text, os.environ['PAGER'])
      File "/Users/skip/local/lib/python3.5/pydoc.py", line 1455, in pipepager
        pipe.write(text)
    UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 600: ordinal not in range(128)

    I understand the error, but I see no way to convince it to use any codec
    other than "ascii". Stuff I tried:

    • setting PYTHONIOENCODING to "UTF-8" (suggested by Peter Otten on c.l.py)
    • setting LANG to "en_US.utf8"

    This is on a Mac running Yosemite with pydoc invoked in Apple's Terminal
    app. Display is fine in my browser when I run pydoc as a web server.

    The source it is attempting to display has a coding cookie, so it should
    know that the code is encoded using Latin-1. The problem seems to all be
    about generating output.
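The failure in the traceback can be reproduced without pydoc or a pager. This is a minimal sketch, assuming only that the text layer in front of the pager encodes with ASCII, as the error message reports. The in-memory buffer stands in for the pager's stdin, and the sample string is illustrative rather than the actual sqlite3 docstring.

```python
import io

# Stand-in for the pager pipe: a byte buffer behind an ASCII text layer.
raw = io.BytesIO()
pipe = io.TextIOWrapper(raw, encoding="ascii")

text = "Gerhard H\xe4ring"  # contains U+00E4, which ASCII cannot represent

try:
    pipe.write(text)
    pipe.flush()
    failed = False
except UnicodeEncodeError as exc:
    # Same exception type as in the traceback above.
    failed = True
    print(type(exc).__name__, exc.encoding, repr(exc.object[exc.start]))
```

The decoding side (the coding cookie in the source file) plays no part here; only the encoding chosen for the output stream matters.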

    @smontanaro smontanaro added type-crash A hard crash of the interpreter, possibly with a core dump stdlib Python modules in the Lib dir labels Feb 1, 2015
    @serhiy-storchaka
    Member

    What are sys.getfilesystemencoding(), locale.getpreferredencoding(False), os.popen('cat', 'w').encoding?

    @smontanaro
    Contributor Author

    Without setting any environment variables:

    >>> import sys
    >>> sys.getfilesystemencoding()
    'utf-8'
    >>> import locale
    >>> locale.getpreferredencoding(False)
    'US-ASCII'
    >>> import os
    >>> os.popen('cat', 'w').encoding
    'US-ASCII'

    If I set PYTHONIOENCODING=UTF-8:

    >>> import sys, locale, os
    >>> sys.getfilesystemencoding()
    'utf-8'
    >>> locale.getpreferredencoding(False)
    'US-ASCII'
    >>> os.popen('cat', 'w').encoding
    'US-ASCII'

    If I set LANG=en_US.utf8:

    >>> import sys, locale, os
    >>> sys.getfilesystemencoding()
    'utf-8'
    >>> locale.getpreferredencoding(False)
    'US-ASCII'
    >>> os.popen('cat', 'w').encoding
    'US-ASCII'

    It appears neither of these environment variables does much in my environment.

    I should point out that I just updated to Mac OS X 10.10.2 a couple
    days ago. I have no idea if this problem existed before that upgrade.
    Realizing that perhaps something had changed in the underlying
    operating system support, I rebuilt Python 2.6 through 3.5 from
    scratch. Same result.
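To repeat these checks under different environment settings, the three queries can be bundled into one script. This is a sketch only: the function name `report_encodings` is made up for illustration, and it assumes a POSIX `cat` is available on PATH.

```python
import locale
import os
import sys


def report_encodings():
    """Collect the encodings asked about in this thread (hypothetical helper)."""
    # os.popen() opens the pipe in text mode with the default encoding,
    # so it is checked alongside the locale's preferred encoding.
    pipe = os.popen("cat", "w")  # assumes a POSIX `cat` is on PATH
    try:
        pipe_encoding = pipe.encoding
    finally:
        pipe.close()
    return {
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(False),
        "popen": pipe_encoding,
        "stdout": getattr(sys.stdout, "encoding", None),
        "LANG": os.environ.get("LANG"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for name, value in report_encodings().items():
        print(f"{name:>16}: {value}")
```

Running it as, for example, `LANG=en_US.UTF-8 python3 script.py` and again with another `LANG` value makes it easy to compare the results side by side.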

    @smontanaro
    Contributor Author

    Peter Otten posted a solution on c.l.py. The issue is that I didn't
    mix my case properly when setting LANG:

    hgpython% LANG=en_US.UTF-8 python3.5 -c 'import locale;
    print(locale.getpreferredencoding(False))'
    UTF-8
    hgpython% LANG=en_US.utf8 python3.5 -c 'import locale;
    print(locale.getpreferredencoding(False))'
    US-ASCII

    @smontanaro
    Contributor Author

    On Sun, Feb 1, 2015 at 2:19 PM, Skip Montanaro <report@bugs.python.org> wrote:

    The issue is that I didn't
    mix my case properly when setting LANG:

    Actually, it's that the hyphen is required in "utf-8" or "UTF-8".
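One plausible reading of this behaviour is that the C library rejects the unhyphenated name, so the process stays in the "C" locale and reports US-ASCII. Whether a given spelling is accepted can be checked directly. This is a sketch; `locale_is_accepted` is a hypothetical helper, and the results differ between platforms, so no output is shown.

```python
import locale


def locale_is_accepted(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)


for candidate in ("en_US.UTF-8", "en_US.utf8"):
    print(candidate, locale_is_accepted(candidate))
```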

    @smontanaro
    Contributor Author

    Final note here. Peter also did a bit of digging. Here's his note about
    what he found on c.l.py:

    The pager is invoked by os.popen(), and after some digging I find that it
    uses an io.TextIOWrapper() to write the help text. This in turn uses
    locale.getpreferredencoding(False), i. e. you were right to set LANG and
    PYTHONIOENCODING is not relevant.

    I was also able to provoke this problem on an openSuSE 12.2 system with
    3.2.3 installed. In that environment (confirmed by Chris Angelico on his
    Linux system), the case of "utf" didn't matter, nor did it matter if
    "utf-8" was hyphenated or not. Obviously the Mac continues to be a rather
    touchy system w.r.t. locale.

    I don't know if Python should try to be accommodating here, but my
    inclination is "no". OTOH, maybe io.TextIOWrapper should look at
    PYTHONIOENCODING, or the pager should be invoked through something other
    than os.popen (assuming there is a suitable replacement which does pay
    attention to PYTHONIOENCODING).
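As an illustration of "something other than os.popen", `subprocess.Popen` accepts explicit `encoding` and `errors` arguments (Python 3.6 and later), so the locale is never consulted. This is a sketch and not pydoc's code: `pipe_text` is a hypothetical helper, and a small Python one-liner stands in for the pager so the round trip can be inspected.

```python
import subprocess
import sys


def pipe_text(text, command, encoding="utf-8"):
    """Feed `text` to `command` on stdin and return what it wrote to stdout.

    Hypothetical helper, not part of pydoc. A real pager would write to
    the terminal instead of having its stdout captured.
    """
    proc = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        encoding=encoding,          # explicit, so the locale is not consulted
        errors="backslashreplace",  # never raise on unencodable characters
    )
    out, _ = proc.communicate(text)
    return out


# Stand-in "pager" that copies stdin to stdout byte for byte.
echo = [sys.executable, "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())"]

print(ascii(pipe_text("Gerhard H\xe4ring", echo, "utf-8")))
print(ascii(pipe_text("Gerhard H\xe4ring", echo, "ascii")))
```

With "utf-8" the character survives the round trip; with "ascii" it is replaced by a backslash escape instead of raising UnicodeEncodeError.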

    @vadmium
    Member

    vadmium commented Feb 1, 2015

    Maybe because a pager sends its bytes more-or-less straight through from input to output, the PYTHONIOENCODING (sys.stdout.encoding?) should be used for the TextIOWrapper to the pager’s input in this case. I’m not so sure this should be assumed in general though.

    @serhiy-storchaka
    Member

    There are a few levels to this issue:

    1. pydoc doesn't escape characters according to the output encoding. It escapes characters that are unencodable with sys.getfilesystemencoding(), but this encoding can differ from the encoding of sys.stdout or from the default encoding.

    2. The default encoding for io.TextIOWrapper() and open() can be different from sys.getfilesystemencoding(), and it can unexpectedly be ASCII.

    3. Mac OS doesn't support locales with the utf8 encoding (without hyphen).

    Here is a patch which solves the first level: it makes pydoc use the appropriate encoding with the backslashreplace error handler.
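The idea of escaping according to the output encoding can be sketched as a round trip through that encoding with the backslashreplace handler. This illustrates the technique only and is not necessarily the code in the attached patch: `escape_for_stream` is a hypothetical name, and the UTF-8 fallback is an assumption.

```python
import io
import sys


def escape_for_stream(text, stream=None):
    """Replace characters the stream cannot encode with backslash escapes."""
    stream = sys.stdout if stream is None else stream
    # Assumption: fall back to UTF-8 when the stream advertises no encoding.
    encoding = getattr(stream, "encoding", None) or "utf-8"
    return text.encode(encoding, "backslashreplace").decode(encoding)


# An ASCII-only stream stands in for a terminal in the "C" locale.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
print(escape_for_stream("Gerhard H\xe4ring", ascii_stream))  # Gerhard H\xe4ring
```

The returned string contains only characters the target stream can encode, so a later strict write to that stream cannot raise UnicodeEncodeError.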

    @serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error and removed type-crash A hard crash of the interpreter, possibly with a core dump labels Feb 2, 2015
    @serhiy-storchaka serhiy-storchaka self-assigned this Feb 15, 2015
    @serhiy-storchaka
    Member

    Added a test.

    @vadmium
    Member

    vadmium commented Feb 15, 2015

    Patch looks sensible to me. This is another example of where bpo-15216 would be useful (a standard way to modify the encoding settings of a stream).

    @serhiy-storchaka
    Member

    In the case of this issue pydoc needs to change not the encoding of stdout, but the error handler of stdout. There is a similar issue with pprint (bpo-19100).
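On Python 3.7 and later, `io.TextIOWrapper.reconfigure()` can swap the error handler of an existing stream while leaving its encoding alone. That method did not exist when this issue was filed, so the following is a present-day sketch of the distinction rather than the fix that was committed. An in-memory ASCII stream stands in for stdout.

```python
import io

raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii")  # error handler is "strict"

# Keep the encoding, change only the error handler (Python 3.7+).
stream.reconfigure(errors="backslashreplace")

stream.write("Gerhard H\xe4ring")
stream.flush()
print(stream.encoding, raw.getvalue())  # ascii b'Gerhard H\\xe4ring'
```

The same call is available on the real stream as `sys.stdout.reconfigure(errors="backslashreplace")`, provided `sys.stdout` has not been replaced by an object that lacks the method.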

    @python-dev
    Mannequin

    python-dev mannequin commented Feb 20, 2015

    New changeset e7b6b1f57268 by Serhiy Storchaka in branch '3.4':
    Issue bpo-23374: Fixed pydoc failure with non-ASCII files when stdout encoding
    https://hg.python.org/cpython/rev/e7b6b1f57268

    New changeset affe167a45f3 by Serhiy Storchaka in branch 'default':
    Issue bpo-23374: Fixed pydoc failure with non-ASCII files when stdout encoding
    https://hg.python.org/cpython/rev/affe167a45f3

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022