Fix pydoc crashing on unicode strings #41169
Comments
The pydoc module currently only outputs ASCII and This patch changes pydoc help functions to return For output, all pager functions were changed to encode cgitb.py, DocXMLRPCServer.py and/or |
Logged In: YES This is too major a change so shortly before the 2.4 release, |
Logged In: YES I'm so sorry this has caused so much trouble. |
Logged In: YES I believe this was fixed. Feel free to re-open if something |
Hello, [this is my first bug report, so I'm sorry if I'm not adhering to some conventions] in what versions of python is this supposed to be fixed? Consider:
% python
Python 2.7.2+ (default, Nov 30 2011, 19:22:03)
[GCC 4.6.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from pydoc import pager
>>> from locale import getpreferredencoding
>>> expr = u'\u211a'
>>> pager(expr) # error
>>> pager(expr.encode(getpreferredencoding())) # works
The error is:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pydoc.py", line 1318, in pager
    pager(text)
  File "/usr/lib/python2.7/pydoc.py", line 1332, in <lambda>
    return lambda text: pipepager(text, os.environ['PAGER'])
  File "/usr/lib/python2.7/pydoc.py", line 1359, in pipepager
    pipe.write(text)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u211a' in position 0: ordinal not in range(128)
Best, |
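The traceback above comes down to Python 2 implicitly encoding the unicode text with the ASCII codec when it is written to the pipe. A minimal sketch of that failure, written so that it also runs on Python 3: the explicit `.encode('ascii')` call stands in for the implicit conversion, and the helper name `try_ascii` is made up for illustration.

```python
text = u'\u211a'  # DOUBLE-STRUCK CAPITAL Q, the character from the report

def try_ascii(s):
    """Return True if s survives the ASCII codec, False otherwise."""
    try:
        s.encode('ascii')  # stands in for what Python 2 does implicitly in pipe.write(text)
    except UnicodeEncodeError:
        return False
    return True

ascii_ok = try_ascii(text)

# Encoding explicitly with a codec that can represent the character avoids
# the error. UTF-8 is assumed here; the report used the locale's preferred encoding.
encoded = text.encode('utf-8')
```

This only illustrates the output side of the problem; it says nothing about places inside pydoc that convert the text earlier.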
It is fixed in Python3. Apparently Raymond was wrong about it having been fixed earlier (or perhaps he was referring to the unicode being removed from the pydoc __credits__ string). |
I see. Thank you. |
Shouldn't this be reopened for Python 2.7 ? |
I don't think so. We aren't promising unicode support in pydoc in 2.x, and it is too late to add it. |
Oh well, in that case I guess we'll have to work around it. Here's the monkey patch I use to overcome this limitation in pydoc, in case others wish to add it to their PYTHONSTARTUP or sitecustomize:
import os  # needed for os.popen; not guaranteed to be imported already in a startup file
def pipepager(text, cmd):
    """Page through text by feeding it to another program."""
    try:
        import locale
    except ImportError:
        encoding = "ascii"
    else:
        encoding = locale.getpreferredencoding()
    pipe = os.popen(cmd, 'w')
    try:
        # Encode unicode text for the pipe; byte strings are written unchanged.
        pipe.write(text.encode(encoding, 'xmlcharrefreplace') if isinstance(text, unicode) else text)
        pipe.close()
    except IOError:
        pass # Ignore broken pipes caused by quitting the pager program.
import pydoc
pydoc.pipepager = pipepager
del pydoc, pipepager
|
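For reference, the `'xmlcharrefreplace'` error handler used in the work-around substitutes a numeric character reference for any character the target encoding cannot represent, instead of raising. A small sketch of that behaviour follows; the helper name `encode_for_pager` is hypothetical (it is not a pydoc function), and the type check is written so the snippet runs on both Python 2 and Python 3.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3

def encode_for_pager(text, encoding='ascii'):
    """Encode text for a byte pipe without raising on unencodable characters."""
    if isinstance(text, text_type):
        return text.encode(encoding, 'xmlcharrefreplace')
    return text  # already a byte string: pass it through unchanged

replaced = encode_for_pager(u'\u211a')            # not representable in ASCII
untouched = encode_for_pager(u'\u211a', 'utf-8')  # representable in UTF-8
```

With an ASCII locale the pager would show `&#8474;` in place of the character, which is lossy but readable; with a UTF-8 locale the character is passed through intact.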
Hmm. Making it not raise an error while still producing useful output would be acceptable as a bug fix if that's all it takes, I think. |
Here's my patch, along the lines of the work-around I posted earlier. A few remarks:
|
I fail to see how this patch solves this issue. Taking the example from bpo-15791, I still get the traceback of that issue, namely in the line result = result + self.section('AUTHOR', str(object.__author__)) If __author__ is a unicode object, it's the str call that fails. This is long before any attempt is made to render the resulting string to an output device. |
I just ran into this, and I'd like to communicate how unfortunate it is that it's not a priority to fix this fairly trivial (?) bug. It means there's no way to define a unicode string literal with non-ascii characters that won't crash the builtin help() command. I ran into this with the desktop package (http://pypi.python.org/pypi/desktop) where the only useful documentation right now is the source code and the docstrings. Apparently the author, who has non-ascii characters in his name, did me a favor by using broken encoding on the doc string so that at least I could read everything except for his name in the help. I tried to correct the encoding and found I get a nice traceback instead of help. And to top it all off, googling for things like "help unicode docstring" and "python help ascii codec" turns up nothing. I only found this issue once I thought to include "pipepager" in the search... |
Also, the resolution is still marked as "fixed", which is not correct... |
It is not so much that it isn't a priority, as that no one has suggested a working fix that is suitable for 2.7. Do you have a suggestion? |
I guess it must be more complicated than it looks, because I thought checking for unicode strings and doing .encode('utf-8') would help at least some cases without making anything worse. Anyways, if it's too hard or not worth fixing "correctly", couldn't we at least do something to prevent a crash? Maybe strip out / replace special characters and try again? |
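The "strip out / replace" idea maps onto the standard codec error handlers `'ignore'` and `'replace'`. A brief sketch of what each would produce, using the author name from bpo-15791 as sample input; this only shows the encoding step and does not address where inside pydoc the conversion would have to happen.

```python
name = u'Michele Orr\xf9'

stripped = name.encode('ascii', 'ignore')   # drops characters ASCII cannot represent
replaced = name.encode('ascii', 'replace')  # substitutes '?' for each such character
as_utf8 = name.encode('utf-8')              # lossless, if the terminal expects UTF-8
```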
Attaching a modified version of bpo-1065986.patch.
|
With this patch applied, the example from bpo-15791 works fine.
$ echo "__author__ = u'Michele Orr\xf9'" > foo.py && ./python -c "import foo; print foo.__author__; help(foo)"
Michele Orrù
Help on module foo: NAME FILE DATA AUTHOR |
Updated the previous patch to test that unicode strings in __{version,date,author,credits}__ don't crash pydoc. |
Now we have a working fix for 2.7. |
Benjamin: the patch looks pretty good to me, for fixing the problem of docstrings that are explicitly unicode. But before I go to the trouble of a full review and test, is this a level of change you think is acceptable in 2.7 at this point in its lifecycle? |
Added <meta charset="utf-8"> to html pydoc generates. |
Okay with me. |
LGTM. One thing: did you mean assertEqual in Lib/test/test_pydoc.py:466: self.assertTrue(open('pipe').read(), pydoc._encode(doc)) |
Good catch. Fixed. |
Made some review comments. Looks good in general and it seems like the tests are fairly comprehensive. I haven't tried to run any additional experiments, but I don't see how it could make things worse, since the new code paths will only do something different if unicode objects are actually involved. |
Made a few more adjustments to fix things r.david.murray pointed out. |
New changeset bf077fc97fdd by R David Murray in branch '2.7': |
Committed, thanks Akira. The support for --disable-unicode is not fully tested. I tried running the tests but the _io module wasn't built, so regrtest doesn't work. A command line invocation of pydoc worked fine, though. |
New changeset e57660acc6d4 by R David Murray in branch '2.7': |