classification
Title: pydoc doesn't find all module doc strings
Type: behavior Stage: test needed
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ajaksu2, akitada, brianvanden, eric.araujo, kjohnson, ncoghlan, ping, sunfinite, vstinner
Priority: normal Keywords: patch

Created on 2005-04-18 12:18 by kjohnson, last changed 2019-07-29 11:24 by vstinner.

Files
File name  Uploaded  Description
pydoc_fix.diff  kjohnson, 2005-04-18 20:04  Revised patch recognizes any triple-quoted string
myfirst_2.patch  sunfinite, 2013-09-25 07:30  Second attempt at a patch
pydoc_2.7.patch  sunfinite, 2013-09-26 12:12  Patch for 2.7
Messages (16)
msg25049 - (view) Author: Kent Johnson (kjohnson) * Date: 2005-04-18 12:18
pydoc.synopsis() attempts to find a module's doc string
by parsing the module text. But the parser only
recognizes strings created with """ and r""". Any other
docstring is ignored.

I've attached a patch against Python 2.4.1 that fixes
pydoc to recognize ''' and r''' strings but really it
should recognize any allowable string format.
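
To make the gap concrete, here is a minimal sketch. `naive_synopsis` is a hypothetical, heavily simplified stand-in for the line-based check described above (it is not the actual pydoc code). It is compared with `ast.get_docstring`, which reports whatever the compiler treats as the module docstring.

```python
import ast


def naive_synopsis(source):
    # Hypothetical stand-in for a line-based check, NOT the real pydoc code.
    # Only a leading """ or r""" is treated as the start of a docstring.
    for line in source.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith('r"""'):
            line = line[1:]
        if line.startswith('"""'):
            # Keep the text up to the closing quotes, if they are on this line.
            return line[3:].split('"""')[0].strip() or None
        return None  # the first real line is something else
    return None


double_quoted = '"""Spam module."""\nimport os\n'
single_quoted = "'''Spam module.'''\nimport os\n"

print(naive_synopsis(double_quoted))   # found by the line-based check
print(naive_synopsis(single_quoted))   # missed by the line-based check
print(ast.get_docstring(ast.parse(single_quoted)))  # found via the AST
```

Both modules have the same `__doc__` once imported; only the text-scanning path treats them differently.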
msg25050 - (view) Author: Ka-Ping Yee (ping) * (Python committer) Date: 2005-04-18 18:23
PEP 257 recommends: "For consistency, always use """triple
double quotes""" around docstrings."  I think that's why
this was originally written to only look for triple
double-quotes.

Are there a large number of modules written using
triple-single quotes for the module docstring?
msg25051 - (view) Author: Kent Johnson (kjohnson) * Date: 2005-04-18 20:04
I don't know if there are a large number of modules with
triple-single-quoted docstrings. Pydoc will search any
module in site-packages at least, so you have to consider
third-party modules.

At best pydoc is inconsistent: the web browser display uses
the __doc__ attribute but search and apropos use synopsis().
It's a pretty simple change to recognize any triple-quoted
string, and it seems like a good idea to me...

I have attached a revised patch that uses a regex match so
it works with e.g. uR""" and other variations of triple-quoting.

FWIW this bug report was motivated by this thread on
comp.lang.python:
http://groups-beta.google.com/group/comp.lang.python/browse_frm/thread/e5cfccb7c9a168d7/1c1702e71e1939b0?q=triple&rnum=1#1c1702e71e1939b0
msg25052 - (view) Author: Ka-Ping Yee (ping) * (Python committer) Date: 2005-04-18 20:28
I think you're right that if it works for the module summary
(using __doc__) then it should work with synopsis(). 
However, the patch you've added doesn't address the problem
properly; instead of handling """ correctly and ignoring
''', it handles both kinds of docstrings incorrectly because
it will accept ''' as a match for """ or """ as a match for '''.

I'll look at fixing this soon, but feel free to keep
prodding me until it gets fixed.
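
The mismatch described above can be illustrated with two regular expressions. Both patterns are hypothetical and written for this illustration only; neither is taken from pydoc_fix.diff. The prefix class is also a simplification, since it would accept combinations such as "uu" that Python itself rejects.

```python
import re

# Both patterns are hypothetical illustrations, not taken from any patch.
TRIPLE = "'''" + "|" + '"""'
PREFIX = r"[uUrR]{0,2}"

# Loose: the opening and closing quotes are matched independently,
# so ''' may be closed by """ and vice versa.
LOOSE = re.compile(PREFIX + "(?:%s)(.*?)(?:%s)" % (TRIPLE, TRIPLE), re.DOTALL)

# Strict: the backreference \1 forces the closer to equal the opener.
STRICT = re.compile(PREFIX + r"(%s)(.*?)\1" % TRIPLE, re.DOTALL)

text = "'''Outer text \"\"\"inner\"\"\" more outer'''"

loose_body = LOOSE.match(text).group(1)
strict_body = STRICT.match(text).group(2)

print(repr(loose_body))   # cut short at the first triple quote of either kind
print(repr(strict_body))  # runs to the matching '''
```

A backreference is one way to tie the closing delimiter to the opening one; using a real tokenizer would avoid the question altogether.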
msg25053 - (view) Author: Brian vdB (brianvanden) Date: 2005-04-19 17:11
I started the thread to which Kent referred. I am aware of
PEP 257's recommendation of triple-double quotes. My
(perhaps wrong-headed) construal of that PEP is that it
isn't sufficiently rule-giving that I would have expected
other tools to reject triple-single quotes. 
At any rate, since triple-single are syntactically
acceptable, it would seem better if they were accepted on
equal footing with triple-double. I can well understand that
this would be a very low priority issue, though. Call it an
RFE. :-)
msg82175 - (view) Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-02-15 22:16
The source still contains the snippet targeted by the patch (I didn't test the behavior).
msg169274 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-28 12:03
The standard library has moved on quite a bit since this patch was written...

1. source_synopsis() should be using the tokenize module when reading the docstring. The current implementation is broken in more ways than just those noted here (e.g. it completely ignores the declared encoding).

(The reason for not using full compilation is that you would then have to either *run* the compiled code or else compile to the AST and interrogate that, which is technically implementation dependent.)

2. For 3.3+, synopsis should be using importlib to get the source code rather than assuming filesystem imports. That's probably better handled in a separate issue, though.
msg169275 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-28 12:10
Oops, I somehow ended up looking at an old revision of pydoc.py

The current version *is* using tokenize.open and importlib in synopsis(), so those aspects of my comments are incorrect.

However, the point that pydoc should probably be using the tokenize module to do the parsing inside source_synopsis remains valid. There's no good reason to continue duplicating a subset of that text processing logic within pydoc.
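
A rough sketch of what tokenize-based parsing could look like follows. `tokenized_synopsis` is a hypothetical name, and this is not the code from any of the attached patches. It assumes the source is available as bytes, and it deliberately ignores corner cases such as implicitly concatenated string literals or a leading string that is part of a larger expression.

```python
import ast
import io
import tokenize


def tokenized_synopsis(source_bytes):
    # Hypothetical helper: first line of the module docstring, or None.
    skip = {tokenize.ENCODING, tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE}
    for tok in tokenize.tokenize(io.BytesIO(source_bytes).readline):
        if tok.type in skip:
            continue  # encoding marker, comments and blank lines
        if tok.type != tokenize.STRING:
            return None  # the module starts with something else
        # Evaluate the literal so that prefixes and quote styles are
        # handled by Python itself rather than by hand-written parsing.
        value = ast.literal_eval(tok.string)
        if not isinstance(value, str):
            return None  # e.g. a bytes literal is not a docstring
        lines = value.strip().splitlines()
        return lines[0].strip() if lines else None
    return None


print(tokenized_synopsis(b"'''Single quoted.'''\n"))
print(tokenized_synopsis(b'# comment\n\nr"""Raw one.\nSecond line."""\n'))
print(tokenized_synopsis(b"import os\n'''late'''\n"))
```

Because tokenize.tokenize() reads bytes and detects the declared encoding itself, a sketch along these lines also honours a coding cookie, which is the other defect mentioned above.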
msg198292 - (view) Author: Sunny K (sunfinite) * Date: 2013-09-22 17:29
I've rewritten the source_synopsis function to use the tokenize module. 

It should now work with triple single quotes and hopefully all the other cases where __doc__ returns a string.

Since tokenize.tokenize needs a file object that is opened in binary mode, in the case of a StringIO object I am reading the whole object and converting it to a BytesIO object. I don't know if that is the right way. Also, the only instance I could find where source_synopsis is called with a StringIO object is in the ModuleScanner.run method. Maybe we could tweak this call to pass a byte-stream object to avoid the overhead of re-conversion?

All the current tests pass.
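
On the StringIO question, one possible alternative to re-encoding is tokenize.generate_tokens(), which in current Python 3 releases accepts a readline callable that returns str. The sketch below is only an illustration of that call; `first_code_token` is a hypothetical helper, and whether this approach suits the patch is not settled in this thread.

```python
import io
import tokenize


def first_code_token(text_stream):
    # generate_tokens() takes a readline callable that returns str,
    # so a StringIO can be tokenized without converting it to bytes.
    skip = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE)
    for tok in tokenize.generate_tokens(text_stream.readline):
        if tok.type not in skip:
            return tok
    return None


tok = first_code_token(io.StringIO("# a comment\n'''Doc.'''\n"))
print(tokenize.tok_name[tok.type], tok.string)
```

Unlike tokenize.tokenize(), this path does no encoding detection, which is reasonable when the text has already been decoded.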
msg198318 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-09-23 08:06
+        except:
+            pass
...
+    except TypeError:
+        return None

I don't understand these try/except blocks. First, "except: pass" must never be used; only catch specific exceptions (e.g. AttributeError). Can you explain why you expect a TypeError?

If your patch fixes a bug, you must add a new unit test to test_pydoc to check for non-regression.
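
As an illustration of the review point about narrow handlers, here is a small sketch. `count_tokens` is a hypothetical helper, and the pair of exceptions it catches is an assumption about what tokenizing malformed source may raise, not something taken from the patch under review.

```python
import io
import tokenize


def count_tokens(text):
    # Hypothetical helper: returns None for source that cannot be tokenized.
    try:
        readline = io.StringIO(text).readline
        return sum(1 for _ in tokenize.generate_tokens(readline))
    except (tokenize.TokenError, SyntaxError):
        # Only tokenizer failures are swallowed here. A programming error
        # such as a TypeError or NameError would still propagate, which a
        # bare "except: pass" would have hidden.
        return None


print(count_tokens("x = 1\n"))
print(count_tokens("'''never closed\n"))
```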
msg198377 - (view) Author: Sunny K (sunfinite) * Date: 2013-09-25 07:30
I've updated my patch with the review changes and tests.

tokenize.detect_encoding throws a TypeError if the file object passed to it is in text mode. However, I've realized that catching this is not necessary, as I now check for TextIOBase instead of just StringIO as before.
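
The check described above might look roughly like the following. `detect_source_encoding` is a hypothetical name used only for this sketch; it tests the stream type up front instead of catching the TypeError afterwards.

```python
import io
import tokenize


def detect_source_encoding(stream):
    # Hypothetical helper: declared encoding of a binary stream, or None
    # for a text-mode stream, whose contents are already decoded.
    if isinstance(stream, io.TextIOBase):
        return None
    encoding, _lines_read = tokenize.detect_encoding(stream.readline)
    return encoding


print(detect_source_encoding(io.StringIO("x = 1\n")))
print(detect_source_encoding(io.BytesIO(b"# -*- coding: latin-1 -*-\n")))
print(detect_source_encoding(io.BytesIO(b"x = 1\n")))
```

Testing against io.TextIOBase covers StringIO as well as files opened in text mode, which is presumably why it is the broader check.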
msg198426 - (view) Author: Akira Kitada (akitada) * Date: 2013-09-26 00:01
Do you have any plans to work on a patch for 2.7?
Apparently your patch is only for 3.x.
msg198439 - (view) Author: Sunny K (sunfinite) * Date: 2013-09-26 12:12
Added patch for 2.7. Please review.
msg208014 - (view) Author: Akira Kitada (akitada) * Date: 2014-01-13 09:20
I tried pydoc_2.7.patch with the following test file and
found that source_synopsis returns a \x-escaped string instead of a \u-escaped one.


# -*- coding: utf-8 -*-

u"""ツ"""

class Spam(object):
    u"""ツ"""


>>> import utf8
>>> utf8.__doc__
u'\u30c4'
>>> print(utf8.__doc__)
ツ
>>> import pydoc
>>> pydoc.source_synopsis(file('utf8.py'))
u'\xe3\x83\x84'
>>> print pydoc.source_synopsis(file('utf8.py'))
ツ
>>> print pydoc.source_synopsis(file('utf8.py')).encode('latin-1')
ツ
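
One plausible reading of the session above, not confirmed anywhere in this thread, is that the UTF-8 bytes of the docstring were turned into text one byte per character, as a Latin-1 decode would do. The following sketch reproduces that effect in Python 3 (the session above is Python 2); the variable names are illustrative.

```python
docstring = "\u30c4"  # the character from the test file

# Decoding the UTF-8 bytes as Latin-1 yields one character per byte.
mojibake = docstring.encode("utf-8").decode("latin-1")
print(ascii(mojibake))

# Reversing the mistake recovers the original text.
repaired = mojibake.encode("latin-1").decode("utf-8")
print(ascii(repaired))
```

The three \x escapes correspond to the three bytes that UTF-8 uses for this one character, which matches the shape of the value reported above.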
msg219383 - (view) Author: Sunny K (sunfinite) * Date: 2014-05-30 10:24
Hi Victor, can you give this another look?
msg348607 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-07-29 11:24
This issue is 14 years old, has been inactive for 5 years, and has 3 patches: it's far from being "newcomer friendly", so I am removing the "Easy" label.
History
Date User Action Args
2019-07-29 11:24:34  vstinner  set  keywords: - easy; messages: + msg348607; versions: + Python 3.9, - Python 3.5
2014-05-30 10:24:40  sunfinite  set  messages: + msg219383; versions: + Python 3.5, - Python 2.7, Python 3.2, Python 3.3
2014-05-30 10:22:10  sunfinite  set  files: - myfirst.patch
2014-01-13 09:20:09  akitada  set  messages: + msg208014
2013-09-26 12:12:55  sunfinite  set  files: + pydoc_2.7.patch; messages: + msg198439
2013-09-26 00:01:03  akitada  set  messages: + msg198426
2013-09-25 08:42:36  mpg  set  nosy: - mpg
2013-09-25 07:30:08  sunfinite  set  files: + myfirst_2.patch; messages: + msg198377
2013-09-23 08:06:50  vstinner  set  nosy: + vstinner; messages: + msg198318
2013-09-22 17:29:45  sunfinite  set  files: + myfirst.patch; nosy: + sunfinite; messages: + msg198292; keywords: + patch
2013-09-21 07:20:12  akitada  set  nosy: + akitada
2012-10-01 07:15:31  mpg  set  nosy: + mpg
2012-09-06 23:14:13  mikehoy  set  nosy: - mikehoy
2012-08-28 12:10:36  ncoghlan  set  messages: + msg169275
2012-08-28 12:03:53  ncoghlan  set  versions: + Python 3.3, - Python 3.1; nosy: + ncoghlan; messages: + msg169274; assignee: ping ->
2012-08-28 09:33:25  mikehoy  set  nosy: + mikehoy
2010-08-21 13:50:23  BreamoreBoy  set  versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2010-06-02 08:40:03  eric.araujo  set  nosy: + eric.araujo
2009-04-22 14:44:39  ajaksu2  set  keywords: + easy
2009-02-15 22:16:29  ajaksu2  set  nosy: + ajaksu2; stage: test needed; type: behavior; messages: + msg82175; versions: + Python 2.6, - Python 2.4
2005-04-18 12:18:39  kjohnson  create