classification
Title: Improve the use of __doc__ in pydoc
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.9
process
Status: open Resolution: remind
Dependencies: Superseder:
Assigned To: Nosy List: gvanrossum, levkivskyi, lukasz.langa, mark.dickinson, mbussonn, ncoghlan, serhiy.storchaka, tcaswell, terry.reedy, veky, xtreak
Priority: high Keywords: patch

Created on 2020-04-11 20:54 by serhiy.storchaka, last changed 2020-05-18 22:40 by mbussonn.

Pull Requests
URL Status Linked Edit
PR 19479 merged serhiy.storchaka, 2020-04-11 20:56
PR 19546 merged serhiy.storchaka, 2020-04-15 20:47
PR 20022 merged serhiy.storchaka, 2020-05-10 10:54
PR 20073 merged serhiy.storchaka, 2020-05-13 20:20
Messages (29)
msg366220 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-04-11 20:54
Currently pydoc outputs __doc__ for classes, functions, methods, properties, etc. (using inspect.getdoc()). If the object itself does not have a non-empty __doc__, it searches for a non-empty __doc__ in the parent classes (if the object is a class) or in the corresponding overridden members in the parents of the class to which the object (method, property, etc.) belongs.

There are several problems with this.

1. Using the docstring of a parent class is misleading in most classes, especially if it is a base or abstract class (like object, Exception, Mapping).

2. If the object does not have the __doc__ attribute, it inherits it from its class, so inspect.getdoc(1) returns the same as inspect.getdoc(int).

3. If the object has its own docstring, but is not a class or function, it will be output in the section DATA without a docstring.

The following PR fixes these issues.

1. Docstrings for classes are not inherited. It is better to not output a docstring than output the wrong one.

2. inspect.getdoc() returns the object's own docstring.

3. Docstrings are always output for objects with a docstring. See for example help(typing).

In future issues I'll make help(typing) even more informative.
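
A minimal sketch of the first two problems listed above, using only `inspect.getdoc()` as it behaves in released Python versions (3.5 and later). The class names `Base` and `Derived` are hypothetical and exist only for illustration.

```python
import inspect

class Base:
    """Documentation written for Base."""

class Derived(Base):
    pass  # no docstring of its own

# The __doc__ attribute of a class is not inherited from its bases.
own_doc = Derived.__doc__

# inspect.getdoc() falls back to the first documented class in the MRO.
found_doc = inspect.getdoc(Derived)

# An instance without its own __doc__ sees the docstring of its class.
same_as_class = inspect.getdoc(1) == inspect.getdoc(int)
```

Here `found_doc` describes `Base`, not `Derived`, which is the kind of mismatch problem 1 refers to.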
msg366225 - (view) Author: Vedran Čačić (veky) * Date: 2020-04-12 05:16
I don't agree with 1. I use that feature a lot, I write a base class which my students must subclass to their liking, but they still expect that help(TheirClass) will give them the documentation they need.

I agree that in _some_ cases it is not helpful (but even when the base is abstract, it might be helpful). How about: we keep the current behavior, but make it clear that the docstring applies to a superclass? It might be subtle, as just changing the first line of help() output (currently it says "Help on class Derived in module ...", change it to "Help on class Base in module ..."), or write a longer message such as "Documentation for Derived not found, showing the documentation for Base". But just removing it in all cases is really a wrong thing to do.
msg366228 - (view) Author: Ivan Levkivskyi (levkivskyi) * (Python committer) Date: 2020-04-12 08:38
FWIW I like the idea. There are many objects in typing module that are not classes, it would be great to display docs for them.
msg366230 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-04-12 09:27
Inheritance of docstrings was added in issue15582. It works well for class members, but I now realize that doing it for the class itself was a mistake. For example:

>>> import wave
>>> help(wave.Error)
Help on class Error in module wave:

class Error(builtins.Exception)
 |  Common base class for all non-exit exceptions.
 |  
 |  Method resolution order:
 |      Error
 |      builtins.Exception
 |      builtins.BaseException
 |      builtins.object
 |  
...

I fixed many similar issues by adding docstrings to classes, but there are even more exception classes and other classes in the stdlib for which help() gives an incorrect description. I don't remember a single case when inheritance of the docstring was helpful.

Note that help() outputs the list of base classes right after the docstring, so it is not hard to get additional information, especially in interactive browser mode. If you want to inherit a docstring, you can do it explicitly:

    __doc__ = BaseClass.__doc__
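
The explicit opt-in above could look like this in a complete class definition. This is only an illustrative sketch; `BaseClass`, `Child` and `SilentChild` are hypothetical names.

```python
class BaseClass:
    """Shared documentation for the whole family of classes."""

class Child(BaseClass):
    # Explicit opt-in: reuse the base class docstring as this class's own.
    __doc__ = BaseClass.__doc__

class SilentChild(BaseClass):
    pass  # no opt-in, so the class has no docstring of its own

child_doc = Child.__doc__
silent_doc = SilentChild.__doc__
```

With the opt-in, the docstring is stored in the namespace of `Child` itself, so any tool that reads only a class's own `__doc__` still finds it.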
msg366370 - (view) Author: Vedran Čačić (veky) * Date: 2020-04-14 07:20
Ok, I get what you're saying. But if someone writes

    class B(A):
      # no docstring at all
      ...

    help(B)

they'll still get other elements of current help? Particularly, "Methods inherited from A" (with their docstrings)?
msg366371 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-04-14 07:45
Yes, of course. And if it overrides some methods, but does not specify docstrings for the new methods, they will be inherited from the parent class.

class A:
    """Base class"""
    def foo(self): """Some docstring"""
    def bar(self): """Other docstring"""

class B(A):
    def foo(self): pass

help(B)

Help on class B in module __main__:

class B(A)
 |  Method resolution order:
 |      B
 |      A
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  foo(self)
 |      Some docstring
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from A:
 |  
 |  bar(self)
 |      Other docstring
 |  
...
msg366376 - (view) Author: Vedran Čačić (veky) * Date: 2020-04-14 08:59
Then I'm fine with it. Thanks.
msg366547 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-04-15 20:00
New changeset fbf2786c4c89430e2067016603078cf3500cfe94 by Serhiy Storchaka in branch 'master':
bpo-40257: Output object's own docstring in pydoc (GH-19479)
https://github.com/python/cpython/commit/fbf2786c4c89430e2067016603078cf3500cfe94
msg366715 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-04-18 14:13
New changeset 7e64414f57b70dc5bc0ab19a3162a0735f9bfabf by Serhiy Storchaka in branch 'master':
bpo-40257: Improve help for the typing module (GH-19546)
https://github.com/python/cpython/commit/7e64414f57b70dc5bc0ab19a3162a0735f9bfabf
msg366716 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-04-18 14:16
Some work is still needed for HTML output. But this code is so different from the code for plain text output and so complicated that I was afraid to break something. I'll rewrite it in a separate issue.
msg368584 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-10 12:14
New changeset 2fbc57af851814df567fb51054cb6f6a399f814a by Serhiy Storchaka in branch 'master':
bpo-40257: Tweak docstrings for special generic aliases. (GH-20022)
https://github.com/python/cpython/commit/2fbc57af851814df567fb51054cb6f6a399f814a
msg368606 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2020-05-11 04:26
This is going to potentially break a lot of interactive usage in the Scientific ecosystem. 

A lot of people are going to do:

    df = load('my.csv')
    df??

To ask for help and will get nothing. 

Even for a subclass, I want to argue that a docstring for a superclass is better than no docstring.


This will be devastating for many users.
msg368607 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-05-11 04:31
Okay, let's reopen.

@Matthias, can you clarify your example? What's load()? And what does df?? do?
msg368608 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2020-05-11 04:35
https://bugs.python.org/issue40587 has been opened. A copy of the report follows:


In python 3.8:

```
>>> class A(object):
...     """standard docstring"""
...     pass
...
>>> import inspect
>>> inspect.getdoc(A())
'standard docstring'
```

In 3.9:

```
$ python
Python 3.9.0a6+ (heads/master:5b956ca42d, May 10 2020, 20:31:26)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> class A(object):
...     """standard docstring"""
...     pass
...
>>> import inspect
>>> inspect.getdoc(A())
>>>
```
msg368609 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2020-05-11 04:56
> can you clarify your example? What's load()? And what does df?? do?

It was vague on purpose, 

`load()` would be for example `read_csv()` from `pandas`, which returns a `pandas.DataFrame`. The point being that users typically won't really know the type of what they will get: they may get a DataFrame, but they may get a subclass if for example they use `dask` to do distributed computing.

`?` or `??` is the way to get help in IPython/Jupyter, we try to pull as much information as we can and under the hood call `inspect.getdoc()`.

Assuming 

In [4]: class A:
   ...:     "doc"

In [5]: class B(A):
   ...:     pass

In [6]: b = B()

Python 3.8 gives:

In [9]: b?
Type:            B
String form:     <__main__.B object at 0x104be7d00>
Docstring:       <no docstring>
Class docstring: doc

Python 3.9 gives:

In [4]: b?
Type:        B
String form: <__main__.B object at 0x10a0b7140>
Docstring:   <no docstring>


We do already pull docs from the superclass of the instance if no doc is found on the current object, but now we get even less for the user. We could of course publish a patch and walk the hierarchy ourselves, but it would require many users to upgrade (which, as you of course know, they are not good at).

(Here I'm using `?`; `??` tries to pull even more information, like the source, known subclasses and other stuff.)


(Will try to get examples with actual code, but I haven't had time to build pandas or other scientific packages on 3.9 yet).
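
As a rough sketch of what "walk the hierarchy ourselves" could mean for a tool, the helper below tries `inspect.getdoc()` first and then searches the class MRO for the first non-empty docstring. The name `doc_with_class_fallback` is hypothetical; this is not IPython's actual implementation.

```python
import inspect

def doc_with_class_fallback(obj):
    """Return obj's docstring, else the first one found along its class MRO."""
    doc = inspect.getdoc(obj)
    if doc:
        return doc
    # Fall back to the class of an instance, or to the class itself.
    cls = obj if inspect.isclass(obj) else type(obj)
    for base in cls.__mro__:
        # Look only at the docstring stored on each class itself.
        base_doc = vars(base).get("__doc__")
        if isinstance(base_doc, str) and base_doc.strip():
            return inspect.cleandoc(base_doc)
    return None

class A:
    "doc"

class B(A):
    pass

b = B()
result = doc_with_class_fallback(b)
```

Because the fallback reads each class's own `__doc__` directly, the result for `b` does not depend on whether `inspect.getdoc()` itself inherits class docstrings.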
msg368611 - (view) Author: Vedran Čačić (veky) * Date: 2020-05-11 05:00
Of course, I thought that

2. inspect.getdoc() returns the object's own docstring.

means it returns the object's own docstring _if it has one_. If it doesn't, then it should still return the docstring of its class, of course!

I have no problem with the fact that help(1) gives the same help as help(int). Of course, same as with the above (subclasses), we might want to emphasize the fact that the help is for the class and not for the object itself, but just returning nothing is in no way an improvement.

Guido, load is probably from Pandas, df is a relatively standard abbreviation for "dataframe" (an instance of a class DataFrame, with many various methods), and obj?? in Jupyter opens the help for obj in a subwindow, enabling you to browse it and close when you're done with it.
msg368612 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-05-11 05:01
Can you all please decide which issue to use?
msg368618 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-11 08:16
help(1) as well as help(int) output the help for int. The only difference is that the former has the first line "Help on int object:", and the latter -- "Help on class int in module builtins:".
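
The difference in the first line can be checked without paging through the full help text. The sketch below assumes that `help()` renders its title with the format string `"Help on %s:"`, which matches the lines quoted above; `help_title` is a hypothetical helper name.

```python
import pydoc

def help_title(obj):
    """Return the first line of the plain-text help rendered for obj."""
    text = pydoc.render_doc(obj, title="Help on %s:", renderer=pydoc.plaintext)
    return text.splitlines()[0]

title_for_instance = help_title(1)
title_for_class = help_title(int)
```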

If IPython wants to output the help on the instance, it should change the implementation of `?` and `??`. It would be better if it correctly attributed the source of the docstring: the object itself, its class or its superclass. It was difficult to distinguish these cases before, now it is easier.

By the way, I just tried IPython 5.5.0 with Python 3.6.9, and it does not output the docstring either:

In [1]: a = 1

In [2]: a??
Type:        int
String form: 1
msg368632 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2020-05-11 16:24
> Can you all please decide which issue to use?

We can stay here, I opened the other issue before figuring out this was the cause.

> If IPython wants to output the help on the instance, it should change the implementation of `?` and `??`. It would be better if it correctly attributed the source of the docstring: the object itself, its class or its superclass. It was difficult to distinguish these cases before, now it is easier.

Sure, I can do that, but this issue feels like an important semantic change of `inspect.getdoc()`; it may be documented, but there is no warning or deprecation of the old behavior. It is also likely to affect untested code (documentation generation).

If you decide that this change of behavior is the one you want I'll be happy to support you; I just want to make sure the impact on the rest of the ecosystem is considered. IPython/Jupyter is likely not the only one to rely on inspect.getdoc behavior; I'm thinking pycharm, spyder and sphinx will likely be impacted. I can see `inspect.getdoc()` in the source of even scipy/numpy and rarely in tests.


I would prefer new functions with clearer behavior, for example returning a sequence of tuples (docs, where it comes from), potentially deprecating inspect.getdoc(), over a change of behavior that removes data where there used to be some.
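
One possible shape for such a function, as a hedged sketch only: `docs_with_origin` is a hypothetical name, not an existing `inspect` API, and this version handles only classes and plain instances.

```python
import inspect

def docs_with_origin(obj):
    """Return (docstring, origin) pairs for obj, nearest origin first."""
    pairs = []
    if inspect.isclass(obj):
        cls = obj
    else:
        cls = type(obj)
        # A docstring stored on the instance itself, if it has a __dict__.
        own = getattr(obj, "__dict__", {}).get("__doc__")
        if isinstance(own, str):
            pairs.append((inspect.cleandoc(own), obj))
    for base in cls.__mro__:
        # Only docstrings defined on the class itself, not inherited ones.
        doc = vars(base).get("__doc__")
        if isinstance(doc, str):
            pairs.append((inspect.cleandoc(doc), base))
    return pairs

class A:
    "doc"

class B(A):
    pass

pairs = docs_with_origin(B())
```

A caller can then decide whether to show the docstring of `A` for an instance of `B`, and label it as coming from `A`.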


>  I just tried IPython 5.5.0

(You may want to update to 5.10, and do you have a reason to still be on 5 and not 7?)
msg368673 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-05-11 22:34
I'm making this a release blocker -- please everybody come to an agreement or ask on python-dev.
msg368738 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2020-05-12 16:53
I've sent a request for comments on python-dev 

https://mail.python.org/archives/list/python-dev@python.org/thread/6QO2XI5B7RVZDW3YZV24LYD75VGRITFU/

Thanks.
msg368790 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-05-13 17:24
Should this block 3.9.0b1, planned for Monday May 18th?
msg368792 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2020-05-13 17:50
I feel it should. At the very least, a decision should be made on how to move forward.
msg368797 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-05-13 19:46
OK, that was my intuition, too. I will block beta on it then.
msg368798 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-13 19:56
I can just copy the implementation of inspect.getdoc() and related functions in pydoc for use in help(), and restore the old code in the  inspect module. Of course it will mean that third-party tools will continue to show incorrect docstrings in some cases.
msg368990 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-05-16 00:45
Whether or not an object has a docstring is implementation defined, and I do not consider it to be part of its API.  I just backported some new docstrings (with Brett Cannon's concurrence), and I would consider it OK for Serhiy to do the same with his addition.

But the return value of getdoc is half of its API.  Its 3.8 doc says
"If the documentation string for an object is not provided and the object is a class, a method, a property or a descriptor, retrieve the documentation string from the inheritance hierarchy.

Changed in version 3.5: Documentation strings are now inherited if not overridden."

While inherited class docstrings are sometimes inapplicable, they may not be.  In any case, not doing so is an API change.  If done, and this is obviously controversial, the change needs a deprecation period.  I would say at least 2 releases.  And this should be a separate issue.  But I suggest leaving getdoc alone.  I think it appropriate that it be a bit liberal in returning text that just might be useful.

Changing what pydoc does with the information it gets, including from getdoc, is a different issue -- this issue.  Pydoc could not call getdoc for classes, or it could determine whether the returned string is inherited and change it slightly as has been suggested.

Other object information functions can make their own choices. For instance, IDLE calltips are primarily about signature and currently only use an object's own docstring.  But maybe pydoc should be used for instance methods to get the one or two summary lines IDLE displays.

A related note: Useful specific docstrings would be easier if an object could directly extend a base object's docstring with
  f"{base_object.__doc__}\nExtra implementation info\n"
following the header instead of having to later write
  derived_object.__doc__ = f"....".

In instructional contexts, this would be useful not only for classes but also for functions that implement a docstring specification.
  def _factor(number):
    "Return prime factors of non-negative ints as list of (prime, count) pairs."
Students could then submit an implementation with 
  def factor(number):
    f"{_factor.__doc__}\nImplementation details."
    <implementation code>
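
For context, a sketch of the current behavior this note refers to: an f-string placed where a docstring would go is evaluated and discarded, so the function ends up without a docstring, and the extended text has to be assigned after the definition. The trial-division body is only a placeholder implementation added for illustration.

```python
def _factor(number):
    "Return prime factors of non-negative ints as list of (prime, count) pairs."
    raise NotImplementedError

def factor(number):
    f"{_factor.__doc__}\nImplementation details."  # not treated as a docstring
    pairs = []
    prime = 2
    while prime * prime <= number:
        count = 0
        while number % prime == 0:
            number //= prime
            count += 1
        if count:
            pairs.append((prime, count))
        prime += 1
    if number > 1:
        pairs.append((number, 1))
    return pairs

doc_before = factor.__doc__

# The workaround available today: assign the extended docstring afterwards.
factor.__doc__ = f"{_factor.__doc__}\nImplementation details."
doc_after = factor.__doc__
```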
msg369279 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-05-18 17:25
New changeset 08b47c367a08f571a986366aa33828d3951fa88d by Serhiy Storchaka in branch 'master':
bpo-40257: Revert changes to inspect.getdoc() (GH-20073)
https://github.com/python/cpython/commit/08b47c367a08f571a986366aa33828d3951fa88d
msg369290 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2020-05-18 21:22
Looks like the revert is solving the issue?
msg369307 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2020-05-18 22:40
> Looks like the revert is solving the issue?

It appears to do so as far as I can tell, and most tests pass on nightly; the rest seem to be unrelated to changes in current 3.9.

Many thanks to Serhiy for all the work on making documentation better, and there are definitely cases where a version of getowndoc, or something that discriminates where the docstring comes from, would be useful.

I also agree that having _some_ ability to extend docstring would be nice but it's likely for another issue.
History
Date User Action Args
2020-05-18 22:40:41 mbussonn set messages: + msg369307
2020-05-18 21:22:13 lukasz.langa set priority: release blocker -> high; messages: + msg369290
2020-05-18 17:25:14 serhiy.storchaka set messages: + msg369279
2020-05-18 17:12:38 mark.dickinson set nosy: + mark.dickinson
2020-05-17 20:16:49 tcaswell set nosy: + tcaswell
2020-05-16 00:45:13 terry.reedy set nosy: + terry.reedy; messages: + msg368990
2020-05-13 20:20:47 serhiy.storchaka set stage: resolved -> patch review; pull_requests: + pull_request19379
2020-05-13 19:56:58 serhiy.storchaka set messages: + msg368798
2020-05-13 19:46:31 lukasz.langa set messages: + msg368797
2020-05-13 18:30:27 yselivanov set nosy: - yselivanov
2020-05-13 17:50:55 gvanrossum set messages: + msg368792
2020-05-13 17:24:40 lukasz.langa set nosy: + lukasz.langa; messages: + msg368790
2020-05-12 16:53:14 mbussonn set messages: + msg368738
2020-05-11 22:34:47 gvanrossum set priority: normal -> release blocker; resolution: fixed -> remind; messages: + msg368673
2020-05-11 16:24:39 mbussonn set messages: + msg368632
2020-05-11 08:16:37 serhiy.storchaka set messages: + msg368618
2020-05-11 05:01:30 gvanrossum set messages: + msg368612
2020-05-11 05:00:31 veky set messages: + msg368611
2020-05-11 04:56:14 mbussonn set messages: + msg368609
2020-05-11 04:35:54 xtreak set nosy: + xtreak; messages: + msg368608
2020-05-11 04:31:21 gvanrossum set status: closed -> open; messages: + msg368607
2020-05-11 04:26:13 mbussonn set nosy: + mbussonn; messages: + msg368606
2020-05-10 12:14:38 serhiy.storchaka set messages: + msg368584
2020-05-10 10:54:25 serhiy.storchaka set pull_requests: + pull_request19333
2020-04-18 14:16:43 serhiy.storchaka set status: open -> closed; resolution: fixed; messages: + msg366716; stage: patch review -> resolved
2020-04-18 14:13:25 serhiy.storchaka set messages: + msg366715
2020-04-15 20:47:57 serhiy.storchaka set pull_requests: + pull_request18892
2020-04-15 20:00:23 serhiy.storchaka set messages: + msg366547
2020-04-14 08:59:32 veky set messages: + msg366376
2020-04-14 07:45:59 serhiy.storchaka set messages: + msg366371
2020-04-14 07:20:09 veky set messages: + msg366370
2020-04-12 09:27:19 serhiy.storchaka set messages: + msg366230
2020-04-12 08:38:21 levkivskyi set messages: + msg366228
2020-04-12 05:16:07 veky set nosy: + veky; messages: + msg366225
2020-04-11 20:56:54 serhiy.storchaka set keywords: + patch; stage: patch review; pull_requests: + pull_request18833
2020-04-11 20:54:38 serhiy.storchaka create