Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the use of __doc__ in pydoc #84438

Closed
serhiy-storchaka opened this issue Apr 11, 2020 · 29 comments
Closed

Improve the use of __doc__ in pydoc #84438

serhiy-storchaka opened this issue Apr 11, 2020 · 29 comments
Labels
3.9 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 40257
Nosy @gvanrossum, @terryjreedy, @mdickinson, @ncoghlan, @ambv, @serhiy-storchaka, @vedgar, @ilevkivskyi, @tacaswell, @Carreau, @eamanu, @tirkarthi
PRs
  • bpo-40257: Output object's own docstring in pydoc #19479
  • bpo-40257: Improve help for the typing module #19546
  • bpo-40257: Tweak docstrings for special generic aliases. #20022
  • bpo-40257: Revert changes to inspect.getdoc() #20073
  • bpo-29940: Add follow_wrapped option to help() #22390
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-04-11.20:54:38.793>
    labels = ['type-feature', 'library', '3.9']
    title = 'Improve the use of __doc__ in pydoc'
    updated_at = <Date 2021-11-04.14:25:10.422>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2021-11-04.14:25:10.422>
    actor = 'erlendaasland'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2020-04-11.20:54:38.793>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40257
    keywords = ['patch']
    message_count = 29.0
    messages = ['366220', '366225', '366228', '366230', '366370', '366371', '366376', '366547', '366715', '366716', '368584', '368606', '368607', '368608', '368609', '368611', '368612', '368618', '368632', '368673', '368738', '368790', '368792', '368797', '368798', '368990', '369279', '369290', '369307']
    nosy_count = 12.0
    nosy_names = ['gvanrossum', 'terry.reedy', 'mark.dickinson', 'ncoghlan', 'lukasz.langa', 'serhiy.storchaka', 'veky', 'levkivskyi', 'tcaswell', 'mbussonn', 'eamanu', 'xtreak']
    pr_nums = ['19479', '19546', '20022', '20073', '22390']
    priority = 'high'
    resolution = 'remind'
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue40257'
    versions = ['Python 3.9']

    @serhiy-storchaka
    Copy link
    Member Author

    Currently pydoc outputs __doc__ for classes, functions, methods, properties, etc (using inspect.getdoc()). If the object itself does not have non-empty __doc__, it searches non-empty __doc__ in the class parenthesis (if the object is a class) or in the corresponding overloaded members of the class to which the object (method, property, etc) belongs.

    There are several problems with this.

    1. Using the docstring of a parent class is misleading in most classes, especially if it is a base or abstract class (like object, Exception, Mapping).

    2. If the object does not have the __doc__ attribute, it inherits it from its class, so inspect.getdoc(1) returns the same as inspect.getdoc(int).

    3. If the object has own docstring, but is not a class or function, it will be output in the section DATA without a docstring.

    The following PR fixes these issues.

    1. Docstrings for classes are not inherited. It is better to not output a docstring than output the wrong one.

    2. inspect.getdoc() returns the object's own docstring.

    3. Docstrings are always output for object with a docstring. See for example help(typing).

    In future issues I'll make help(typing) even more informative.

    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir 3.9 only security fixes type-feature A feature request or enhancement labels Apr 11, 2020
    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented Apr 12, 2020

    I don't agree with 1. I use that feature a lot, I write a base class which my students must subclass to their liking, but they still expect that help(TheirClass) will give them the documentation they need.

    I agree that in _some_ cases it is not helpful (but even when the base is abstract, it might be helpful). How about: we keep the current behavior, but make it clear that the docstring applies to a superclass? It might be subtle, as just changing the first line of help() output (currently it says "Help on class Derived in module ...", change it to "Help on class Base in module ..."), or write a longer message such as "Documentation for Derived not found, showing the documentation for Base". But just removing it in all cases is really a wrong thing to do.

    @ilevkivskyi
    Copy link
    Member

    FWIW I like the idea. There are many objects in typing module that are not classes, it would be great to display docs for them.

    @serhiy-storchaka
    Copy link
    Member Author

    Inheritance of docstrings was added in bpo-15582. It works good for class members, but I now realized that doing it for class itself was a mistake. For example:

    >>> import wave
    >>> help(wave.Error)
    Help on class Error in module wave:
    class Error(builtins.Exception)
     |  Common base class for all non-exit exceptions.
     |  
     |  Method resolution order:
     |      Error
     |      builtins.Exception
     |      builtins.BaseException
     |      builtins.object
     |  
    ...

    I fixed many similar issues by adding docstrings to classes, but there are even more exception classes and other classes in the stdlib for which help() gives incorrect description. I don't remember a single case when inheritance of the docstring was helpful.

    Note that help() outputs the list of base class right after the docstring, so it is not hard to give an additional information, especially in interactive browser mode. If you want to inherit a docstring, you can do it explicitly:

        __doc__ = BaseClass.__doc__

    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented Apr 14, 2020

    Ok, I get what you're saying. But if someone writes

        class B(A):
          # no docstring at all
          ...
    
        help(B)

    they'll still get other elements of current help? Particularly, "Methods inherited from A" (with their docstrings)?

    @serhiy-storchaka
    Copy link
    Member Author

    Yes, of course. And if it overrides some methods, but do not specify doctrings for new methods, they will be inherited from the parent class.

    class A:
        """Base class"""
        def foo(self): """Some docstring"""
        def bar(self): """Other docstring"""
    
    class B(A):
        def foo(self): pass
    
    help(B)

    Help on class B in module __main__:

    class B(A)
     |  Method resolution order:
     |      B
     |      A
     |      builtins.object
     |  
     |  Methods defined here:
     |  
     |  foo(self)
     |      Some docstring
     |  
     |  

    | Methods inherited from A:
    |
    | bar(self)
    | Other docstring
    |
    ...

    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented Apr 14, 2020

    Then I'm fine with it. Thanks.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset fbf2786 by Serhiy Storchaka in branch 'master':
    bpo-40257: Output object's own docstring in pydoc (GH-19479)
    fbf2786

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 7e64414 by Serhiy Storchaka in branch 'master':
    bpo-40257: Improve help for the typing module (GH-19546)
    7e64414

    @serhiy-storchaka
    Copy link
    Member Author

    Some work is still needed for HTML output. But this code is so different from the code for plain text output and so complicated that I was afraid to break something. I'll rewrite it in separate issue.

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 2fbc57a by Serhiy Storchaka in branch 'master':
    bpo-40257: Tweak docstrings for special generic aliases. (GH-20022)
    2fbc57a

    @Carreau
    Copy link
    Mannequin

    Carreau mannequin commented May 11, 2020

    This is going to potentially break a lot of interactive usage in the Scientific ecosystem.

    A a lot of people are going to do:

        df = load('my.csv')
        df??

    To ask for help and will get nothing.

    Even for subclass, I want to argue that a docstring for a superclass is better than no docstring.

    This will be devastating for many users.

    @gvanrossum
    Copy link
    Member

    Okay, let's reopen.

    @matthias, can you clarify your example? What's load()? And what does df?? do?

    @gvanrossum gvanrossum reopened this May 11, 2020
    @gvanrossum gvanrossum reopened this May 11, 2020
    @tirkarthi
    Copy link
    Member

    https://bugs.python.org/issue40587 has been opened. Copy paste of the report as below :

    In python 3.8:

    >>> class A(object):
    ...     """standard docstring"""
    ...     pass
    ...
    >>> import inspect
    >>> inspect.getdoc(A())
    'standard docstring'
    

    In 3.9:

    $ python
    Python 3.9.0a6+ (heads/master:5b956ca42d, May 10 2020, 20:31:26)
    [Clang 9.0.0 (clang-900.0.39.2)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> class A(object):
    KeyboardInterrupt
    >>> class A(object):
    ...     """standard docstring"""
    ...     pass
    ...
    >>> import inspect
    >>> inspect.getdoc(A())
    >>>
    

    @Carreau
    Copy link
    Mannequin

    Carreau mannequin commented May 11, 2020

    can you clarify your example? What's load()? And what does df?? do?

    It was vague on purpose,

    load() would be for example load_csv() from pandas that return a pandas.DataFrame. The point being that users typically won't really know the type of what they will get, they may get a DataFrame, but they may get a subclass if for example they use dask to do distributed computing.

    ? or ?? is the way to get help in IPython/Jupyter, we try to pull as much information as we can and under the hood call inspect.getdoc().

    Assuming

    In [4]: class A:
    ...: "doc"

    In [5]: class B(A):
    ...: pass

    In [6]: b = B()

    Python 3.8 gives:

    In [9]: b?
    Type: B
    String form: <main.B object at 0x104be7d00>
    Docstring: <no docstring>
    Class docstring: doc

    Python 3.9 give

    In [4]: b?
    Type: B
    String form: <main.B object at 0x10a0b7140>
    Docstring: <no docstring>

    We do already pull docs from the superclass of the instance if no doc is found on current object, but now we get even less for the user. We could of course publish patch and walk the hierarchy ourselves, but it will require many users to upgrade (which you of course know they are not good at).

    (Here i'm using ?, ?? try to pull even more informations like the source, known subclasses and other stuff)

    (Will try to get examples with actual code, but I haven't had time to build pandas or other scientific package on 3.9 yet).

    @vedgar
    Copy link
    Mannequin

    vedgar mannequin commented May 11, 2020

    Of course, I thought that

    1. inspect.getdoc() returns the object's own docstring.

    means it returns the object's own docstring _if it has one_. If it doesn't, then it should still return the docstring of its class, of course!

    I have no problem with the fact that help(1) gives the same help as help(int). Of course, same as with the above (subclasses), we might want to emphasize the fact that the help is for the class and not for the object itself, but just returning nothing is in no way an improvement.

    Guido, load is probably from Pandas, df is a relatively standard abbreviation for "dataframe" (an instance of a class DataFrame, with many various methods), and obj?? in Jupyter opens the help for obj in a subwindow, enabling you to browse it and close when you're done with it.

    @gvanrossum
    Copy link
    Member

    Can you all please decide which issue to use?

    @serhiy-storchaka
    Copy link
    Member Author

    help(1) as well as help(int) output the help for int. The only difference is that the former has the first line "Help on int object:", and the latter -- "Help on class int in module builtins:".

    If IPython wants to output the help on the instance, it should change the implementation of ? and ??. It would be better if it correctly attribute the source of the docstring: the object itself, its class or its superclass. It was difficult to distinguish these cases before, now it is easier.

    By the way, I just tried IPython 5.5.0 with Python 3.6.9, and it does not output the docstring either:

    In [1]: a = 1

    In [2]: a??
    Type: int
    String form: 1

    @Carreau
    Copy link
    Mannequin

    Carreau mannequin commented May 11, 2020

    Can you all please decide which issue to use?

    We can stay here, I opened the other issue before figuring out this was the cause.

    If IPython wants to output the help on the instance, it should change the implementation of ? and ??. It would be better if it correctly attribute the source of the docstring: the object itself, its class or its superclass. It was difficult to distinguish these cases before, now it is easier.

    Sure I can do that, but this issue feel like a important semantic change of inspect.getdoc(), it may be documented but there is no warning or deprecation of behavior. It is also likely to affect untested code (documentation generation).

    If you decide that this change of behavior is the one you want I'll be happy to support you – I just want to make sure the impact on the rest of the ecosystem. IPython/Jupyter is likely not the only one to rely on inspect.getdoc behavior, I'm thinking pycharm, spyder, sphinx will likely be impacted. I can see inspect.getdoc() in the source of even scipy/numpy and rarely in tests.

    I would prefer new functions with clearer behavior and for example returning a sequence of tuple (docs, where it comes from) potentially deprecating inspect.getdocs() than a change of behavior that remove data where their used to be some

    I just tried IPython 5.5.0

    (You may want to update to 5.10, and do you have reason to still be on 5 and not 7 ?)

    @gvanrossum
    Copy link
    Member

    I'm making this a release blocker -- please everybody come to an agreement or ask on python-dev.

    @Carreau
    Copy link
    Mannequin

    Carreau mannequin commented May 12, 2020

    I've sent a request for comments on python-dev

    https://mail.python.org/archives/list/python-dev@python.org/thread/6QO2XI5B7RVZDW3YZV24LYD75VGRITFU/

    Thanks.

    @ambv
    Copy link
    Contributor

    ambv commented May 13, 2020

    Should this block 3.9.0b1, planned for Monday May 18th?

    @gvanrossum
    Copy link
    Member

    I feel it should. At the very least, a decision should be made on how to move forward.

    @ambv
    Copy link
    Contributor

    ambv commented May 13, 2020

    OK, that was my intuition, too. I will block beta on it then.

    @serhiy-storchaka
    Copy link
    Member Author

    I can just copy the implementation of inspect.getdoc() and related functions in pydoc for use in help(), and restore the old code in the inspect module. Of course it will mean that third-party tools will continue to show incorrect docstrings in some cases.

    @terryjreedy
    Copy link
    Member

    Whether or not an object has a docstring is implementation defined, and I do not consider it to be part of its API. I just backported some new docstrings (with Brett Cannon's concurrence), and I would consider it OK for Serhiy to do the same with his addition.

    But the return value of getdoc is half of its API. Its 3.8 doc says
    "If the documentation string for an object is not provided and the object is a class, a method, a property or a descriptor, retrieve the documentation string from the inheritance hierarchy.

    Changed in version 3.5: Documentation strings are now inherited if not overridden."

    While inherited class docstrings are sometimes inapplicable, they may not be. In any case, not doing so is an API change. If done, and this is obviously controversial, the change needs a deprecation period. I would say at least 2 releases. And this should be a separate issue. But I suggest leaving getdoc alone. I think it appropriate that it be a bit liberal in returning text that just might be useful.

    Changing what pydoc does with the information it gets, including from getdoc, is a different issue -- this issue. Pydoc could not call getdoc for classes, or it could determine whether the returned string is inherited and change it slightly as has been suggested.

    Other object information functions can make their own choices. For instance, IDLE calltips are primarily about signature and currently only use an object's own docstring. But maybe pydoc should be used for instance methods to get the one or two summary lines IDLE displays.

    A related note: Useful specific docstrings would be easier if an object could directly extend a base objects docstring with
    f"{base_object.__doc__}\nExtra implementation info\n"
    following the header instead of having to later write
    derived_object.__doc__ = f"....".

    In instructional contexts, this would be useful, in addition for classes, for functions that implement a docstring specificaton.
    def _factor(number):
    "Return prime factors of non-negative ints as list of (prime, count) pairs."
    Students could then submit an implementation with
    def factor(number):
    f"{factor.__doc_}\nImplementation details."
    <implementation code>

    @serhiy-storchaka
    Copy link
    Member Author

    New changeset 08b47c3 by Serhiy Storchaka in branch 'master':
    bpo-40257: Revert changes to inspect.getdoc() (GH-20073)
    08b47c3

    @ambv
    Copy link
    Contributor

    ambv commented May 18, 2020

    Looks like the revert is solving the issue?

    @Carreau
    Copy link
    Mannequin

    Carreau mannequin commented May 18, 2020

    Looks like the revert is solving the issue?

    It appears to do so as far as I can tell, and most test pass on nightly, the rest seem to be unrelated to changes in current 3.9.

    Many thanks to Serhiy for all the work on making documentation better, and there are definitively case where a version of getowndoc, or something that discriminate where the docstring comes from would be useful.

    I also agree that having _some_ ability to extend docstring would be nice but it's likely for another issue.

    @ahmedsayeed1982 ahmedsayeed1982 mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) and removed stdlib Python modules in the Lib dir 3.9 only security fixes labels Nov 4, 2021
    @erlend-aasland erlend-aasland added stdlib Python modules in the Lib dir 3.9 only security fixes and removed interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Nov 4, 2021
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.9 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    7 participants