classification
Title: unable to document fields of dataclass
Type: enhancement
Stage: patch review
Components:
Versions: Python 3.11
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: eric.smith
Nosy List: andrei.avk, dlax, eric.smith, jmg, rhettinger, terry.reedy
Priority: normal
Keywords: patch

Created on 2020-11-19 20:36 by jmg, last changed 2021-07-23 20:07 by andrei.avk.

Pull Requests
URL Status Linked
PR 27265 open andrei.avk, 2021-07-20 17:23
PR 27279 open andrei.avk, 2021-07-21 17:59
Messages (17)
msg381455 - (view) Author: John-Mark Gurney (jmg) Date: 2020-11-19 20:36
per: https://bugs.python.org/issue38401

There is not a way to document fields of a dataclass.

I propose that, instead of making a language change, an additional parameter be added to field, in a similar vein to property.

This currently works:
```
class Foo:
    def getx(self):
        return 5
    x = property(getx, doc='document the x property')
```

So, I propose this:
```
from dataclasses import dataclass, field

@dataclass
class Bar:
    # proposed: field() does not currently accept doc=
    x: int = field(doc='document what x is')
```

This should be easy to support as I believe that the above would not require any changes to the core language.
msg381486 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-11-20 15:57
How would you expect to extract this docstring?

I'm not sure how this would work in practice, since both of these are errors:

>>> class A:
...    def __init__(self):
...        self.x = 3
...        self.x.__doc__ = 'foo'
...
>>> A()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __init__
AttributeError: 'int' object attribute '__doc__' is read-only
>>> class B:
...    x: int = 0
...    x.__doc__ = 'foo'
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in B
AttributeError: 'int' object attribute '__doc__' is read-only

It could be stored in the dataclass-specific data attached to a class, but then you'd have to use a dataclass-specific function to get access to it. I'm not sure that's a great improvement.
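
One way to picture the "dataclass-specific data" option with what exists today is the `metadata` mapping that `dataclasses.field` already accepts. The following is only a minimal sketch: the `"doc"` metadata key and the helper `field_doc` are assumptions made up for illustration, not conventions of the dataclasses module.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Bar:
    # "doc" is an arbitrary metadata key chosen for this sketch.
    x: int = field(default=0, metadata={"doc": "document what x is"})
    y: int = 0  # no documentation attached


def field_doc(cls, name):
    """Hypothetical helper: return the "doc" metadata of a dataclass field, or None."""
    for f in fields(cls):
        if f.name == name:
            return f.metadata.get("doc")
    raise AttributeError(f"{cls.__name__} has no dataclass field {name!r}")


print(field_doc(Bar, "x"))
print(field_doc(Bar, "y"))
```

This illustrates the drawback mentioned above: the text is reachable only through a dataclass-aware lookup, not through `x.__doc__` or plain `help()`.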

I also note that attrs doesn't have this feature, probably for the same reason.
msg381492 - (view) Author: John-Mark Gurney (jmg) Date: 2020-11-20 18:18
As I said, I expect it to work similar to how property works:
```
>>> class Foo:
...  def getx(self):
...   return 5
...  x = property(getx, doc='document the x property')
... 
>>> help(Foo)
Help on class Foo in module __main__:

class Foo(builtins.object)
 |  Methods defined here:
 |  
 |  getx(self)
 |  
 |  ----------------------------------------------------------------------
 |  Readonly properties defined here:
 |  
 |  x
 |      document the x property
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
```

The pure python implementation of property is at: https://docs.python.org/3/howto/descriptor.html#properties

which uses the descriptor protocol as documented in: https://docs.python.org/3/howto/descriptor.html

Which uses an object w/ __get__/__set__/__delete__ to emulate attribute access, and that object can have the __doc__ property set on it to provide documentation.
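
A minimal sketch of that idea on a plain class, assuming a hypothetical descriptor named `Documented` (it is not part of the standard library, and this is not how dataclasses stores fields):

```python
class Documented:
    """Hypothetical data descriptor holding a per-instance value plus a docstring."""

    def __init__(self, default, doc=None):
        self.default = default
        self.__doc__ = doc  # per-descriptor docstring, like property(doc=...)

    def __set_name__(self, owner, name):
        # Store the real value on the instance under a private name.
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # class access returns the descriptor, so __doc__ is reachable
        return getattr(obj, self.private_name, self.default)

    def __set__(self, obj, value):
        setattr(obj, self.private_name, value)


class Foo:
    x = Documented(5, doc="document what x is")


foo = Foo()
foo.x = 7
print(foo.x, Foo.x.__doc__)
```

Every read and write of `x` now goes through `__get__`/`__set__`, which is the extra indirection a descriptor-based approach brings.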
msg381503 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2020-11-20 20:09
@property has a place to attach the docstring, dataclasses in general do not. I wouldn't want to add a descriptor just to have the ability to add a docstring. There are performance issues involved, and I'm sure some corner cases where functionality would change.

Maybe if you bring this up on python-ideas you can get some more ideas.
msg381515 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2020-11-21 01:24
> There is not a way to document fields of a dataclass.

I don't think there is an easy way to do this without a custom descriptor and that would add a lot of overhead.
msg381521 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2020-11-21 03:50
Documenting dataclass fields strikes me as a reasonable request. Since most field values cannot have a .__doc__ attribute...

> It could be stored in the dataclass-specific data attached to a class, 

The obvious place.

> but then you'd have to use a dataclass-specific function to get access to it. 

There are already at least 2 functions for getting docstrings for objects without .__doc__.  They could be enhanced with a dataclass-specific clause.

If ob.__doc__ is None, inspect.getdoc(ob) looks for a docstring elsewhere and sometimes succeeds.  For this reason, I just changed IDLE calltips to use getdoc for calltips.  If dataclasses.is_dataclass(ob.__class__), then I presume getdoc could retrieve the docstring from ob.__class__.__dataclass_fields__.

help(ob) uses pydoc, which gets docstrings with _getowndoc and if needed, _finddoc.  Perhaps these should be replaced with inspect.getdoc, but until then, _finddoc could have an elif clause added for dataclass fields.
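
As a rough sketch of what such a dataclass-specific clause could look like, here is a hypothetical wrapper around `inspect.getdoc`. The name `getdoc_with_fields` and the `"doc"` metadata key are assumptions for illustration; nothing here is an existing inspect or pydoc feature.

```python
import inspect
from dataclasses import dataclass, field, is_dataclass


def getdoc_with_fields(ob):
    """Hypothetical wrapper: class docstring plus any per-field docs."""
    cls = ob if isinstance(ob, type) else type(ob)
    doc = inspect.getdoc(cls) or ""
    if not is_dataclass(cls):
        return doc
    # Assumes field docs live under a "doc" key in each field's metadata.
    lines = [
        f"{name}: {f.metadata['doc']}"
        for name, f in cls.__dataclass_fields__.items()
        if "doc" in f.metadata
    ]
    return "\n\n".join([doc, *lines]) if lines else doc


@dataclass
class A:
    total: int = field(metadata={"doc": "total widgets"})
    num: int = field(default=5, metadata={"doc": "number of widgets"})


print(getdoc_with_fields(A))
```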
msg397889 - (view) Author: Andrei Kulakov (andrei.avk) * Date: 2021-07-20 17:27
I've put up a simple PoC PR adding a __field_doc__ optional dict attr to dataclass, which would add the docs to class docstring:

@dataclass
class A:
    __field_doc__ = dict(num='number of widgets', total='total widgets')
    total: int
    num: int = 5
print(A.__doc__)

OUTPUT
---

A(total: int, num: int = 5)

num: int [5] -- number of widgets

total: int  -- total widgets
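
For readers who want to experiment without the patch, the same effect can be approximated with a class decorator. This is only a sketch under assumptions: `with_field_docs` is a hypothetical name, the line format merely imitates the output above, and it is not the code in the PR.

```python
from dataclasses import MISSING, dataclass, fields


def with_field_docs(cls):
    """Hypothetical decorator: append __field_doc__ entries to the class docstring."""
    docs = getattr(cls, "__field_doc__", {})
    lines = []
    for f in fields(cls):
        if f.name not in docs:
            continue
        type_name = getattr(f.type, "__name__", str(f.type))
        default = "" if f.default is MISSING else f" [{f.default!r}]"
        lines.append(f"{f.name}: {type_name}{default} -- {docs[f.name]}")
    if lines:
        cls.__doc__ = "\n\n".join([cls.__doc__ or "", *lines])
    return cls


@with_field_docs
@dataclass
class A:
    __field_doc__ = dict(num="number of widgets", total="total widgets")
    total: int
    num: int = 5


print(A.__doc__)
```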
msg397894 - (view) Author: John-Mark Gurney (jmg) Date: 2021-07-20 18:20
Though this suggestion does work, I am not a fan of this solution.

The issue is that it separates the doc from the definition. This works well if you have only a few fields in the class, but if you get 10-20+ fields, it moves the docs away from the definitions and makes it easier for people to forget to document a field when they add new ones, whereas w/ my suggestion it's less likely to be forgotten.

Note: I would look at making a patch, but as I don't plan on signing the contributor's agreement, any patch I make will not be usable.
msg397895 - (view) Author: Andrei Kulakov (andrei.avk) * Date: 2021-07-20 18:25
John-Mark: each field's doc can be added to the dictionary separately, though.
msg397896 - (view) Author: John-Mark Gurney (jmg) Date: 2021-07-20 18:44
So, just looked at the patch, but it's missing the documentation part of it.

Also, yes, you can add the doc as another line, but now that's two lines (yes, you can add semicolons to make it one line, but that might surprise some people).

I do request that any examples USE this approach (multiline) and not the example that was provided here, which will lead to the issues I described.

Note: These are only suggestions, and as with all free software, you're free to take them or leave them.
msg397908 - (view) Author: Andrei Kulakov (andrei.avk) * Date: 2021-07-20 20:51
John-Mark: yep, it's just a draft patch for now.

It doesn't need to be two lines; it can be updated to start with an empty dict and perhaps use a shorter special name for the doc dict and then you can do:

f1:int = 1
FDOC['f1'] = 'foo'

f2:str = 'a'
FDOC['f2'] = 'bar'

... the nice thing is that you have the flexibility to either group all docs into one place or set them after each field, I can imagine many users will prefer the former.
msg397913 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-07-20 21:44
I think we'd be better off adding a doc parameter to dataclasses.field, and then as @terry.reedy says, modify inspect.getdoc and maybe _finddoc to look in __dataclass_fields__, if it exists.
msg397950 - (view) Author: Andrei Kulakov (andrei.avk) * Date: 2021-07-21 15:50
Eric: makes sense, I'll go with that.

John-Mark: I will go ahead and work on a PR along the lines Terry suggested, let us know if you have any objections or if you would prefer to work on a PR by yourself instead.

P.S. I understand now what you meant by 1 line vs. 2 lines in my first PR.
msg397955 - (view) Author: Andrei Kulakov (andrei.avk) * Date: 2021-07-21 18:04
I've added a new PR: https://github.com/python/cpython/pull/27279

(note it's a rough PoC draft at this point)

The output of inspect.getdoc() is the same:

A(total: int, num: int = 5)

total: int  -- number of widgets

num: int [5] -- total widgets
msg398054 - (view) Author: Andrei Kulakov (andrei.avk) * Date: 2021-07-23 13:31
I haven't modified `_finddoc` in my PR because it currently doesn't show all existing dataclass fields (only those with a default set), so it would make sense to consider this addition if/when the complete set of dataclass fields is added to _finddoc. Also, as Terry mentioned, help() may end up using inspect.getdoc() in the future.
msg398090 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2021-07-23 19:42
The more I think about this, the less I like it as a dataclasses-specific solution (if it’s accepted at all, that is). Why dataclasses and not attrs or NamedTuple? I suggest bringing it up on python-ideas for wider exposure.
msg398092 - (view) Author: Andrei Kulakov (andrei.avk) * Date: 2021-07-23 20:07
I was thinking about adding this to named tuple as well, but it's less useful than with dataclasses because named tuples are simpler and smaller. However if this feature is added to dataclasses and is widely used, it would make a lot of sense to expand it to namedtuples especially if people request that.

I don't think it makes sense for attributes because many attributes are immutable so you can't add a doc attr to them. Additionally, most attrs can be documented via methods that set or update them.

I agree it's a good idea to bring this to Ideas, I'll think about this for a few days and then do that. Thanks!
History
Date User Action Args
2021-07-23 20:07:05 andrei.avk set messages: + msg398092
2021-07-23 19:42:15 eric.smith set messages: + msg398090
2021-07-23 13:31:52 andrei.avk set messages: + msg398054
2021-07-22 12:10:08 dlax set nosy: + dlax
2021-07-21 18:04:34 andrei.avk set messages: + msg397955
2021-07-21 17:59:39 andrei.avk set pull_requests: + pull_request25824
2021-07-21 15:50:34 andrei.avk set messages: + msg397950
2021-07-20 21:44:25 eric.smith set messages: + msg397913
versions: + Python 3.11, - Python 3.10
2021-07-20 20:51:09 andrei.avk set messages: + msg397908
2021-07-20 18:44:46 jmg set messages: + msg397896
2021-07-20 18:25:18 andrei.avk set messages: + msg397895
2021-07-20 18:20:03 jmg set messages: + msg397894
2021-07-20 17:27:47 andrei.avk set messages: + msg397889
2021-07-20 17:23:27 andrei.avk set keywords: + patch
nosy: + andrei.avk
pull_requests: + pull_request25809
stage: test needed -> patch review
2020-11-21 03:50:02 terry.reedy set nosy: + terry.reedy
messages: + msg381521
stage: test needed
2020-11-21 01:24:37 rhettinger set nosy: + rhettinger
messages: + msg381515
2020-11-20 20:09:34 eric.smith set messages: + msg381503
2020-11-20 18:18:30 jmg set messages: + msg381492
2020-11-20 15:57:14 eric.smith set type: enhancement
messages: + msg381486
versions: + Python 3.10
2020-11-20 06:59:09 eric.smith set assignee: eric.smith
2020-11-20 02:29:00 xtreak set nosy: + eric.smith
2020-11-19 20:36:18 jmg create