Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pprint for dataclass instances #87246

Closed
LewisGaul mannequin opened this issue Jan 31, 2021 · 15 comments
Closed

pprint for dataclass instances #87246

LewisGaul mannequin opened this issue Jan 31, 2021 · 15 comments
Assignees
Labels
3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@LewisGaul
Copy link
Mannequin

LewisGaul mannequin commented Jan 31, 2021

BPO 43080
Nosy @rhettinger, @gpshead, @ericvsmith, @ericsnowcurrently, @serhiy-storchaka, @LewisGaul
PRs
  • bpo-43080: pprint for dataclass instances #24389
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ericvsmith'
    closed_at = <Date 2021-04-14.00:03:44.780>
    created_at = <Date 2021-01-31.00:14:38.092>
    labels = ['type-feature', 'library', '3.10']
    title = 'pprint for dataclass instances'
    updated_at = <Date 2021-04-14.00:03:44.779>
    user = 'https://github.com/LewisGaul'

    bugs.python.org fields:

    activity = <Date 2021-04-14.00:03:44.779>
    actor = 'eric.smith'
    assignee = 'eric.smith'
    closed = True
    closed_date = <Date 2021-04-14.00:03:44.780>
    closer = 'eric.smith'
    components = ['Library (Lib)']
    creation = <Date 2021-01-31.00:14:38.092>
    creator = 'LewisGaul'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 43080
    keywords = ['patch']
    message_count = 15.0
    messages = ['386002', '386012', '386014', '386017', '386018', '386020', '386023', '386033', '386034', '388794', '388795', '388828', '389198', '391020', '391021']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'gregory.p.smith', 'eric.smith', 'eric.snow', 'serhiy.storchaka', 'LewisGaul']
    pr_nums = ['24389']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue43080'
    versions = ['Python 3.10']

    @LewisGaul
    Copy link
    Mannequin Author

    LewisGaul mannequin commented Jan 31, 2021

    Currently the pprint module does not have support for dataclasses. I have implemented support for this and will link the PR once I have the issue number!

    See also bpo-37376 for SimpleNamespace support.

    @LewisGaul LewisGaul mannequin added 3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Jan 31, 2021
    @rhettinger
    Copy link
    Contributor

    Since a dataclass can do anything a regular class can do, it is probably too general to have a special case for pprint.

    @LewisGaul
    Copy link
    Mannequin Author

    LewisGaul mannequin commented Jan 31, 2021

    a dataclass can do anything a regular class can do

    Agreed, but isn't that also true of any subclasses of currently supported types? In particular 'UserDict', 'UserList' and 'UserString', which all have explicit support in pprint and are intended for "easier subclassing" according to the docs.

    I'm also not sure why it would be a reason for not giving it pprint handling (in the case where there's no user-defined __repr__). Is there any harm in doing so?

    I'd consider dataclasses one of the primary choices for storing data in modern Python (e.g. for converting to/from JSON/YAML), and may well be used for storing nested data, which can be very hard to read without some mechanism for pretty-printing.

    Indeed, the dataclasses.asdict() function recurses into dataclass fields. This gives the option of pprint(dataclasses.asdict(my_dataclass)), but at the cost of losing the class names and any custom reprs.

    @serhiy-storchaka
    Copy link
    Member

    For all other classes we check that there is no user defined __repr__. But it is difficult to check this for dataclass or namedtuple because there is no base class for dataclasses or namedtuples which provides standard __repr__ implementation. All __repr__ for dataclasses and namedtuples are generated.

    See bpo-7434.

    @ericvsmith
    Copy link
    Member

    Adding an attribute on the __repr__ (and other methods?) signifying that they were generated would let us distinguish them.

    Say, checking getattr(repr, '__is_generated__', False), or similar.

    @LewisGaul
    Copy link
    Mannequin Author

    LewisGaul mannequin commented Jan 31, 2021

    @serhiy - Yes, I noted that problem in the PR. Thanks for pointing me to that issue, I agree it would be good to make pprint properly extensible (my current solution is to maintain a fork of the pprint module with dataclass support added).

    Eric's suggestion would work, I wasn't sure if it would be considered an 'ok' thing to do, but if so then could be an easy enough way to support dataclasses (and namedtuples potentially)?

    @serhiy-storchaka
    Copy link
    Member

    Good idea Eric, it should work.

    But it can make the code of pprint potentially less flexible. Currently it uses a mapping which maps __repr__ to corresponding pprint implementation. Only exception is for dicts, for historical reasons. It potentially can allow to make pprint more general and support arbitrary types by registering some handlers. Since there is no standard implementation of __repr__ for namedtuples and dataclasses we cannot use them as keys, and need to hardcode checks for namedtuple and dataclass (and any other generated classes).

    It is a minor objection. Perhaps practicality should beat purity in this case.

    @rhettinger
    Copy link
    Contributor

    At some point, we need a modern redesign alternative to pprint. It could have its own __pprint__ method to communicate how it wants to be pretty printed.

    Until then, I think the existing pprint module should only grow custom support for classes that have a mostly consistent structure and usage pattern. SimpleNamespace, for example, made sense for a custom pprint handler because it is so dict like and is almost never customized.

    IMO, dataclasses are a bridge too far. Having pprint() guess what a dataclass intends is not far from try to guess what an arbitrary class intends. This is skating on thin ice.

    @ericvsmith
    Copy link
    Member

    I agree that we need a better pprint. I think it would be easier to create something new rather than try and morph the existing pprint, but maybe I lack enough imagination. I'd prefer to use functools.singledispatch instead of a __pprint__ method, but it doesn't really make a lot of difference. PEP-443 (singledispatch) does use pprint as a motivating example.

    I tend to agree with Raymond that we don't want to guess what a dataclass class intends as its usage. After all, we deliberately made it easy to add a custom __repr__ (one not generated by dataclasses.dataclass).

    @ericvsmith
    Copy link
    Member

    I'm leaning toward accepting this on the condition that it only be invoked for dataclasses where __repr__ was the version generated by @DataClass. And also that it use the same fields that the generated __repr__ would use (basically skipping repr=False fields). Under those conditions, I don't see the harm.

    The reason I'm leaning toward acceptance is that we've talked about a better pprint for ages, and yet there's no activity that I can tell toward developing a replacement in the stdlib. pprint was a motivating example for PEP-443 (singledispatch), and that was accepted 8 years ago. I don't think we should have to wait forever to get better pprint for dataclasses.

    But I'm still not 100% decided, and I can be reasoned with!

    @rhettinger
    Copy link
    Contributor

    FWIW, we've not had a feature request for this ever, nor has there been a request for pprint to support attrs, nor any other hand-rolled class that implements methods similar to those generated by dataclasses. AFAICT, this tracker issue wasn't motivated by a known use case; rather, it was "my PR was accepted for SimpleNamespace and thought dataclasses could be the next."

    In the absence of known use cases and user requests, I think we should let it be. Put me down for a -0.

    @LewisGaul
    Copy link
    Mannequin Author

    LewisGaul mannequin commented Mar 16, 2021

    FWIW, we've not had a feature request for this ever, nor has there been a request for pprint to support attrs, nor any other hand-rolled class that implements methods similar to those generated by dataclasses.

    I wouldn't expect core Python to support a 3rd party lib like attrs, but I fail to see what's so different between dataclasses, SimpleNamespace and namedtuple - all of these may be used for storing/modelling [nested] data, which then may be printed.

    AFAICT, this tracker issue wasn't motivated by a known use case; rather, it was "my PR was accepted for SimpleNamespace and thought dataclasses could be the next."

    This issue is entirely motivated by a real-world example - I'm currently maintaining a private fork of the pprint module with support for dataclasses added. I'm assuming the reason this hasn't come up before is that dataclasses are relatively new (and plenty of users will still be on older versions of Python).

    I was not the author of the issue that added support for SimpleNamespace, I just saw it and used it as an example of precedent.

    At some point, we need a modern redesign alternative to pprint.

    I'm on board with this, but as Eric said there aren't currently any signs of this being worked on. In absence of a redesign, dataclass support seems like a natural extension to me.

    @gpshead
    Copy link
    Member

    gpshead commented Mar 20, 2021

    +0.5 I lean towards just accepting this under the conditions Eric describes given that dataclass is a stdlib concept and nobody is likely to claim that such output from pprint is a bad thing.

    The larger "some form of protocol for pprint to work on all sorts of other things" issue (regardless of how) remains a long term wish list item that'll probably wind up in PEP land if someone wants to take it on.

    @ericvsmith
    Copy link
    Member

    New changeset 11159d2 by Lewis Gaul in branch 'master':
    bpo-43080: pprint for dataclass instances (GH-24389)
    11159d2

    @ericvsmith
    Copy link
    Member

    Thanks for all of the work, LewisGaul.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants