pprint output for sets and dicts is not stable #66910

Closed
serhiy-storchaka opened this issue Oct 24, 2014 · 16 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Member

BPO 22721
Nosy @freddrake, @rhettinger, @amauryfa, @pitrou, @serhiy-storchaka
Files
  • pprint_safe_key.patch
  • pprint_safe_key_alt.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/freddrake'
    closed_at = <Date 2015-04-06.19:54:20.227>
    created_at = <Date 2014-10-24.17:00:49.867>
    labels = ['type-bug', 'library']
    title = 'pprint output for sets and dicts is not stable'
    updated_at = <Date 2015-04-06.19:54:20.226>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2015-04-06.19:54:20.226>
    actor = 'serhiy.storchaka'
    assignee = 'fdrake'
    closed = True
    closed_date = <Date 2015-04-06.19:54:20.227>
    closer = 'serhiy.storchaka'
    components = ['Library (Lib)']
    creation = <Date 2014-10-24.17:00:49.867>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['37011', '37013']
    hgrepos = []
    issue_num = 22721
    keywords = ['patch']
    message_count = 16.0
    messages = ['229943', '229971', '229980', '229993', '229999', '230161', '230162', '230671', '230691', '230696', '232072', '234877', '239313', '240172', '240174', '240175']
    nosy_count = 6.0
    nosy_names = ['fdrake', 'rhettinger', 'amaury.forgeotdarc', 'pitrou', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue22721'
    versions = ['Python 3.5']

    @serhiy-storchaka
    Member Author

    pprint() sorts the contents of sets and dicts to get stable output that doesn't depend on the iteration order of the set or dict, which depends not only on the values of the elements but also on the history of the set or dict.

    But in some cases the output differs for equal sets or dicts that differ only in their history.

    >>> import pprint
    >>> class A:  # string 'A' < 'int'
    ...     def __lt__(self, other): return False
    ...     def __gt__(self, other): return self != other
    ...     def __le__(self, other): return self == other
    ...     def __ge__(self, other): return True
    ...     def __eq__(self, other): return self is other
    ...     def __ne__(self, other): return self is not other
    ...     def __hash__(self): return 1  # == hash(1)
    ... 
    >>> a = A()
    >>> sorted([1, a])
    [1, <__main__.A object at 0xb700c64c>]
    >>> sorted([a, 1])
    [1, <__main__.A object at 0xb700c64c>]
    >>> # set
    >>> pprint.pprint({1, a})
    {<__main__.A object at 0xb700c64c>, 1}
    >>> pprint.pprint({a, 1})
    {1, <__main__.A object at 0xb700c64c>}
    >>> # dict
    >>> pprint.pprint({1: 1, a: 1})
    {1: 1, <__main__.A object at 0xb700c64c>: 1}
    >>> pprint.pprint({a: 1, 1: 1})
    {<__main__.A object at 0xb700c64c>: 1, 1: 1}

    This happens because _safe_key's __lt__() calls the __lt__() method of its left argument and doesn't use the special methods of its right argument. a.__lt__(1) succeeds, but (1).__lt__(a) returns NotImplemented.

    I think that instead of self.obj.__lt__(other.obj) there should be self.obj < other.obj. Or maybe call other.obj.__gt__(self.obj) if the result of self.obj.__lt__(other.obj) is NotImplemented.

    _safe_key was introduced in bpo-3976.
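For illustration, the first suggestion above (use the `<` operator) can be sketched as follows. This is a simplified stand-in, not the attached patch: the class name `SafeKey` is hypothetical, and the fallback to `(str(type), id)` for unorderable pairs is an assumption about how such a wrapper would behave.

```python
class A:
    """Same comparison behaviour as the class A in the report above."""
    def __lt__(self, other): return False
    def __gt__(self, other): return self != other
    def __le__(self, other): return self == other
    def __ge__(self, other): return True
    def __eq__(self, other): return self is other
    def __ne__(self, other): return self is not other
    def __hash__(self): return 1


class SafeKey:
    """Hypothetical sort-key wrapper that compares with the `<` operator."""
    __slots__ = ("obj",)

    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        try:
            # The operator tries self.obj.__lt__ first and then the reflected
            # other.obj.__gt__, so both operands take part in the decision.
            return self.obj < other.obj
        except TypeError:
            # Unorderable pair (assumed fallback): order by type name, then id.
            return ((str(type(self.obj)), id(self.obj))
                    < (str(type(other.obj)), id(other.obj)))


a = A()
first = sorted([1, a], key=SafeKey)
second = sorted([a, 1], key=SafeKey)
```

With this wrapper both input orders sort the same way, because `1 < a` falls through to `a.__gt__(1)` instead of stopping at the `NotImplemented` returned by `(1).__lt__(a)`.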

    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Oct 24, 2014
    @pitrou
    Member

    pitrou commented Oct 25, 2014

    Hmm... is it important?

    @freddrake
    Member

    Stability in output order from pprint is very useful in doctests (yes, some people write documentation that they test).

    I think fixing any output stability issues would be very worthwhile.

    @serhiy-storchaka
    Member Author

    Hmm... is it important?

    No more than sorting pprint output at all. This looks like a low-priority issue to me, but the fix looks pretty easy. Here is a patch. I hope Raymond will review it; maybe I missed some details.

    @serhiy-storchaka
    Member Author

    And here is an alternative patch in case the first patch is not correct. It is more complicated and, I suppose, less efficient in the common case.

    @amauryfa
    Member

    What if [some flavor of] pprint sorted items not by value, but by their repr() string?
    It's probably faster than any other algorithm, and guaranteed to produce consistent results.

    Or use this idea only for ambiguous cases?

    @freddrake
    Member

    Sorting by the repr sounds good, but if some dict keys or set members are strings containing single-quotes, the primary sort will be on the type of quote used for the repr, which would be surprising and significantly less useful.
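The effect is easy to reproduce. A minimal sketch using only the built-in `repr()` and `sorted()`; the sample strings are made up for illustration:

```python
words = ["apple", "it's", "zebra"]

# repr() switches to double quotes when a string contains a single quote
# (and no double quote), so the first character of the repr differs.
reprs = [repr(w) for w in words]

by_value = sorted(words)            # ordinary string ordering
by_repr = sorted(words, key=repr)   # ordering by what repr() returns
```

Because `"` sorts before `'`, the string containing an apostrophe moves to the front when sorting by repr, regardless of its actual content.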

    @rhettinger rhettinger assigned rhettinger and freddrake and unassigned rhettinger Nov 1, 2014
    @rhettinger
    Contributor

    the primary sort will be on the type of quote used for the repr,
    which would be surprising and significantly less useful.

    How about: repr(obj).strip("'\"") ?

    Overall, the idea of using repr() in some fashion is appealing because it sorts on what the user actually sees.

    @serhiy-storchaka
    Member Author

    How about: repr(obj).strip("'\"") ?

    A string can start or end with quotes. And a string repr can be part of the
    repr of another type (e.g. a short list).
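Both objections can be checked directly. A small sketch; `strip_key` is a hypothetical helper name wrapping the expression quoted above, and the sample values are made up:

```python
def strip_key(obj):
    # Hypothetical helper: the key expression proposed earlier in the thread.
    return repr(obj).strip("'\"")


# A string that itself ends with a quote loses that quote too, because
# strip() removes every leading and trailing quote character.
ends_with_quote = strip_key('say "hi"')

# Inside a container the element quotes are not at the ends of the repr,
# so nothing is stripped and the quote style still drives the order.
plain_list_key = strip_key(["aa"])
apostrophe_list_key = strip_key(["b'c"])
repr_order_differs = (apostrophe_list_key < plain_list_key) and (["aa"] < ["b'c"])
```

Here the two lists compare one way by value and the opposite way by stripped repr, which is the inconsistency the comment describes.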

    @pitrou
    Member

    pitrou commented Nov 5, 2014

    I think it'd be nice if the solution kept the current order when all keys are orderable (which is a very common case). So IMO repr() should only be used as a fallback when the object comparison fails.
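A minimal sketch of that fallback idea, not taken from either attached patch: `ReprFallbackKey` is a hypothetical name, and treating `TypeError` as the signal that the object comparison failed is an assumption.

```python
class ReprFallbackKey:
    """Hypothetical sort key: natural order first, repr() only as a fallback."""
    __slots__ = ("obj",)

    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        try:
            # Orderable operands keep their usual ordering.
            return self.obj < other.obj
        except TypeError:
            # Only unorderable pairs are compared by their repr strings.
            return repr(self.obj) < repr(other.obj)


orderable = sorted([3, 1, 2], key=ReprFallbackKey)
mixed = sorted([2, "b", 1, "a"], key=ReprFallbackKey)
```

In this sketch a list of mutually orderable keys sorts exactly as before, and the repr comparison is consulted only for pairs such as an int and a str.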

    @serhiy-storchaka
    Member Author

    My question to Raymond is: should we use the "<" operator or the special methods __lt__ and __gt__ (this is the difference between the two alternative patches)?

    Using repr instead of id is a different issue.

    @serhiy-storchaka
    Member Author

    Ping.

    @serhiy-storchaka
    Member Author

    Ping.

    @freddrake
    Member

    Sorry for the delay. pprint_safe_key.patch looks good to me.

    @python-dev
    Mannequin

    python-dev mannequin commented Apr 6, 2015

    New changeset c8815035116b by Serhiy Storchaka in branch 'default':
    Issue bpo-22721: An order of multiline pprint output of set or dict containing
    https://hg.python.org/cpython/rev/c8815035116b

    @serhiy-storchaka
    Member Author

    Thank you for your review, Fred.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022