classification
Title: pprint output for sets and dicts is not stable
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: fdrake Nosy List: amaury.forgeotdarc, fdrake, pitrou, python-dev, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2014-10-24 17:00 by serhiy.storchaka, last changed 2015-04-06 19:54 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description
pprint_safe_key.patch serhiy.storchaka, 2014-10-25 10:45
pprint_safe_key_alt.patch serhiy.storchaka, 2014-10-25 13:41
Messages (16)
msg229943 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-24 17:00
pprint() sorts the contents of sets and dicts in order to produce stable output that doesn't depend on their iteration order, which depends not only on the values of the elements but also on the history of the set or dict.

But in some cases the output differs for equal sets or dicts that differ only in their history.

>>> import pprint
>>> class A:  # string 'A' < 'int'
...     def __lt__(self, other): return False
...     def __gt__(self, other): return self != other
...     def __le__(self, other): return self == other
...     def __ge__(self, other): return True
...     def __eq__(self, other): return self is other
...     def __ne__(self, other): return self is not other
...     def __hash__(self): return 1  # == hash(1)
... 
>>> a = A()
>>> sorted([1, a])
[1, <__main__.A object at 0xb700c64c>]
>>> sorted([a, 1])
[1, <__main__.A object at 0xb700c64c>]
>>> # set
>>> pprint.pprint({1, a})
{<__main__.A object at 0xb700c64c>, 1}
>>> pprint.pprint({a, 1})
{1, <__main__.A object at 0xb700c64c>}
>>> # dict
>>> pprint.pprint({1: 1, a: 1})
{1: 1, <__main__.A object at 0xb700c64c>: 1}
>>> pprint.pprint({a: 1, 1: 1})
{<__main__.A object at 0xb700c64c>: 1, 1: 1}

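This happens because _safe_key's __lt__() calls only the __lt__() method of its left argument and never tries the special methods of its right argument. a.__lt__(1) succeeds (returning False), but (1).__lt__(a) fails (it returns NotImplemented), and the fallback ordering by type name then also reports that 1 is not less than a. Since neither element compares as smaller, the sorted result keeps the iteration order.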
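The asymmetry can be reproduced outside pprint. The sketch below re-creates the comparison logic described in this message; `OldSafeKey` is a hypothetical stand-in rather than pprint's actual class, and its fallback (type name, then id()) is assumed from the pre-fix pprint. Because that fallback compares type names as strings that include the module name, the sketch pins `A.__module__` to `"__main__"` so it behaves like the session above wherever it is run.

```python
class A:
    # Trimmed copy of the A class above: only the methods sorting uses.
    def __lt__(self, other):
        return False

    def __gt__(self, other):
        return self is not other


# The fallback compares "<class '__main__.A'>" with "<class 'int'>";
# pin the module name so the result matches the interactive session.
A.__module__ = "__main__"


class OldSafeKey:
    """Hypothetical re-creation of the pre-fix pprint sort key."""

    __slots__ = ["obj"]

    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        # Only the LEFT object's __lt__ is consulted; the right object's
        # reflected __gt__ is never tried.
        try:
            rv = self.obj.__lt__(other.obj)
        except TypeError:
            rv = NotImplemented
        if rv is NotImplemented:
            # Fallback: order by type name, then by id().
            rv = (str(type(self.obj)), id(self.obj)) < (
                str(type(other.obj)), id(other.obj))
        return rv


a = A()
print(OldSafeKey(a) < OldSafeKey(1))  # False: a.__lt__(1) says "no"
print(OldSafeKey(1) < OldSafeKey(a))  # False: int defers, fallback says "no"
# Neither key is "less", so the stable sort keeps whatever order it was given.
print(sorted([1, a], key=OldSafeKey))
print(sorted([a, 1], key=OldSafeKey))
```

The last two lines print the elements in the same order they were passed in, which is exactly the set/dict instability shown above.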

I think `self.obj < other.obj` should be used here instead of `self.obj.__lt__(other.obj)`. Alternatively, _safe_key could call other.obj.__gt__(self.obj) when self.obj.__lt__(other.obj) returns NotImplemented.
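A minimal sketch of the first suggestion, assuming the fallback for genuinely incomparable objects stays as before (type name, then id()). `SafeKey` is a hypothetical name. Using the `<` operator lets Python try the right operand's reflected `__gt__()` when the left operand returns NotImplemented, so both input orders sort the same way.

```python
class A:
    # Trimmed copy of the A class above.
    def __lt__(self, other):
        return False

    def __gt__(self, other):
        return self is not other


class SafeKey:
    """Hypothetical sort key that uses the < operator instead of __lt__()."""

    __slots__ = ["obj"]

    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        try:
            # The operator calls self.obj.__lt__ and, if that returns
            # NotImplemented, other.obj.__gt__(self.obj).
            return self.obj < other.obj
        except TypeError:
            # Truly incomparable: order by type name, then by id().
            return (str(type(self.obj)), id(self.obj)) < (
                str(type(other.obj)), id(other.obj))


a = A()
print(sorted([1, a], key=SafeKey)[0])  # 1
print(sorted([a, 1], key=SafeKey)[0])  # 1
print(sorted(["x", 1], key=SafeKey))   # [1, 'x']: 'int' sorts before 'str'
```

The last line exercises the fallback: `1 < "x"` raises TypeError, so the type names decide.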

_safe_key was introduced in issue3976.
msg229971 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-10-25 00:47
Hmm... is it important?
msg229980 - Author: Fred L. Drake, Jr. (fdrake) (Python committer) Date: 2014-10-25 04:42
Stability in output order from pprint is very useful in doctests (yes, some people write documentation that they test).

I think fixing any output stability issues would be very worthwhile.
msg229993 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-25 10:45
> Hmm... is it important?

No more important than sorting pprint output in the first place. This looks like a low-priority issue to me, but the fix looks pretty easy. Here is a patch. I hope Raymond will review it; maybe I missed some details.
msg229999 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-10-25 13:41
And here is an alternative patch, in case the first one is not correct. It is more complicated and, I suppose, less efficient in the common case.
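The alternative patch itself is not reproduced on this page. As a hypothetical sketch of the second option from the first message (call `other.obj.__gt__(self.obj)` when `self.obj.__lt__(other.obj)` returns NotImplemented), it might look like the class below. `AltSafeKey` is an invented name, and unlike the real `<` operator this sketch ignores the rule that a right operand whose type is a subclass of the left operand's type gets its reflected method tried first.

```python
class A:
    # Trimmed copy of the A class above.
    def __lt__(self, other):
        return False

    def __gt__(self, other):
        return self is not other


class AltSafeKey:
    """Hypothetical key that calls the special methods explicitly."""

    __slots__ = ["obj"]

    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        x, y = self.obj, other.obj
        try:
            rv = x.__lt__(y)
            if rv is NotImplemented:
                # Try the right operand's reflected method by hand.
                rv = y.__gt__(x)
        except TypeError:
            rv = NotImplemented
        if rv is NotImplemented:
            # Neither side knows how to compare: type name, then id().
            rv = (str(type(x)), id(x)) < (str(type(y)), id(y))
        return rv


a = A()
print(sorted([1, a], key=AltSafeKey)[0])  # 1
print(sorted([a, 1], key=AltSafeKey)[0])  # 1
```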
msg230161 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2014-10-28 17:49
What if [some flavor of] pprint sorted items not by value, but by their repr() string?
It's probably faster than any other algorithm, and guaranteed to produce consistent results.

Or use this idea only for ambiguous cases?
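A rough illustration of the repr()-based ordering suggested here, assuming the key is simply the builtin `repr`; `sorted_by_repr` is a hypothetical helper. Because every key is a string, mixed types never raise and the result no longer depends on iteration order.

```python
def sorted_by_repr(items):
    """Order items by the text that would be displayed for them."""
    return sorted(items, key=repr)


# Mixed, mutually incomparable types sort without errors.
print(sorted_by_repr({None, 1, (2,), "x"}))  # ['x', (2,), 1, None]
# Numbers are compared as text, so 10 comes before 2.
print(sorted_by_repr({2, 10}))               # [10, 2]
```

The second line shows the trade-off: ordering by text is stable, but it is not numeric.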
msg230162 - Author: Fred L. Drake, Jr. (fdrake) (Python committer) Date: 2014-10-28 17:57
Sorting by the repr sounds good, but if some dict keys or set members are strings containing single-quotes, the primary sort will be on the type of quote used for the repr, which would be surprising and significantly less useful.
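The quoting problem is easy to see with the builtin `repr`: a string that contains a single quote (and no double quote) is shown with double quotes, and `"` sorts before `'`. The word list is an arbitrary example.

```python
words = ["apple", "zoo's"]

print(repr("apple"))  # 'apple'  (single-quoted)
print(repr("zoo's"))  # "zoo's"  (double-quoted)

# '"' (U+0022) sorts before "'" (U+0027), so "zoo's" lands first.
print(sorted(words, key=repr))  # ["zoo's", 'apple']
print(sorted(words))            # ['apple', "zoo's"]
```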
msg230671 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-11-05 08:55
> the primary sort will be on the type of quote used for the repr,
> which would be surprising and significantly less useful.

How about:  repr(obj).strip("'\"") ?

Overall, the idea of using repr() in some fashion is appealing because it sorts on what the user actually sees.
msg230691 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-05 16:03
> How about:  repr(obj).strip("'\"") ?

A string can start or end with quotes. And a string's repr can be part of the repr of another type (e.g. a short list).
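Both objections can be checked directly. `stripped_repr` is a hypothetical helper implementing the `repr(obj).strip("'\"")` key proposed above; it fixes the simple case, but distinct strings can collide and quotes inside a container's repr are left untouched.

```python
def stripped_repr(obj):
    """Key from the suggestion above: repr() minus surrounding quotes."""
    return repr(obj).strip("'\"")


# The simple case now sorts alphabetically.
print(sorted(["zoo's", "apple"], key=stripped_repr))  # ['apple', "zoo's"]

# A string that starts with a quote collides with one that doesn't.
print(stripped_repr("'x"), stripped_repr("x"))  # x x

# Quotes inside a container's repr are not stripped, so the
# quote style still decides the order of these one-element lists.
print(sorted([["apple"], ["zoo's"]], key=stripped_repr))  # [["zoo's"], ['apple']]
```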
msg230696 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-11-05 17:05
I think it'd be nice if the solution kept the current order when all keys are orderable (which is a very common case). So IMO repr() should only be used as a fallback when the object comparison fails.
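A minimal sketch of this suggestion: compare the objects themselves, and compare their repr() strings only when `<` raises TypeError. `ReprFallbackKey` is a hypothetical name, and using the plain repr() as the fallback is an assumption made for illustration.

```python
class ReprFallbackKey:
    """Hypothetical key: natural order first, repr() text as a fallback."""

    __slots__ = ["obj"]

    def __init__(self, obj):
        self.obj = obj

    def __lt__(self, other):
        try:
            return self.obj < other.obj
        except TypeError:
            # Only reached for incomparable pairs such as int vs None.
            return repr(self.obj) < repr(other.obj)


# Comparable items keep their natural (numeric) order: 2 before 10.
print(sorted([None, 10, 2], key=ReprFallbackKey))  # [2, 10, None]
print(sorted([2, None, 10], key=ReprFallbackKey))  # [2, 10, None]
```

One caveat: mixing the two orderings can be intransitive. With an object that can't be compared to ints and whose repr() is `15`, 2 < 10 numerically while 10 < it and it < 2 by text, so the result may again depend on input order.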
msg232072 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-03 07:36
My question to Raymond is: should we use the "<" operator or the special methods __lt__ and __gt__? (This is the difference between the two alternative patches.)

The use of repr instead of id is a different issue.
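The practical difference behind this question: calling `__lt__()` directly asks only the left operand, while the `<` operator also tries the right operand's reflected `__gt__()` before giving up with TypeError. `Big` is a throwaway example class.

```python
class Big:
    """Example class that claims to be greater than everything."""

    def __gt__(self, other):
        return True


b = Big()
print((1).__lt__(b))  # NotImplemented: int does not know about Big
print(1 < b)          # True: the operator falls back to b.__gt__(1)

try:
    b < 1  # Big has no __lt__ of its own and int.__gt__ declines too
except TypeError as exc:
    print("TypeError:", exc)
```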
msg234877 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-28 09:20
Ping.
msg239313 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-03-26 07:32
Ping.
msg240172 - Author: Fred L. Drake, Jr. (fdrake) (Python committer) Date: 2015-04-06 19:28
Sorry for the delay.  pprint_safe_key.patch looks good to me.
msg240174 - Author: Roundup Robot (python-dev) Date: 2015-04-06 19:53
New changeset c8815035116b by Serhiy Storchaka in branch 'default':
Issue #22721: An order of multiline pprint output of set or dict containing
https://hg.python.org/cpython/rev/c8815035116b
msg240175 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-04-06 19:54
Thank you for your review Fred.
History
Date User Action Args
2015-04-06 19:54:20 serhiy.storchaka set status: open -> closed
    resolution: fixed
    messages: + msg240175
    stage: patch review -> resolved
2015-04-06 19:53:08 python-dev set nosy: + python-dev
    messages: + msg240174
2015-04-06 19:28:00 fdrake set messages: + msg240172
2015-03-26 07:32:59 serhiy.storchaka set messages: + msg239313
2015-01-28 09:20:29 serhiy.storchaka set messages: + msg234877
2014-12-03 07:36:23 serhiy.storchaka set messages: + msg232072
2014-11-05 17:05:44 pitrou set messages: + msg230696
2014-11-05 16:03:01 serhiy.storchaka set messages: + msg230691
2014-11-05 08:55:46 rhettinger set messages: + msg230671
2014-11-01 08:17:44 rhettinger set assignee: rhettinger -> fdrake
2014-11-01 07:11:18 rhettinger set assignee: rhettinger
    versions: - Python 3.4
2014-10-28 17:57:40 fdrake set messages: + msg230162
2014-10-28 17:49:03 amaury.forgeotdarc set nosy: + amaury.forgeotdarc
    messages: + msg230161
2014-10-25 13:41:01 serhiy.storchaka set files: + pprint_safe_key_alt.patch
    messages: + msg229999
2014-10-25 10:45:22 serhiy.storchaka set files: + pprint_safe_key.patch
    keywords: + patch
    messages: + msg229993
    stage: patch review
2014-10-25 04:42:51 fdrake set messages: + msg229980
2014-10-25 00:47:09 pitrou set nosy: + pitrou
    messages: + msg229971
2014-10-24 17:00:49 serhiy.storchaka create