classification
Title: json dump fails for mixed-type keys when sort_keys is specified
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.8, Python 3.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Aaron Hall, andrei.avk, jedwards, josh.r, naught101, r.david.murray, tanzer@swing.co.at, zachrahan
Priority: normal Keywords: patch

Created on 2015-10-22 08:01 by tanzer@swing.co.at, last changed 2021-07-10 17:34 by tanzer@swing.co.at.

Pull Requests
URL Status Linked Edit
PR 8011 open jedwards, 2018-06-29 17:24
PR 15691 closed python-dev, 2019-09-08 02:36
Messages (17)
msg253324 - (view) Author: Christian Tanzer (tanzer@swing.co.at) Date: 2015-10-22 08:01
In Python 3, trying to json-dump a dict with keys of different types fails with a TypeError when sort_keys is specified:

python2.7
===========

Python 2.7.10 (default, May 29 2015, 10:02:30) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}, sort_keys = True)
'{"null": "nada", "1": 42, "foo": "bar"}'

python3.5
============

Python 3.5.0 (default, Oct  5 2015, 12:03:13) 
[GCC 4.8.5] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}, sort_keys = True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python3.5/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib64/python3.5/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
TypeError: unorderable types: str() < int()

Note that the documentation explicitly allows keys of different, if basic, types:

  If skipkeys is True (default: False), then dict keys that are not of a basic type (str, int, float, bool, None) will be skipped instead of raising a TypeError.

As all they keys are dumped as strings, a simple solution would be to sort after converting to strings. Looking closely at the output of Python 2, the sort order is a bit strange!
msg253360 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2015-10-23 02:45
The Python 2 sort order is a result of the "arbitrary but consistent fallback comparison" (omitting details, it's comparing the names of the types), thus the "strange" sort order. Python 3 (justifiably) said that incomparable types should be incomparable rather than silently behaving in non-intuitive ways, hiding errors.

Python is being rather generous by allowing non-string keys, because the  JSON spec ( http://json.org/ ) only allows the keys ("names" in JSON parlance) to be strings. So you're already a bit in the weeds as far as compliant JSON goes if you have non-string keys.

Since mixed type keys lack meaningful sort order, I'm not sure it's wrong to reject attempts to sort them. Converting to string is as arbitrary and full of potential for silently incorrect comparisons as the Python 2 behavior, and reintroducing it seems like a bad idea.
msg253362 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2015-10-23 02:56
As a workaround (should you absolutely need to sort keys by some arbitrary criteria), you can initialize a collections.OrderedDict from the sorted items of your original dict (using whatever key function you like), then dump without using sort_keys=True. For example, your suggested behavior (treat all keys as str) could be achieved by the user by replacing:

    json.dumps(mydict, sort_keys=True)

with:

    json.dumps(collections.OrderedDict(sorted(mydict.items(), key=str)))

Particularly in 3.5 (where OrderedDict is a C builtin), this shouldn't incur too much additional overhead (`sort_keys` has to make a sorted list intermediate anyway), and the output is the same, without introducing implicit hiding of errors.
msg253363 - (view) Author: Josh Rosenberg (josh.r) * (Python triager) Date: 2015-10-23 03:01
Oops, minor flaw with that. It's str-ifying the tuples, not the keys, which could (in some cases) cause issues with keys whose reprs have different quoting. So you'd end up with lambdas. Boo. Anyway, corrected version (which would probably not be one-lined in real code):

json.dumps(collections.OrderedDict(sorted(mydict.items(), key=lambda x: str(x[0]))))
msg253368 - (view) Author: Christian Tanzer (tanzer@swing.co.at) Date: 2015-10-23 08:09
Josh Rosenberg wrote at Fri, 23 Oct 2015 02:45:51 +0000:

> The Python 2 sort order is a result of the "arbitrary but consistent
> fallback comparison" (omitting details, it's comparing the names of
> the types), thus the "strange" sort order.

Thanks. I knew that.

> Python 3 (justifiably) said that incomparable types should be
> incomparable rather than silently behaving in non-intuitive ways,
> hiding errors.

"justifiably" is debatable. I consider the change ill-conveived.

Displaying a dictionary (or just its keys) in a readable, or just
reproducible, way is useful in many contexts. Python 3 broke that for
very little, INMNSHO, gain.

I consider "hiding errors" a myth, to say it politely!

> Python is being rather generous by allowing non-string keys, because
> the  JSON spec ( http://json.org/ ) only allows the keys ("names" in
> JSON parlance) to be strings. So you're already a bit in the weeds as
> far as compliant JSON goes if you have non-string keys.

There are two possibilities:

1) Accepting non-string keys is intended. Then `sort_keys` shouldn't
   break like it does.

   As far as JSON goes, the output of `json.dump[s]` contains string keys.

2) Accepting non-string keys is a bug. Then `json.dump[s]` should be
   changed to not accept them.

Mixing both approaches is the worst of all worlds.

> Since mixed type keys lack meaningful sort order, I'm not sure it's
> wrong to reject attempts to sort them.

The documentation says:

    If sort_keys is True (default False), then the output of dictionaries
    will be sorted by key; this is useful for regression tests to ensure
    that JSON serializations can be compared on a day-to-day basis.

**Reproducible** is the keyword here.

**Readability** is another one. Even if the sort order is "strange",
it is much better than random order, if you are looking for a specific
key.

For the record, it was a test failing under Python 3.5 that triggered
this bug report.

> > As all they keys are dumped as strings, a simple solution would be to
> > sort after converting to strings.
> Converting to string is as
> arbitrary and full of potential for silently incorrect comparisons as
> the Python 2 behavior, and reintroducing it seems like a bad idea.

json.dumps already does the conversion::

    >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"})
    '{"foo": "bar", "1": 42, "null": "nada"}'

Another run::

    >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"})
    '{"1": 42, "foo": "bar", "null": "nada"}'

That difference is exactly the reason for `sort_keys`.
msg253369 - (view) Author: Christian Tanzer (tanzer@swing.co.at) Date: 2015-10-23 08:14
Josh Rosenberg wrote at Fri, 23 Oct 2015 02:56:30 +0000:

> As a workaround (should you absolutely need to sort keys by some
> arbitrary criteria), you can initialize a collections.OrderedDict from
> the sorted items of your original dict (using whatever key function
> you like), then dump without using sort_keys=True.

Sigh...

I already implemented a workaround but it's not as simple as you
think — the dictionary in question is nested.

The problem is that this is just another unnecessary difficulty when
trying to move to Python 3.x.
msg297246 - (view) Author: (zachrahan) Date: 2017-06-29 02:50
This one just bit me too. It seems that if JSON serialization accepts non-string dict keys, it should make sure to accept them in all circumstances. Currently, there is an error *only* with mixed-type dicts, *only* when sort_keys=True.

In addition, the error raised in such cases is especially unhelpful. Running the following:
json.dumps({3:1, 'foo':'bar'}, sort_keys=True)

produces a stack trace that terminates in a function defined in C, with this error:
TypeError: '<' not supported between instances of 'str' and 'int'

That error doesn't give non-experts very much to go on...!

The fix is reasonably simple: coerce dict keys to strings *before* trying to sort the keys, not after. The only fuss in making such a patch is that the behavior has to be fixed in both _json.c and in json/encode.py.

The only other consistent behavior would be to disallow non-string keys, but that behavior is at this point very well entrenched. So it only makes sense that encoding should be patched to not fail in corner cases.
msg317187 - (view) Author: Aaron Hall (Aaron Hall) * Date: 2018-05-20 16:49
Now that dicts are sortable, does that make the sort_keys argument redundant?

Should this bug be changed to "won't fix"?
msg317216 - (view) Author: (zachrahan) Date: 2018-05-21 07:29
Well, "wontfix" would be appropriate in the context of deprecating the sort_keys option (over the course of however many releases) and documenting that the new procedure for getting JSON output in a specific order is to ensure that the input dict was created in that order.

Certainly for regression testing, sort_keys is no longer needed, but that's not the only reason people are using that option. (It's certainly not why I use the option -- my use stems from sort_keys improving human readability of the JSON.)

But outside of deprecating sort_keys wholesale, it is still a bug that sort_keys=True can cause an error on input that would otherwise be valid for json.dump[s].
msg317240 - (view) Author: Christian Tanzer (tanzer@swing.co.at) Date: 2018-05-21 14:52
Aaron Hall wrote at Sun, 20 May 2018 16:49:06 +0000:

> Now that dicts are sortable, does that make the sort_keys argument redundant?
>
> Should this bug be changed to "won't fix"?

https://bugs.python.org/issue25457#msg317216 is as good an answer as I
could give.

Considering that I openend the bug more than 2.5 years ago, it doesn't
really matter though.
msg317241 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-05-21 15:26
I'm fairly certain (though not 100%, obviously :) that coercing first and then sorting would be accepted if someone wants to create a PR for this.
msg317243 - (view) Author: Aaron Hall (Aaron Hall) * Date: 2018-05-21 17:26
From a design standpoint, I'm fairly certain the sort_keys argument was created due to Python's dicts being arbitrarily ordered.

Coercing to strings before sorting is unsatisfactory because, e.g. numbers sort lexicographically instead of by numeric value when strings.

>>> import json
>>> json.dumps({i:i**2 for i in range(15)}, sort_keys=True)
'{"0": 0, "1": 1, "2": 4, "3": 9, "4": 16, "5": 25, "6": 36, "7": 49, "8": 64, "9": 81, "10": 100, "11": 121, "12": 144, "13": 169, "14": 196}'
>>> json.dumps({str(i):i**2 for i in range(15)}, sort_keys=True)
'{"0": 0, "1": 1, "10": 100, "11": 121, "12": 144, "13": 169, "14": 196, "2": 4, "3": 9, "4": 16, "5": 25, "6": 36, "7": 49, "8": 64, "9": 81}'

Changing the order of operations is just going to create more issues, IMHO.

Now that users can sort their dicts prior to providing them to the function, e.g.:

>>> json.dumps({str(i):i**2 for i in range(15)})
'{"0": 0, "1": 1, "2": 4, "3": 9, "4": 16, "5": 25, "6": 36, "7": 49, "8": 64, "9": 81, "10": 100, "11": 121, "12": 144, "13": 169, "14": 196}'

we could deprecate the argument, or just keep it as-is for hysterical raisins.

Regardless, I'd close this as "won't fix".
msg317246 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-05-21 18:21
json keys *are* strings, so the fact that we support other object types as keys and coerce them to strings is an "extra feature" of python, and is actually a somewhat questionable feature.  The reproducible use case is solved by the fact that dicts are now ordered, with no extra work on the part of the programmer.  Likewise, if you want custom sorting you can ensure your dict is ordered the way you want it to be, as you indicate.  The remaining use case for sort_keys, then (and one for which it is *commonly* used) is sorting the keys lexicographically, and for that, sorting the coereced strings is correct per the json standard (in which all keys are required to be strings).

Note that sort_keys will not be removed for backward compatibility reasons, so the only question is whether or not the increased functionality of coercing first is worth the trouble to implement.  I'm actually only +0 on it, since I don't consider it good practice to json-ize dicts that have non-string keys.  The reason I'm + is because it would increase backward compatibility with python2 (not the ordering of the output, we can't make that work, but in the fact that it would no longer raise an error in python3).

We'll see if other core developers agree or disagree.
msg320706 - (view) Author: James Edwards (jedwards) * Date: 2018-06-29 07:08
This came up in a StackOverflow question[1] today, so I took a stab at addressing the error.  The changes don't restore the 2.x behavior, but just do as R. David Murray suggested and coerce the keys to strings prior to sorting to prevent the error.

The changes in _json.c and json.decoder are handled slightly differently in the case of skipkeys.

Both create a list of (coerced_key, value) pairs, sorts it (when specified), and uses that in place of the PyDict_Items / .items().

When skipkeys=True and invalid (uncoercible) keys are found, the c code will just not append that item to the coerced_items list while the python code uses None to signal that item should be filtered out.

(That being said, I'm not a huge fan of the approach I used in the Python code and may rewrite using .append instead of a generator.

The c code could definitely use a review when it comes to reference counts.

Fork commit: https://github.com/jheiv/cpython/commit/8d3612f56a137da0d26b83d00507ff2f11bca9bb

[1] https://stackoverflow.com/questions/51093268/why-am-i-getting-typeerror-unorderable-types-str-int-in-this-code
msg384635 - (view) Author: (naught101) Date: 2021-01-08 04:58
I want to do something like this:

    hashlib.md5(json.dumps(d, sort_keys=True))

So I can check if a dict's contents are the same as a DB version, but I can't guarantee that all the keys are strings, so it breaks, annoyingly. I would very much like the apply-default-function-then-sort approach. Until then, my work-around is this:

    def deep_stringize_dict_keys(item):
        """Converts all keys to strings in a nested dictionary"""
        if isinstance(item, dict):
            return {str(k): deep_stringize_dict_keys(v) for k, v in item.items()}

        if isinstance(item, list):
            # This will check only ONE layer deep for nested dictionaries inside lists.
            # If you need deeper than that, you're probably doing something stupid.
            if any(isinstance(v, dict) for v in item):
                return [deep_stringize_dict_keys(v) if isinstance(v, dict) else v
                        for v in item]

        # I don't care about tuples, since they don't exist in JSON

        return item

Maybe it can be of some use for others.
msg397249 - (view) Author: Andrei Kulakov (andrei.avk) * (Python triager) Date: 2021-07-10 15:06
Some observations:

 - sort_keys arg does a deep sorting of nested dictionaries. It's a bit too much to ask users to do this type of preprocessing manually before dumping to json.

 - the error doesn't seem too onerous to me. 'unorderable types: str() < int()' If uncertain, a user can go to interactive shell and try `1 < "2"`, and then the issue is obvious.

 - to me, current behaviour seems preferable to silently guessing that users wants stringified sorting, especially since it can surface as a problem way down the line.

 - what makes this issue interesting is that in roughly half of cases (I'm guessing) the user will want object sorted and then cast to string and would be surprised if the reverse happened, and in the other half cases the user would want them stringified, then sorted, and would be surprised if that didn't happen.

It depends on the perspective: you may think of the json as a representation of a dict of objects, that just happen to be in json format; or you can think of it as a json document with string keys (of course) that just happen to come from a dict of objects. Both can be valid depending on the use case.

Given all of this, I propose keeping current behavior for the existing arg, and adding another arg for 'stringify then sort' behavior. Then we'll have no silent guessing and the "unorderable" type error message can point the user directly to the new argument.

If the user reads the docs before using this method, they will see two clear options with respective tradeoffs and decide which one to use.

So either by reading the docs or running into the error, the user will have a clear explanation and a clear and convenient solution.
msg397250 - (view) Author: Christian Tanzer (tanzer@swing.co.at) Date: 2021-07-10 17:34
Json keys *are strings*. 

That‘s why json.dump stringifies all keys. If you want to argue that this behavior is wrong I wouldn’t protest except for that it breaks extant code.

But arguing that sorting the stringified keys would violate user’s expectations or lead to problems down the line makes no sense. The user is asking for an object with string keys and they want the keys sorted. That is unambiguous and well defined.

Neither does adding a second argument make any sense, it would just increase the confusion.

My problem was that Python 3.x threw an exception about this for a complex json object in a context where it was not at all obvious what was going on. And the code in question had worked for years in Python 2.

This bug report is many, many years old and I don’t much care one way or another but I am very sad that 

     Practicality beats purity 

got utterly lost in and after the transition to Python 3.

Christian Tanzer

> On 10.07.2021, at 16:12, Andrei Kulakov <report@bugs.python.org> wrote:
> 
> Andrei Kulakov <andrei.avk@gmail.com> added the comment:
> 
> Some observations:
> 
> - sort_keys arg does a deep sorting of nested dictionaries. It's a bit too much to ask users to do this type of preprocessing manually before dumping to json.
> 
> - the error doesn't seem too onerous to me. 'unorderable types: str() < int()' If uncertain, a user can go to interactive shell and try `1 < "2"`, and then the issue is obvious.
> 
> - to me, current behaviour seems preferable to silently guessing that users wants stringified sorting, especially since it can surface as a problem way down the line.
> 
> - what makes this issue interesting is that in roughly half of cases (I'm guessing) the user will want object sorted and then cast to string and would be surprised if the reverse happened, and in the other half cases the user would want them stringified, then sorted, and would be surprised if that didn't happen.
> 
> It depends on the perspective: you may think of the json as a representation of a dict of objects, that just happen to be in json format; or you can think of it as a json document with string keys (of course) that just happen to come from a dict of objects. Both can be valid depending on the use case.
> 
> Given all of this, I propose keeping current behavior for the existing arg, and adding another arg for 'stringify then sort' behavior. Then we'll have no silent guessing and the "unorderable" type error message can point the user directly to the new argument.
> 
> If the user reads the docs before using this method, they will see two clear options with respective tradeoffs and decide which one to use.
> 
> So either by reading the docs or running into the error, the user will have a clear explanation and a clear and convenient solution.
> 
> ----------
> nosy: +andrei.avk
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <https://bugs.python.org/issue25457>
> _______________________________________
History
Date User Action Args
2021-07-10 17:34:02tanzer@swing.co.atsetmessages: + msg397250
2021-07-10 15:06:52andrei.avksetnosy: + andrei.avk
messages: + msg397249
2021-01-08 04:58:36naught101setnosy: + naught101
messages: + msg384635
2019-09-08 02:36:31python-devsetpull_requests: + pull_request15382
2019-09-06 20:13:55josh.rlinkissue38046 superseder
2018-06-29 17:24:02jedwardssetkeywords: + patch
stage: patch review
pull_requests: + pull_request7617
2018-06-29 07:08:41jedwardssetnosy: + jedwards
messages: + msg320706
2018-05-21 18:21:42r.david.murraysetmessages: + msg317246
2018-05-21 17:26:33Aaron Hallsetmessages: + msg317243
2018-05-21 15:26:37r.david.murraysetnosy: + r.david.murray

messages: + msg317241
versions: + Python 3.8
2018-05-21 14:52:37tanzer@swing.co.atsetmessages: + msg317240
2018-05-21 07:29:20zachrahansetmessages: + msg317216
2018-05-20 16:49:06Aaron Hallsetnosy: + Aaron Hall
messages: + msg317187
2017-06-29 02:50:49zachrahansetnosy: + zachrahan

messages: + msg297246
versions: + Python 3.7, - Python 3.5
2015-10-23 08:14:22tanzer@swing.co.atsetmessages: + msg253369
2015-10-23 08:09:53tanzer@swing.co.atsetmessages: + msg253368
2015-10-23 03:01:57josh.rsetmessages: + msg253363
2015-10-23 02:56:30josh.rsetmessages: + msg253362
2015-10-23 02:45:51josh.rsetnosy: + josh.r
messages: + msg253360
2015-10-22 08:01:01tanzer@swing.co.atcreate