Issue 25457: json dump fails for mixed-type keys when sort_keys is specified

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/69643

classification

Title:	json dump fails for mixed-type keys when sort_keys is specified
Type:	behavior	Stage:	patch review
Components:	Library (Lib)	Versions:	Python 3.8, Python 3.7

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Aaron Hall, andrei.avk, jedwards, josh.r, naught101, r.david.murray, tanzer@swing.co.at, zachrahan
Priority:	normal	Keywords:	patch

Created on 2015-10-22 08:01 by tanzer@swing.co.at, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL	Status	Linked	Edit
PR 8011	open	jedwards, 2018-06-29 17:24
PR 15691	closed	python-dev, 2019-09-08 02:36

Messages (17)
msg253324 - (view)	Author: Christian Tanzer (tanzer@swing.co.at)	Date: 2015-10-22 08:01
In Python 3, trying to json-dump a dict with keys of different types fails with a TypeError when sort_keys is specified: python2.7 =========== Python 2.7.10 (default, May 29 2015, 10:02:30) [GCC 4.8.4] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import json >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}, sort_keys = True) '{"null": "nada", "1": 42, "foo": "bar"}' python3.5 ============ Python 3.5.0 (default, Oct 5 2015, 12:03:13) [GCC 4.8.5] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import json >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}, sort_keys = True) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib64/python3.5/json/__init__.py", line 237, in dumps **kw).encode(obj) File "/usr/lib64/python3.5/json/encoder.py", line 199, in encode chunks = self.iterencode(o, _one_shot=True) File "/usr/lib64/python3.5/json/encoder.py", line 257, in iterencode return _iterencode(o, 0) TypeError: unorderable types: str() < int() Note that the documentation explicitly allows keys of different, if basic, types: If skipkeys is True (default: False), then dict keys that are not of a basic type (str, int, float, bool, None) will be skipped instead of raising a TypeError. As all they keys are dumped as strings, a simple solution would be to sort after converting to strings. Looking closely at the output of Python 2, the sort order is a bit strange!
msg253360 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2015-10-23 02:45
The Python 2 sort order is a result of the "arbitrary but consistent fallback comparison" (omitting details, it's comparing the names of the types), thus the "strange" sort order. Python 3 (justifiably) said that incomparable types should be incomparable rather than silently behaving in non-intuitive ways, hiding errors. Python is being rather generous by allowing non-string keys, because the JSON spec ( http://json.org/ ) only allows the keys ("names" in JSON parlance) to be strings. So you're already a bit in the weeds as far as compliant JSON goes if you have non-string keys. Since mixed type keys lack meaningful sort order, I'm not sure it's wrong to reject attempts to sort them. Converting to string is as arbitrary and full of potential for silently incorrect comparisons as the Python 2 behavior, and reintroducing it seems like a bad idea.
msg253362 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2015-10-23 02:56
As a workaround (should you absolutely need to sort keys by some arbitrary criteria), you can initialize a collections.OrderedDict from the sorted items of your original dict (using whatever key function you like), then dump without using sort_keys=True. For example, your suggested behavior (treat all keys as str) could be achieved by the user by replacing: json.dumps(mydict, sort_keys=True) with: json.dumps(collections.OrderedDict(sorted(mydict.items(), key=str))) Particularly in 3.5 (where OrderedDict is a C builtin), this shouldn't incur too much additional overhead (`sort_keys` has to make a sorted list intermediate anyway), and the output is the same, without introducing implicit hiding of errors.
msg253363 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2015-10-23 03:01
Oops, minor flaw with that. It's str-ifying the tuples, not the keys, which could (in some cases) cause issues with keys whose reprs have different quoting. So you'd end up with lambdas. Boo. Anyway, corrected version (which would probably not be one-lined in real code): json.dumps(collections.OrderedDict(sorted(mydict.items(), key=lambda x: str(x[0]))))
msg253368 - (view)	Author: Christian Tanzer (tanzer@swing.co.at)	Date: 2015-10-23 08:09
Josh Rosenberg wrote at Fri, 23 Oct 2015 02:45:51 +0000: > The Python 2 sort order is a result of the "arbitrary but consistent > fallback comparison" (omitting details, it's comparing the names of > the types), thus the "strange" sort order. Thanks. I knew that. > Python 3 (justifiably) said that incomparable types should be > incomparable rather than silently behaving in non-intuitive ways, > hiding errors. "justifiably" is debatable. I consider the change ill-conveived. Displaying a dictionary (or just its keys) in a readable, or just reproducible, way is useful in many contexts. Python 3 broke that for very little, INMNSHO, gain. I consider "hiding errors" a myth, to say it politely! > Python is being rather generous by allowing non-string keys, because > the JSON spec ( http://json.org/ ) only allows the keys ("names" in > JSON parlance) to be strings. So you're already a bit in the weeds as > far as compliant JSON goes if you have non-string keys. There are two possibilities: 1) Accepting non-string keys is intended. Then `sort_keys` shouldn't break like it does. As far as JSON goes, the output of `json.dump[s]` contains string keys. 2) Accepting non-string keys is a bug. Then `json.dump[s]` should be changed to not accept them. Mixing both approaches is the worst of all worlds. > Since mixed type keys lack meaningful sort order, I'm not sure it's > wrong to reject attempts to sort them. The documentation says: If sort_keys is True (default False), then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis. Reproducible is the keyword here. Readability is another one. Even if the sort order is "strange", it is much better than random order, if you are looking for a specific key. For the record, it was a test failing under Python 3.5 that triggered this bug report. > > As all they keys are dumped as strings, a simple solution would be to > > sort after converting to strings. > Converting to string is as > arbitrary and full of potential for silently incorrect comparisons as > the Python 2 behavior, and reintroducing it seems like a bad idea. json.dumps already does the conversion:: >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}) '{"foo": "bar", "1": 42, "null": "nada"}' Another run:: >>> json.dumps({1 : 42, "foo" : "bar", None : "nada"}) '{"1": 42, "foo": "bar", "null": "nada"}' That difference is exactly the reason for `sort_keys`.
msg253369 - (view)	Author: Christian Tanzer (tanzer@swing.co.at)	Date: 2015-10-23 08:14
Josh Rosenberg wrote at Fri, 23 Oct 2015 02:56:30 +0000: > As a workaround (should you absolutely need to sort keys by some > arbitrary criteria), you can initialize a collections.OrderedDict from > the sorted items of your original dict (using whatever key function > you like), then dump without using sort_keys=True. Sigh... I already implemented a workaround but it's not as simple as you think — the dictionary in question is nested. The problem is that this is just another unnecessary difficulty when trying to move to Python 3.x.
msg297246 - (view)	Author: (zachrahan)	Date: 2017-06-29 02:50
This one just bit me too. It seems that if JSON serialization accepts non-string dict keys, it should make sure to accept them in all circumstances. Currently, there is an error only with mixed-type dicts, only when sort_keys=True. In addition, the error raised in such cases is especially unhelpful. Running the following: json.dumps({3:1, 'foo':'bar'}, sort_keys=True) produces a stack trace that terminates in a function defined in C, with this error: TypeError: '<' not supported between instances of 'str' and 'int' That error doesn't give non-experts very much to go on...! The fix is reasonably simple: coerce dict keys to strings before trying to sort the keys, not after. The only fuss in making such a patch is that the behavior has to be fixed in both _json.c and in json/encode.py. The only other consistent behavior would be to disallow non-string keys, but that behavior is at this point very well entrenched. So it only makes sense that encoding should be patched to not fail in corner cases.
msg317187 - (view)	Author: Aaron Hall (Aaron Hall) *	Date: 2018-05-20 16:49
Now that dicts are sortable, does that make the sort_keys argument redundant? Should this bug be changed to "won't fix"?
msg317216 - (view)	Author: (zachrahan)	Date: 2018-05-21 07:29
Well, "wontfix" would be appropriate in the context of deprecating the sort_keys option (over the course of however many releases) and documenting that the new procedure for getting JSON output in a specific order is to ensure that the input dict was created in that order. Certainly for regression testing, sort_keys is no longer needed, but that's not the only reason people are using that option. (It's certainly not why I use the option -- my use stems from sort_keys improving human readability of the JSON.) But outside of deprecating sort_keys wholesale, it is still a bug that sort_keys=True can cause an error on input that would otherwise be valid for json.dump[s].
msg317240 - (view)	Author: Christian Tanzer (tanzer@swing.co.at)	Date: 2018-05-21 14:52
Aaron Hall wrote at Sun, 20 May 2018 16:49:06 +0000: > Now that dicts are sortable, does that make the sort_keys argument redundant? > > Should this bug be changed to "won't fix"? https://bugs.python.org/issue25457#msg317216 is as good an answer as I could give. Considering that I openend the bug more than 2.5 years ago, it doesn't really matter though.
msg317241 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-05-21 15:26
I'm fairly certain (though not 100%, obviously :) that coercing first and then sorting would be accepted if someone wants to create a PR for this.
msg317243 - (view)	Author: Aaron Hall (Aaron Hall) *	Date: 2018-05-21 17:26
From a design standpoint, I'm fairly certain the sort_keys argument was created due to Python's dicts being arbitrarily ordered. Coercing to strings before sorting is unsatisfactory because, e.g. numbers sort lexicographically instead of by numeric value when strings. >>> import json >>> json.dumps({i:i2 for i in range(15)}, sort_keys=True) '{"0": 0, "1": 1, "2": 4, "3": 9, "4": 16, "5": 25, "6": 36, "7": 49, "8": 64, "9": 81, "10": 100, "11": 121, "12": 144, "13": 169, "14": 196}' >>> json.dumps({str(i):i2 for i in range(15)}, sort_keys=True) '{"0": 0, "1": 1, "10": 100, "11": 121, "12": 144, "13": 169, "14": 196, "2": 4, "3": 9, "4": 16, "5": 25, "6": 36, "7": 49, "8": 64, "9": 81}' Changing the order of operations is just going to create more issues, IMHO. Now that users can sort their dicts prior to providing them to the function, e.g.: >>> json.dumps({str(i):i**2 for i in range(15)}) '{"0": 0, "1": 1, "2": 4, "3": 9, "4": 16, "5": 25, "6": 36, "7": 49, "8": 64, "9": 81, "10": 100, "11": 121, "12": 144, "13": 169, "14": 196}' we could deprecate the argument, or just keep it as-is for hysterical raisins. Regardless, I'd close this as "won't fix".
msg317246 - (view)	Author: R. David Murray (r.david.murray) *	Date: 2018-05-21 18:21
json keys are strings, so the fact that we support other object types as keys and coerce them to strings is an "extra feature" of python, and is actually a somewhat questionable feature. The reproducible use case is solved by the fact that dicts are now ordered, with no extra work on the part of the programmer. Likewise, if you want custom sorting you can ensure your dict is ordered the way you want it to be, as you indicate. The remaining use case for sort_keys, then (and one for which it is commonly used) is sorting the keys lexicographically, and for that, sorting the coereced strings is correct per the json standard (in which all keys are required to be strings). Note that sort_keys will not be removed for backward compatibility reasons, so the only question is whether or not the increased functionality of coercing first is worth the trouble to implement. I'm actually only +0 on it, since I don't consider it good practice to json-ize dicts that have non-string keys. The reason I'm + is because it would increase backward compatibility with python2 (not the ordering of the output, we can't make that work, but in the fact that it would no longer raise an error in python3). We'll see if other core developers agree or disagree.
msg320706 - (view)	Author: James Edwards (jedwards) *	Date: 2018-06-29 07:08
This came up in a StackOverflow question[1] today, so I took a stab at addressing the error. The changes don't restore the 2.x behavior, but just do as R. David Murray suggested and coerce the keys to strings prior to sorting to prevent the error. The changes in _json.c and json.decoder are handled slightly differently in the case of skipkeys. Both create a list of (coerced_key, value) pairs, sorts it (when specified), and uses that in place of the PyDict_Items / .items(). When skipkeys=True and invalid (uncoercible) keys are found, the c code will just not append that item to the coerced_items list while the python code uses None to signal that item should be filtered out. (That being said, I'm not a huge fan of the approach I used in the Python code and may rewrite using .append instead of a generator. The c code could definitely use a review when it comes to reference counts. Fork commit: https://github.com/jheiv/cpython/commit/8d3612f56a137da0d26b83d00507ff2f11bca9bb [1] https://stackoverflow.com/questions/51093268/why-am-i-getting-typeerror-unorderable-types-str-int-in-this-code
msg384635 - (view)	Author: (naught101)	Date: 2021-01-08 04:58
I want to do something like this: hashlib.md5(json.dumps(d, sort_keys=True)) So I can check if a dict's contents are the same as a DB version, but I can't guarantee that all the keys are strings, so it breaks, annoyingly. I would very much like the apply-default-function-then-sort approach. Until then, my work-around is this: def deep_stringize_dict_keys(item): """Converts all keys to strings in a nested dictionary""" if isinstance(item, dict): return {str(k): deep_stringize_dict_keys(v) for k, v in item.items()} if isinstance(item, list): # This will check only ONE layer deep for nested dictionaries inside lists. # If you need deeper than that, you're probably doing something stupid. if any(isinstance(v, dict) for v in item): return [deep_stringize_dict_keys(v) if isinstance(v, dict) else v for v in item] # I don't care about tuples, since they don't exist in JSON return item Maybe it can be of some use for others.
msg397249 - (view)	Author: Andrei Kulakov (andrei.avk) *	Date: 2021-07-10 15:06
Some observations: - sort_keys arg does a deep sorting of nested dictionaries. It's a bit too much to ask users to do this type of preprocessing manually before dumping to json. - the error doesn't seem too onerous to me. 'unorderable types: str() < int()' If uncertain, a user can go to interactive shell and try `1 < "2"`, and then the issue is obvious. - to me, current behaviour seems preferable to silently guessing that users wants stringified sorting, especially since it can surface as a problem way down the line. - what makes this issue interesting is that in roughly half of cases (I'm guessing) the user will want object sorted and then cast to string and would be surprised if the reverse happened, and in the other half cases the user would want them stringified, then sorted, and would be surprised if that didn't happen. It depends on the perspective: you may think of the json as a representation of a dict of objects, that just happen to be in json format; or you can think of it as a json document with string keys (of course) that just happen to come from a dict of objects. Both can be valid depending on the use case. Given all of this, I propose keeping current behavior for the existing arg, and adding another arg for 'stringify then sort' behavior. Then we'll have no silent guessing and the "unorderable" type error message can point the user directly to the new argument. If the user reads the docs before using this method, they will see two clear options with respective tradeoffs and decide which one to use. So either by reading the docs or running into the error, the user will have a clear explanation and a clear and convenient solution.
msg397250 - (view)	Author: Christian Tanzer (tanzer@swing.co.at)	Date: 2021-07-10 17:34
Json keys are strings. That‘s why json.dump stringifies all keys. If you want to argue that this behavior is wrong I wouldn’t protest except for that it breaks extant code. But arguing that sorting the stringified keys would violate user’s expectations or lead to problems down the line makes no sense. The user is asking for an object with string keys and they want the keys sorted. That is unambiguous and well defined. Neither does adding a second argument make any sense, it would just increase the confusion. My problem was that Python 3.x threw an exception about this for a complex json object in a context where it was not at all obvious what was going on. And the code in question had worked for years in Python 2. This bug report is many, many years old and I don’t much care one way or another but I am very sad that Practicality beats purity got utterly lost in and after the transition to Python 3. Christian Tanzer > On 10.07.2021, at 16:12, Andrei Kulakov <report@bugs.python.org> wrote: > > Andrei Kulakov <andrei.avk@gmail.com> added the comment: > > Some observations: > > - sort_keys arg does a deep sorting of nested dictionaries. It's a bit too much to ask users to do this type of preprocessing manually before dumping to json. > > - the error doesn't seem too onerous to me. 'unorderable types: str() < int()' If uncertain, a user can go to interactive shell and try `1 < "2"`, and then the issue is obvious. > > - to me, current behaviour seems preferable to silently guessing that users wants stringified sorting, especially since it can surface as a problem way down the line. > > - what makes this issue interesting is that in roughly half of cases (I'm guessing) the user will want object sorted and then cast to string and would be surprised if the reverse happened, and in the other half cases the user would want them stringified, then sorted, and would be surprised if that didn't happen. > > It depends on the perspective: you may think of the json as a representation of a dict of objects, that just happen to be in json format; or you can think of it as a json document with string keys (of course) that just happen to come from a dict of objects. Both can be valid depending on the use case. > > Given all of this, I propose keeping current behavior for the existing arg, and adding another arg for 'stringify then sort' behavior. Then we'll have no silent guessing and the "unorderable" type error message can point the user directly to the new argument. > > If the user reads the docs before using this method, they will see two clear options with respective tradeoffs and decide which one to use. > > So either by reading the docs or running into the error, the user will have a clear explanation and a clear and convenient solution. > > ---------- > nosy: +andrei.avk > > _______________________________________ > Python tracker <report@bugs.python.org> > <https://bugs.python.org/issue25457> > _______________________________________

History
Date	User	Action	Args
2022-04-11 14:58:23	admin	set	github: 69643
2021-07-10 17:34:02	tanzer@swing.co.at	set	messages: + msg397250
2021-07-10 15:06:52	andrei.avk	set	nosy: + andrei.avk messages: + msg397249
2021-01-08 04:58:36	naught101	set	nosy: + naught101 messages: + msg384635
2019-09-08 02:36:31	python-dev	set	pull_requests: + pull_request15382
2019-09-06 20:13:55	josh.r	link	issue38046 superseder
2018-06-29 17:24:02	jedwards	set	keywords: + patch stage: patch review pull_requests: + pull_request7617
2018-06-29 07:08:41	jedwards	set	nosy: + jedwards messages: + msg320706
2018-05-21 18:21:42	r.david.murray	set	messages: + msg317246
2018-05-21 17:26:33	Aaron Hall	set	messages: + msg317243
2018-05-21 15:26:37	r.david.murray	set	nosy: + r.david.murray messages: + msg317241 versions: + Python 3.8
2018-05-21 14:52:37	tanzer@swing.co.at	set	messages: + msg317240
2018-05-21 07:29:20	zachrahan	set	messages: + msg317216
2018-05-20 16:49:06	Aaron Hall	set	nosy: + Aaron Hall messages: + msg317187
2017-06-29 02:50:49	zachrahan	set	nosy: + zachrahan messages: + msg297246 versions: + Python 3.7, - Python 3.5
2015-10-23 08:14:22	tanzer@swing.co.at	set	messages: + msg253369
2015-10-23 08:09:53	tanzer@swing.co.at	set	messages: + msg253368
2015-10-23 03:01:57	josh.r	set	messages: + msg253363
2015-10-23 02:56:30	josh.r	set	messages: + msg253362
2015-10-23 02:45:51	josh.r	set	nosy: + josh.r messages: + msg253360
2015-10-22 08:01:01	tanzer@swing.co.at	create